A hybrid model of kernel density estimation and quantile regression for GEFCom2014 probabilistic load forecasting

Stephen Haben∗,1 and Georgios Giasemidis†,2

1 Mathematical Institute, University of Oxford, OX2 6GG, UK
2 CountingLab Ltd, Reading, RG2 8EF, UK

Abstract

We present a model for generating probabilistic forecasts by combining kernel density estimation (KDE) and quantile regression techniques, as part of the probabilistic load forecasting track of the Global Energy Forecasting Competition 2014. The KDE method is initially implemented with a time-decay parameter. We later improve this method by conditioning on the temperature or the period-of-the-week variables to provide more accurate forecasts. Secondly, we develop a simple but effective quantile regression forecast. The novel aspects of our methodology are two-fold. First, we introduce symmetry into the time-decay parameter of the kernel density estimation based forecast. Secondly, we combine three probabilistic forecasts with different weights for different periods of the month.

1 Introduction

In this paper we present our methodology used in a winning entry for the probabilistic load forecasting track of the Global Energy Forecasting Competition 2014 (GEFCom2014). The competition consisted of twelve weekly tasks which required using historical data to estimate 99 quantiles (0.01, 0.02, ..., 0.99) for each hour of the following month. Each forecast is evaluated using the pinball function. For further details on the competition structure and the data the interested reader should refer to the GEFCom2014 introduction paper [4]. In Section 2 we present a preliminary analysis of the data that motivates the development of the main forecasting methods introduced in Section 3.
In Section 4 we give a short description of our submissions in chronological order, to explain the reasoning behind the chosen forecasts and the development of the subsequent forecasts. We present an overall view of the results and conclude in Section 5 with a discussion, lessons learned and future work.

∗ Stephen.Haben@maths.ox.ac.uk
† G.Giasemidis@reading.ac.uk

2 Preliminary Analysis

We start by performing a preliminary analysis to determine our initial forecast methods. We first tested the competition's initial historical data set to confirm that load and temperature are strongly correlated, as shown in other studies [2]; see also the GEFCom2014 introduction paper [4] for the time-series plots of the data. This motivates the development of our kernel density estimation method conditional on the temperature (see Section 3.3). We also found that all the weather stations were strongly correlated with each other and with the load data. Hence, as an initial estimate of the temperature, we simply took an average over all 25 stations.

The load data has strong daily, weekly and yearly seasonalities as well as trends [4]. A visual analysis of the load data showed that certain hours of the day exhibited strong bi-annual seasonalities (such as 11 pm) whereas others did not (e.g. 3 pm). This could be due to heating and cooling appliances being employed through the seasons. This inspires our choice of a biannual model in the quantile regression based forecast (see Section 3.4). Consideration of the autocorrelation and partial autocorrelation plots confirmed the presence of the weekly and daily periodicities. Our forecasts, described in the following section, are designed with this periodicity in mind.

3 Methodology

In this section we present the main methods implemented for the competitive tasks of the competition.
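All of the methods below are judged by the pinball loss mentioned in the Introduction, so it is worth fixing the scoring rule first. The following is a minimal Python sketch of our own (the paper's implementation was in Matlab, and these function and variable names are ours, not competition code):

```python
import numpy as np

def pinball_loss(y_true, q_forecasts, quantiles):
    """Average pinball loss of quantile forecasts against observations."""
    y = np.asarray(y_true, float)[:, None]        # (n, 1) observed loads
    q = np.asarray(quantiles, float)[None, :]     # (1, m) quantile levels
    diff = y - np.asarray(q_forecasts, float)     # positive when the forecast is below the observation
    loss = np.where(diff >= 0, q * diff, (q - 1.0) * diff)
    return float(loss.mean())

# One observed load of 100 against three quantile forecasts.
levels = np.array([0.25, 0.5, 0.75])
score = pinball_loss([100.0], [[90.0, 100.0, 110.0]], levels)
print(score)   # mean of [2.5, 0.0, 2.5]
```

Averaging this loss over the 99 quantiles and all hours of the target month gives the per-task score discussed in Section 4.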
3.1 Kernel Density Estimation (KDE)

Many of the methods we employ are non-parametric kernel density based estimates, similar to those presented in [5] for probabilistic wind forecasting and [1] for household-level probabilistic load forecasting. This approach is motivated by the strong weekly correlations in the data. A simple kernel density estimate produces an estimate of the probability density function f(X) of the load X (at a particular future time period) using past hourly observations {X_i} (where i = 1 corresponds to the beginning of the historical load data, 1st Jan 2005). It is given by

    f(X) = \frac{1}{n h_x} \sum_{i=1}^{n} K\left( \frac{X - X_i}{h_x} \right),    (3.1)

where h_x is the load bandwidth. We use a Gaussian kernel function, K(·), for all our kernel based forecast methods.

Our first method is a KDE with a time-decay parameter, 0 < λ ≤ 1. The role of the decay parameter is to give higher weight to more recent observations. To forecast day D of the week, D = 1, 2, ..., 7, at hour h, h = 1, 2, ..., 24, we applied a KDE to all historical observations of the same day D and hour h. This method only considers observations belonging to the same hourly period of the week, denoted by w, w = 1, ..., 168, and we refer to it as KDE-W. This can be expressed as

    f(X) = \frac{1}{h_x} \sum_{i=1}^{n} \frac{\mathbb{1}\{i \bmod s = w\}\, \lambda^{\alpha(i)}}{\sum_{i'=1}^{n} \mathbb{1}\{i' \bmod s = w\}\, \lambda^{\alpha(i')}} \, K\left( \frac{X - X_i}{h_x} \right).    (3.2)

The parameter s = 168 is the number of forecasting hours in a week and α(i) is a periodic function given by

    \alpha(i) = \min\left( \left| D - \left( D(i) - \mathbb{1}_A(i) \right) \right|,\; T(i) - \left| D - D(i) \right| \right),    (3.3)

(The careful reader should note that formula (3.3) might need a further correction by one when D is in a leap year. However, this does not affect our results, since we did not forecast leap years; additionally, such an error would have a negligible effect on the weight.)

Here D(i) = 1, 2, . . .
, T(i) is the day of the year (consisting of T(i) days) corresponding to the historical data X_i, and D is the day of the year corresponding to the forecasted day. To correct for leap years we use an indicator function 1_A(i), where A = {i | D(i) > 28 and T(i) = 366}. Expression (3.3) is simply a periodic absolute value function with annual period, whose minimum values occur annually on the same dates as the forecasted day.

This method is similar to the one presented in [1]. However, the new feature is the half-yearly symmetry of the time-decay exponential (3.3). Since there is an annual periodicity in the load, we incorporated it into the time-decay parameter such that observations during similar days of the year influence the forecast more than other, less relevant observations. The decay parameter also helps us to take into account the non-stationary behaviour of demand. This method performed better than a similar KDE-W using only a simple monotonically decreasing time-decay parameter across the year.

The model parameters were generated using cross-validation on the month prior to the forecasting month. To find the optimal bandwidth, h_x, we used the fminbnd function from the optimisation toolbox in Matlab. For the time-decay parameter λ we considered different values between 0.92 and 1 in 0.01 increments.

The kernel density based estimate has been used as a benchmark in probabilistic forecast methods applied to household-level electricity demand, and it serves as a useful starting point for our forecasts [1]. The method has the advantage of being quicker to implement than more complicated kernel based methods, such as the conditional kernel density estimates on independent parameters, which we introduce in the next sections.

3.2 Conditional Kernel Density Estimate on Period of Week (CKD-W)

A KDE forecast conditional on the period of the week, denoted by w, w = 1,
. . . , 168 (CKD-W) [1], gives a higher weight to observations from similar hourly periods of the week and can be represented as

    f(X \mid w) = \sum_{i=1}^{n} \frac{\lambda^{\alpha(i)}\, K\left( (w_i - w)/h_w \right)}{\sum_{i'=1}^{n} \lambda^{\alpha(i')}\, K\left( (w_{i'} - w)/h_w \right)} \, K\left( \frac{X - X_i}{h_x} \right),    (3.4)

where α(i) is defined in (3.3). This method is similar to the one presented in [1]. However, the new feature is the half-yearly symmetric time-decay exponential (3.3), which is justified by the yearly periodicity of the load as explained in the previous section.

The validation process can be computationally very expensive, especially when searching for multiple optimised parameters (here there are three: the bandwidths for the load and week-period variables, and the time decay). In particular, despite using the Matlab parallelisation toolbox, executing this method on our (conventional) machines, with an Intel Core i7-3610QM quad-core processor at 2.30 GHz and 16 GB of memory, required more than a day to complete, which is not practical given the weekly constraints of the competition. In an attempt to reduce the computational cost, we reduced the number of historical observations and the length of the validation period. We only used observations starting from January 2008, and we cross-validated our parameters using only one week of the validation month. (Initially we used the first week, but later we used the last week of the validation month, because it is closer to the period to be forecasted. However, the improvement was minor.)

(A note on the time-decay parameter: it must lie in the interval (0, 1]; the smaller the value, the fewer historical observations have significant influence on the final forecast. After testing over several tasks we found that the optimal decay parameter is bounded below by 0.92.)
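Both the KDE-W estimate (3.2) and the CKD-W estimate (3.4) are weighted Gaussian kernel density estimates; once the weights are computed, any quantile can be read off the cumulative density. The following is a Python sketch with invented bandwidth and decay values (the names are ours; the paper's implementation was in Matlab):

```python
import numpy as np

def weighted_kde_pdf(x_grid, obs, weights, h):
    """Weighted Gaussian KDE: each past load contributes a Gaussian bump
    scaled by its (normalised) time-decay weight, as in (3.2) and (3.4)."""
    w = np.asarray(weights, float)
    w = w / w.sum()                                  # normalise the decay weights
    u = (x_grid[:, None] - np.asarray(obs, float)[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    return (K * w[None, :]).sum(axis=1) / h

# Four past loads for one period of the week; more recent -> smaller alpha.
obs = np.array([100.0, 105.0, 95.0, 110.0])
lam, alpha = 0.95, np.array([3.0, 2.0, 1.0, 0.0])
grid = np.linspace(60, 150, 2001)
pdf = weighted_kde_pdf(grid, obs, lam ** alpha, h=5.0)
cdf = np.cumsum(pdf)
cdf /= cdf[-1]
median = grid[np.searchsorted(cdf, 0.5)]   # any of the 99 quantiles is read off the same way
print(round(float(median), 1))
```

The CKD-W case differs only in how the weights are built: the week-period kernel K((w_i - w)/h_w) multiplies the decay term before normalisation.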
For the optimisation of the bandwidths we used the fminsearch function (implementing a log transformation to ensure that we only model positive values) from the optimisation toolbox in Matlab. For the time-decay parameter we looped over different values of λ between 0.92 and 1 in 0.01 increments. At the final stages of the competition we used the fminsearchbnd function for parameter optimisation, which improves both the computational time and the accuracy. We call this implementation of the method CKD-W2; see also Section 4.

3.3 Conditional Kernel Density Estimate on Temperature Forecast (CKD-T)

Weather information is particularly useful for an accurate load forecast (among many references in the literature, see [5] in the context of CKD methods, and also a winning entry of GEFCom2012 [2]). For this reason we implemented a KDE method conditional on the temperature (CKD-T). We take the explanatory variable to be the mean hourly temperature T over the 25 weather substations. The conditional probability density is given by

    f(X \mid T) = \sum_{i \in A} \frac{K\left( (T_i - T)/h_T \right)}{\sum_{i' \in A} K\left( (T_{i'} - T)/h_T \right)} \, K\left( \frac{X - X_i}{h_x} \right),    (3.5)

where h_T is the bandwidth of the temperature kernel and T_i is the temperature corresponding to the same hour h and day d as the load X_i. The index subset A consists of indices at hour h on days d − 5, . . . , d, . . . , d + 5 of all previous years. Formula (3.5) does not include a time-decay parameter, since we assume the temperature is the main driver of seasonality; a decay parameter would increase the computational expense for very little gain. For parameter optimisation we used the fminsearch function, implementing a log transformation as with the CKD-W forecast. Since temperature forecasts are inaccurate beyond a few days, this method was only implemented for the first day of a task.
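The conditioning in (3.5) amounts to computing normalised temperature-kernel weights for the historical loads. A small Python illustration with invented values (our own names, not the paper's code):

```python
import numpy as np

def ckd_weights(T_hist, T_forecast, h_T):
    """Normalised kernel weights of (3.5): historical loads whose temperature
    was close to the forecast temperature receive higher weight."""
    u = (np.asarray(T_hist, float) - T_forecast) / h_T
    k = np.exp(-0.5 * u ** 2)     # Gaussian kernel; the normalising constant cancels
    return k / k.sum()

# Invented temperatures for four historical days, forecast of 5.5 degrees.
w = ckd_weights([-2.0, 5.0, 6.0, 20.0], 5.5, h_T=3.0)
print(w.round(3))   # the two mild days dominate the density estimate
```

These weights then multiply the load kernels exactly as the decay weights do in the KDE-W estimate.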
As we will shortly describe in Section 3.5, the remaining days of a task are forecasted using a weighted combination of CKD-W and a quantile forecast, introduced in Section 3.4.

3.3.1 Temperature Forecast

The CKD-T method requires a forecast of the mean temperature in order to create a load forecast. We follow a simple autoregression forecast method, similar to that presented in [7]. The model was chosen for its simplicity. In addition, temperature can change rapidly within a couple of days, and without more data (such as wind speeds and direction) or access to complicated numerical weather prediction software, we decided a simple model was appropriate for our uses. The model consists of a trend, seasonalities (both diurnal and yearly) and lagged temperature variables. We model the temperature T_j at timestep j as

    T_j = \beta_0 + \beta_1 j + S^d_j + S^a_j + \sum_{k=1}^{25} \alpha_k T_{j-k}.    (3.6)

The diurnal seasonal terms are described by

    S^d_j = \sum_{p=1}^{P} \left[ \gamma_p \sin\left( \frac{2 \pi p\, d(j)}{24} \right) + \delta_p \cos\left( \frac{2 \pi p\, d(j)}{24} \right) \right],    (3.7)

where γ_p, δ_p are Fourier coefficients (with P = 4) and d(j) = j mod 24 is the conversion to the hour of the day. The yearly seasonal terms are modelled by

    S^a_j = \sum_{m=1}^{M} \psi_m \sin\left( \frac{2 \pi m (f(j) + \phi)}{365} \right),    (3.8)

where ψ_m, m = 1, 2, ..., M (with M = 3) are the coefficients and f(j) = j/24. The method slightly differs from that in [7], which uses f(j) = ⌊j/24⌋ (the day of the data). The shift φ ensures the periodic terms match the period of the data as closely as possible. The value φ = −85 was chosen such that the mean absolute percentage error (MAPE) is minimised. We set j = 0 for the start of the data at midnight on 1st January 2005.

(The fminsearchbnd function used in Section 3.2 is available at http://www.mathworks.com/matlabcentral/fileexchange/8277-fminsearchbnd--fminsearchcon.)

The final terms of equation (3.6) are the lags.
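Equations (3.6)-(3.8) define an ordinary linear regression once the design matrix is built, which is how Matlab's regress fits it. A Python sketch on synthetic hourly temperatures (our own names and data, not the competition's):

```python
import numpy as np

def design_matrix(T, n_lags=25, P=4, M=3, phi=-85.0):
    """Build the regression design of (3.6): intercept, trend, diurnal
    Fourier pairs (3.7), yearly sine terms (3.8) and lagged temperatures."""
    j = np.arange(n_lags, len(T), dtype=float)
    cols = [np.ones_like(j), j]
    for p in range(1, P + 1):                    # diurnal terms, d(j) = j mod 24
        cols += [np.sin(2 * np.pi * p * (j % 24) / 24),
                 np.cos(2 * np.pi * p * (j % 24) / 24)]
    for m in range(1, M + 1):                    # yearly terms, f(j) = j / 24
        cols.append(np.sin(2 * np.pi * m * (j / 24 + phi) / 365))
    for k in range(1, n_lags + 1):               # autoregressive lags T_{j-k}
        cols.append(T[n_lags - k:len(T) - k])
    return np.column_stack(cols), T[n_lags:]

# Fit by ordinary least squares on 60 days of synthetic hourly temperatures.
rng = np.random.default_rng(0)
hours = np.arange(24 * 60, dtype=float)
temp = 10 + 8 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, hours.size)
X, y = design_matrix(temp)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(X.shape, beta.shape)
```

The fitted coefficients play the role of β_0, β_1, γ_p, δ_p, ψ_m and α_k; forecasting then iterates the model forward, feeding predictions back in as lags.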
By consideration of the autocorrelation, we checked the potential number of lag terms to use and found that the previous 25 hours gave the minimum MAPE for day-ahead and month-ahead temperature forecasts over November 2009 (a preliminary task). The values of M, P and φ were all chosen by cross-validation over the month of November 2009. The coefficients β_0, β_1, γ_p, δ_p and ψ_m were found via the linear regression function in Matlab, regress.

We attempted to select the most representative and accurate weather stations to improve the day-ahead CKD-T forecast. We chose groups of three and six weather stations which gave the best MAPE for a day-ahead temperature forecast. Using the average temperature from these stations in (3.6) did not provide a consistent improvement in the pinball scores. Hence we only used the mean over all weather stations for the CKD-T day-ahead forecasts.

3.4 Quantile Regression (QR)

Quantile regression is a generalisation of standard regression in which each quantile is found by fitting a linear model to historical observations according to a loss function [6]. Suppose we have a model of the demand at times t = 1, ..., n given by f(U_t, β), where U_t are the independent variables and β are the unknown model parameters. Also suppose we have observations of the load, y_t, at the same times t = 1, ..., n. Then for a given quantile q the aim is to find the parameters β_q given by

    \beta_q = \operatorname{argmin}_{\beta} \sum_{t=1}^{n} \rho_q\left( y_t - f(U_t, \beta) \right),    (3.9)

where ρ_q(·) is the loss function given by

    \rho_q(z) = \left| z \left( q - \mathbb{1}(z < 0) \right) \right|,    (3.10)

where 1(z < 0) is the indicator function. We created a simple linear function, for each hour of the day separately, based only on trend and seasonal terms.
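Fitting such a linear model per quantile reduces to a linear programme, since the loss (3.10) can be linearised by splitting each residual into positive and negative parts. A Python sketch using scipy.optimize.linprog in place of Matlab's simplex routine (synthetic data and our own names, not the paper's code):

```python
import numpy as np
from scipy.optimize import linprog

def quantile_fit(X, y, q):
    """Fit beta_q of (3.9): minimise the pinball loss, written as an LP with
    split residuals u, v >= 0 such that X beta + u - v = y."""
    n, p = X.shape
    c = np.r_[np.zeros(p), q * np.ones(n), (1 - q) * np.ones(n)]
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * p + [(0, None)] * (2 * n))
    return res.x[:p]

# Synthetic daily loads around a linear trend; one fit per quantile level.
rng = np.random.default_rng(1)
k = np.arange(200.0)
y = 50 + 0.1 * k + rng.normal(0, 5, k.size)
X = np.column_stack([np.ones_like(k), k])
betas = {q: quantile_fit(X, y, q) for q in (0.1, 0.5, 0.9)}
# Sorting across levels at each k mirrors the re-sorting used to prevent
# quantile crossing [3].
preds = np.sort(np.column_stack([X @ b for b in betas.values()]), axis=1)
print(preds.shape)
```

At the optimum exactly one of u_t, v_t is non-zero for each t, so the objective q·u + (1 − q)·v equals the summed pinball loss of (3.9).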
For each daily hour on day k of the data set (with k = 1 meaning 1st Jan 2005), we define our model by

    L_k = a_0 + a_1 k + \sum_{p=1}^{2} b_p \sin\left( \frac{2 \pi p (k + \phi_1)}{365} \right) + \sum_{m=1}^{2} c_m \sin\left( \frac{2 \pi m (k + \phi_2)}{365} \right).    (3.11)

The first shift term, φ_1 = −111, is chosen by minimising the MAPE, and the second shift is φ_2 = φ_1 − 364/2. The double seasonality term was used because of the double yearly period discovered in the load for some hours of the day. The coefficients a_0, a_1, b_1, b_2, c_1, c_2 are found for each forecasted quantile via a simple linear programming method. We implemented this in Matlab using the Simplex algorithm option (set via the optimset function). To reduce computational cost we only used 500 days of history to find the parameters. Once the quantile forecasts were found, we re-sorted them to ensure there was no crossing of the quantiles [3].

3.5 Mixed Forecasts and Hybrid Forecasts

Each of the main forecasts presented had different performance at different forecast horizons. For this reason we created new forecasts which were mixes of our main methods, based on their performances over different horizons. We consider two main methods:

• Mix 1: This is simply the CKD-W forecast, but using the CKD-T forecast for the first day.
• Mix 2: As Mix 1, but using the QR forecast from the start of the 8th day until the end of the month.

With the success of the mixed forecasts (see Section 4) we also explored combinations of the forecasts, which have been shown to improve overall forecast accuracy compared to individual forecast methods [8]. We split the forecast into five time periods: period one was simply the first day, period two the rest of the first week, period three the second week, period four the third week and period five the rest of the month. For the first period we simply used the CKD-T, which had the best day-ahead accuracy of all the forecasts.
For each of the other periods we took a weighted average of the quantile time series from the quantile regression forecast, F_QR, and the CKD-W forecast, F_CKD-W:

    F_{Hybrid}(\tau) = w(\tau)\, F_{CKD\text{-}W}(\tau) + (1 - w(\tau))\, F_{QR}(\tau),    (3.12)

where τ = 2, 3, 4, 5 is the time period and 0 ≤ w(τ) ≤ 1 is the average optimal weight at time period τ. The optimal weight for each past task is found by searching different weighted combinations of the CKD-W and quantile regression forecasts, for each time period τ > 1, that minimise the pinball scores. We repeat this process for a number of past tasks and then take the average optimal weight for each time period. We call this forecast the hybrid forecast.

4 Task Submissions and Results

We ranked our forecasts using the scores from prior tasks. We used this to understand which methods to persist with and which to reject or adapt. In this section we describe our selection procedure for each task in chronological order, to justify our methodology and approach. Figure 1 shows graphically the scores for our best submissions, the benchmark and the top scoring forecast for each task. (Tasks 1 to 3 were trial tasks. For these we focused on searching for patterns, trends and correlations in the load and temperature data and on developing our more sophisticated methods, submitting simple parametric models and the KDE-W method.)

[Figure 1: Pinball scores of our submitted forecast, the benchmark and the final leader for each task.]

The plot shows our forecasts performing consistently well in all tasks other than tasks four and eight, as we explain below. We note that the leader is not the same entrant for each task. The benchmark is simply the previous year's load, used for all quantiles.

In tasks 4 and 5 we implemented the KDE-W method (see Section 3.1). December 2010 (task 3) appeared to have unusually low temperatures, and since this month was also used for parameter training it could explain the high scores of most entrants in task 4. We note that the simple quantile regression forecast (introduced in task 9) performs very well for this task, scoring 10.36, in fact beating the top entry score. This could be due to it being less influenced by the previous, exceptional, month.

For tasks 6 and 7 we developed the CKD-W method to take into account weekly effects. This was found to perform better than the KDE-W method. We also submitted a CKD-W for task 8, but trained the parameters on the same month of the previous year rather than the month preceding the forecast. The data from the previous year would be less recent but likely related to the current month's behaviour, due to the annual periodicity of the load. In addition, data from the previous month had little influence on forecasts beyond a week, so it made sense to attempt to optimise parameters on data available for the entire period. Although this method performed better than CKD-W for task 7, it did not perform as well as expected for the task 8 submission and was abandoned from then on.

We found that the CKD-T method, although poor for forecasting the entire month, was the most successful method for forecasting a day ahead (see Section 3.3). In addition we developed the QR forecast, which performed well, especially at horizons of over a week ahead. Hence, for tasks 9, 10 and 11 we implemented our mixed forecasts. Modifying the first-day forecast with the CKD-T forecast, to create mix forecast 1, gave us an improved forecast for task 9. Further improvements came with mix forecast 2, which was used as our submission for tasks 10 and 11 (giving us second place in both leaderboards).
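The weight search behind the hybrid forecast of Section 3.5 can be sketched concretely: for a past task, every candidate w is scored with the pinball loss and the minimiser is kept. A Python toy example with invented forecasts (our own names, not the competition code):

```python
import numpy as np

def pinball(y, Q, levels):
    """Pinball loss of quantile forecasts Q (n x m) against observations y."""
    d = y[:, None] - Q
    return float(np.where(d >= 0, levels * d, (levels - 1) * d).mean())

def best_weight(y, F_ckdw, F_qr, levels, grid=np.linspace(0, 1, 101)):
    """Grid search for the w(tau) of (3.12) that minimises the pinball
    score of the combined quantile forecast on one past task."""
    scores = [pinball(y, w * F_ckdw + (1 - w) * F_qr, levels) for w in grid]
    return grid[int(np.argmin(scores))]

# Toy task: the truth sits between two oppositely biased quantile fans.
levels = np.linspace(0.01, 0.99, 99)
y = np.full(24, 100.0)                      # 24 hourly observations
fan = 100 + 20 * (levels - 0.5)             # a symmetric quantile fan
F_qr = np.tile(fan + 5.0, (24, 1))          # QR biased high
F_ckdw = np.tile(fan - 5.0, (24, 1))        # CKD-W biased low
w_opt = best_weight(y, F_ckdw, F_qr, levels)
print(w_opt)
```

Averaging the optimal weight over several past tasks, separately for each time period τ, gives the w(τ) used in the hybrid submissions.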
Further testing of the forecasts on older tasks indeed confirmed the improvement of the methods. Up to this point the mix 2 forecast gave the most consistently good scores for tasks 2 through 8, with the smallest average pinball score of 8.61, compared to the next best, the quantile forecast with the CKD-T for the first day, at 8.63 (the benchmark average was 15.28). This seems to indicate that a major contribution to the improvement came from the quantile regression forecast.

For tasks 12 to 15 we implemented the hybrid forecast. For these tasks we trained the weights using tasks 6 to 11, 2 to 12, 2 to 13 and 2 to 14 respectively. This forecast performed better for each task compared to our other methods; see also Table 1. For task 15, we initially attempted to model separately the special days: Christmas Eve, Christmas Day and New Year's Day. However, we saw no improvement in our forecasts and, since these days all occurred on weekends for task 15, we abandoned this idea. The hybrid models were consistently the best for tasks 12 to 14, with an average pinball score of 5.36 compared to the next best score of 5.41 for the CKD-W2 method.

Table 1: Weighted average scores of the leading team of each task ("LS"), the competition's winning team ("WT"), our valid submitted forecasts ("Actual") and our main methods described in Section 3.

Forecast:  LS    WT    Actual  Hybrid  QR    CKD-W2  Mix 2  Mix 1  CKD-W  KDE-W
Score:     54.2  50.8  48.5    51.4    48.7  48.7    48.4   47.5   47.4   44.6

However, for task 15 the method did not perform too well, with a pinball score of 9.55 compared to only 7.844 for the KDE-W and 8.099 for the CKD-T (the winning score was 8.229). In fact the CKD-T method performed surprisingly well for tasks 12 through 14, with an average score of 5.42, and on average over tasks 12 through 15 it beat the hybrid forecast (6.089).
This is particularly surprising given that the CKD-T method had the worst performance of all methods prior to task 12. This could possibly be the result of relatively stable temperatures in these months.

The final scores were calculated as a weighted average of the percentage improvement relative to the benchmark for each task, with each percentage score given a weight that increased linearly from the fourth task to the last. The scores for selected methods (plus, for comparison, the scores of the leading submission for each task and of the competition's winning team) are shown in Table 1. The larger the score, the better the forecast. The hybrid forecast uses the weights from the final task and is therefore not a completely accurate representation of the actual hybrid forecast score, since the weights were optimised on the same tasks; however, it does show the potential of the method. With more time we could train the weights on a larger sample for each time period using a rolling window, rather than on sometimes fewer than six tasks.

The table shows the improvements made with subsequent tasks. We note that, despite the simplicity of the method, the QR forecast is one of the best non-hybrid forecasts on average. However, on a few tasks (5, 6, 11 and 14) this forecast did not perform as well as the CKD-W and CKD-W2 forecasts, and thus a mixed forecast is perhaps a more reliable choice, since these methods perform well when QR does and reduce the errors when QR does not. The better score of CKD-W2 over CKD-W shows the importance of using the best optimisation routines for the forecast.

5 Summary and Discussion

We have described a number of methods for creating probabilistic forecasts and outlined our methodology for adapting these forecasts for each task.
We chose and developed these methods based on a number of characteristics, including the success of the methods in similar applications, their computational simplicity and their versatility in incorporating the periodic nature of the data. We created several forecasts which perform well and obtained the leading score in a number of tasks. Our forecasts also performed consistently well: all beat the benchmark, and only three of the twelve submissions failed to improve on the benchmark by at least 40%. Overall we obtained five top-two finishes in the twelve tasks, with top position twice and second position on three occasions. This was the second-highest number of top-two finishes amongst all final candidates.

(The leading submission is the best submission from all teams for each task; it should not be confused with the submission of the winning team.)

There are periodicities in the scores, likely due to greater variability in load caused by heating and cooling, and the benchmark and forecast scores are correlated for the same reason. Very large benchmark scores are likely due to large differences in weather conditions. In certain tasks (such as 3 and 4) all teams scored poorly. For example, in task 3 we found very low temperatures which correlated with large forecast errors on the 14th December. The strong correlation between the weather and the load demand implies that the biggest single improvement in forecast accuracy will come from better long-term weather forecasts.

Table 1 illustrates that the hybrid forecast is the best scoring overall. However, it is clear that the simple quantile forecast is responsible for much of this improvement, with all forecasts using this method scoring very similarly. Although CKD-W2 and QR perform similarly on average, CKD-W2 only performs better than the quantile regression on a few tasks.
On those tasks the difference is significant, and the hybrid forecast therefore reduces this discrepancy.

Further improvements could be made to the scores. A number of changes may improve the basic forecasts (CKD-W, CKD-T, QR), such as including weekday identifiers or improving the choice of weather stations. However, the simplest modification we could make is to improve the weights used in the hybrid forecasts. In particular, we could train the weights on a rolling basis from one day to the next. This means that the most recent (and most accurate) weights could be applied, and potentially we could even forecast such weights. In this paper we have reported a simple combination of our two forecast methods to create a hybrid forecast. It has been shown that a simple linear combination is not optimal since, even if the individual forecasts are properly calibrated, the combined forecast will not be [8]. Hence we could also consider other methods, for example the beta linear pool method described in [8].

A surprising result of the competition, for us, was the success of very simple methods. The quantile regression, which only modelled the trend and yearly seasonality, was one of our, and the competition's, best performing forecasts. Such methods could thus be used as benchmarks for more complicated methods.

References

[1] S. Arora and J. W. Taylor. Forecasting electricity smart meter data using conditional kernel density estimation. Omega (in press), 2015. URL: http://doi.org/10.1016/j.omega.2014.08.008.
[2] N. Charlton and C. Singleton. A refined parametric model for short term load forecasting. Int. J. Forecasting, 30:364–368, 2014.
[3] V. Chernozhukov, I. Fernandez-Val, and A. Galichon. Quantile and probability curves without crossing. Econometrica, 78:1093–1125, 2010.
[4] T. Hong, P. Pinson, S. Fan, H. Zareipour, A. Troccoli, and R. J. Hyndman.
Probabilistic energy forecasting: State-of-the-art 2015. Int. J. Forecasting, 2015.
[5] J. Jeon and J. W. Taylor. Using conditional kernel density estimation for wind power density forecasting. J. Am. Statist. Assoc., 107:66–79, 2012.
[6] R. Koenker and G. Bassett Jr. Regression quantiles. Econometrica, 46:33–50, 1978.
[7] Y. Liu, M. C. Roberts, and R. Sioshansi. A vector autoregression weather model for electricity supply and demand modeling. J. Forecasting (submitted), 2014. URL: http://ise.osu.edu/isefaculty/sioshansi/papers/weather_var.pdf.
[8] R. Ranjan and T. Gneiting. Combining probability forecasts. J. R. Statist. Soc. B, 72:71–91, 2010.