Short Term Electricity Load Forecasting on Varying Levels of Aggregation

00 (2017) 1–19

A Scaling Law for Short Term Load Forecasting on Varying Levels of Aggregation

Raffi Sevlian and Ram Rajagopal

Abstract

This paper proposes a simple empirical scaling law that describes load forecasting accuracy at varying levels of aggregation. We show that for many forecasting methods, aggregating more customers improves the relative forecasting performance up to a specific point. Beyond this point, no more improvement in relative performance can be obtained. A benchmarking procedure for applying the scaling law to different forecasting models is presented. The aggregation model is evaluated with year long consumption profiles of over 180 thousand Pacific Gas & Electric customers. A theoretical model based on a bias variance decomposition of the forecast error is used to model the Aggregation Error Curves (AECs) that are empirically explored.

Keywords:

1. Introduction

To meet the challenges posed by a significant increase in distributed energy resources, there is a growing need for active control at the distribution level. The deployment of improved sensing and control technologies allows a variety of applications in the distribution system. As an example, smart meter data has been proposed for use in various planning and operational applications. In applications such as topology processing or state estimation, load forecasts of 1 hour up to a day ahead are needed to provide pseudo-measurements. For the design of the next generation of distribution system applications, understanding the variability of these pseudo-measurements is important.

The focus of this paper is the following: relying on empirical analysis and theoretical modeling, we develop an intuitive scaling law for load forecasts on varying levels of aggregation.

The field of load forecasting is very mature, with numerous methodologies having been proposed throughout the years.
These works have focused primarily on the level of large substations servicing tens of megawatts, or up to an entire transmission system with a load of tens of gigawatts. With recent advances in communication infrastructure for remote measurement and automated metering, there is an abundance of new data from homes and commercial buildings. The proliferation of this more granular data has led to an increase in forecasting research at these lower levels of aggregation. Typical home loads are 1 to 2 kWh, while commercial buildings can be 100 times that amount.

The relative forecasting errors typically seen at the level of substations and power systems have been quite low (1% to 2%), while forecasting performance at the individual level shows much higher errors (up to 30%). Clearly, there is a discrepancy in forecasting performance between the level of individuals and that of the entire power grid. This is the effect of load aggregation on forecasting performance.

Our work aims to quantify the effect of aggregation by proposing a scaling law relating kWh load level and forecasting error. Using data from close to two hundred thousand residential customers and commercial buildings, we construct datasets of varying aggregation levels and verify a proposed aggregation model via an Aggregation Error Curve.

A theoretical aggregation scaling law is obtained by assuming a simple underlying consumption model for each individual and then decomposing the aggregate error into bias and variance terms. The theoretical scaling law is then fit to the experimental AEC for various forecasting methods and horizons. It is shown that the scaling law holds for a wide range of forecasting methods, prediction horizons and data types.

We propose that this AEC can be used to benchmark various proposed forecasting procedures in smart grid applications.
Since any load forecasting done for smart grid applications on the distribution system will require small to moderate aggregation of loads as the basic unit of analysis, having an AEC helps in understanding the performance of a forecasting method on the wide range of loads that will be encountered in practice. We should also note that while using off the shelf forecasting techniques with aggregation can improve forecasting accuracy, development of custom, finely tuned forecasting methods will almost always outcompete this proposed strategy. This benchmarking is merely a tool to understand roughly how forecasters should behave at different aggregation levels, and does not make claims against any specific forecasting method.

The paper is organized as follows. Section 2 provides a review of energy forecasting at short time scales. Section 3 develops a model for aggregation and proposes a set of scaling laws for two common forecasting metrics. The scaling laws are then verified in Sections 4 and 5.

2. Literature Review

Here we survey both general methods in short term forecasting in Section 2.1 as well as specific work on forecasting intermediate sized aggregates. We refer to load forecasting as point forecasts of expected value and not probabilistic forecasts. A review of probabilistic forecasting is given in [1].

2.1. Short Term Load Forecasting

A general overview of the state of the art in short term load forecasting is provided in [2], [3]; more classic surveys are given in [4], [5] and [6]. We review a few popular methods that we utilize as procedures in this paper. Seasonal ARMA and other linear modeling approaches are considered in [7]. Seasonal vector modeling with segment identification is considered first in [8]. Neural networks have been applied to load forecasting for quite some time. Some early papers are [9], [10], [11], [12].
In [13], the author provides a comprehensive survey of the current state of neural networks applied to load forecasting. Support vector regression has recently been applied to load forecasting as well, with substantial work done by [14] and [15].

2.2. Hour Ahead Forecasting on Individual and Large Aggregates

Recent work shows a fundamental limitation to the predictability of individual customers. [16] performs one hour ahead forecasting based on hourly data utilizing machine learning. The methods achieve a relative error of 1.61% to 13.41% for a 700 kWh commercial building and between 15% and 30% for three homes with mean consumption close to 1.5 kWh. In [17], machine learning methods are compared on data from three homes with mean consumption of 1 to 2 kWh, achieving relative errors close to 25%. In [18], various methods are utilized to forecast peak demand for individual homes. The authors conclude that seasonal autoregressive models achieve the best performance, with a relative error of 30%. In [19], a Kalman filter based forecaster is applied to single home data with mean consumption of 0.8 kWh to achieve an error of 30%.

Low relative errors are reported at high aggregation levels. In [20] the authors use an artificial neural network to forecast a mean load of 2.5 GWh, with errors ranging from 1.73% to 3.02%. In [21] the authors apply wavelet multi-scale decomposition based autoregressive approaches. They report values of 0.7% to 3.5%, depending on the method used, on a dataset with mean load 9 GWh. In [22] artificial neural networks are applied to data with a mean consumption of 800 MWh, achieving errors from 1.11% to 1.63%. Similarly, [23] obtains an error in the range 0.81% to 1.21% utilizing neural networks on 8 GWh load data. In [24] a novel ANN architecture is applied to two datasets with peak load 4.4 GWh, with reported errors between 0.8% and 1.5%.
Finally, [9] applies artificial neural networks to attain an error rate of 1.7% for a load of 7 GWh. In the experimental comparisons in this paper we benchmark forecasting of one hour, multiple hour and day ahead intervals of consumption.

2.3. Recent Work on Aggregation Forecasting

Initial work on developing a model for forecasting accuracy with an explicit scaling law was done by the authors in [25]. The model was limited to 2,000 residential customers and was unable to capture scaling behavior at large aggregate levels. Other works also rely on small datasets and show results similar to [25]. In [26] the authors aggregate up to 1000 customers for one hour ahead forecasting and provide a qualitative rationale for the effect. In [27] the authors present an empirical plot of normalized root mean squared error against number of customers and show that it decreases. This work aggregates up to 782 homes. In [28] the authors show that mean absolute percentage error decreases with the number of customers and use it for examining electricity market trading performance. In [29] the authors show that clustering can help improve the forecast accuracy, which follows the ideas proposed in this work.

This paper differs from prior work since we extend the aggregation to over 100,000 customers (100 MWh) and point out the crucial fact that errors no longer improve beyond a critical load. We then propose an additive load shape based consumption model based on results from smart meter clustering analysis [30]. This model is used to derive a benchmarking formula that describes the relationship between relative error and the aggregation size. The model is fit to experimental performance data of several forecasters to uncover aggregation relationships. To the best of our knowledge, this is the first paper modeling forecasting error scaling with aggregation size.
This work connects individual consumption models to aggregate forecasting and provides a simple mechanism to benchmark any forecasting algorithm as it is applied to varying levels of aggregation.

3. Modeling Load Aggregation

[Figure 1: panels (a) and (b), each showing hourly consumption for 1, 5, 20, 40, 60 and 80 users.]

Figure 1. Hourly electricity consumption for various aggregation levels. The consumption pattern of a single customer generally has little structure to be exploited. Aggregating more and more customers smooths the signal so that it becomes more predictable. An aggregation level of 20 or more residential customers shows a predictable pattern. Note that the plots are not on the same scale.

Aggregation reduces the inherent variability in electricity consumption, resulting in increasingly smooth load shapes. Figure 1 illustrates this effect, where it is clear that the higher aggregation levels are easier to predict. An intuitive explanation is that the "law of large numbers" smooths out the signal, which justifies why gigawatt level forecasting is very accurate. Yet, it is less clear how to quantify the improvements in forecasting. For example, it is assumed that more aggregation will generally improve forecast accuracy due to a $1/\sqrt{N}$ smoothing.

The main goal of this paper is to develop a scaling law for forecasting performance with respect to aggregation size which fits experimental data. In particular, we propose that the mean load be used as a key metric in identifying how forecasting methods perform in very large population averages. We experimentally demonstrate the intuitive smoothing, but show that there is a limit beyond which aggregation no longer helps improve forecasting performance. Finally, we propose a simple stochastic process model which describes the experimental Aggregation Error Curve.

3.1. Forecast Accuracy Performance Metrics

The performance metrics most commonly used in the forecasting literature are the Coefficient of Variation (CV) and the Mean Absolute Percentage Error (MAPE). Recently, alternative metrics have been proposed [31]; however, we will not focus on these. This work focuses mostly on CV, because it is amenable to theoretical analysis. However, the proposed scaling law is fit to both CV and MAPE, and they are shown to have identical behavior.

The coefficient of variation measures the ratio of the prediction error standard deviation to the signal mean. Consider a time series $x(t)$ and its forecast $\hat{x}(t)$ for $t \in \{1, \ldots, T\}$. The empirical CV measures the difference between these time series and is computed as

$$CV(x, \hat{x}) = 100 \, \frac{\sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( x(t) - \hat{x}(t) \right)^2}}{\frac{1}{T} \sum_{t=1}^{T} x(t)} \, \%. \tag{1}$$

Likewise, the mean absolute percentage error (MAPE) is defined as

$$MAPE(x, \hat{x}) = \frac{100}{T} \sum_{t=1}^{T} \left| \frac{x(t) - \hat{x}(t)}{x(t)} \right| \, \%. \tag{2}$$

CV and MAPE are relative error metrics traditionally reported in the literature. It is assumed they allow comparison of performance on different datasets because they are relative metrics. However, as we show in this work, the relative error depends significantly on the level of load aggregation.

3.2. Forecasting Scaling Laws

Consider a set of $N$ customers with consumption given by a time series $x_n(t)$. The mean consumption for each customer is $W_n = \frac{1}{T} \sum_{t=1}^{T} x_n(t)$. We select a subset $A \subseteq \{1, 2, \ldots, N\}$ of customers and form an aggregate consumption

$$x_A(t) = \sum_{n \in A} x_n(t), \tag{3}$$

with average consumption

$$W_A = \sum_{n \in A} W_n. \tag{4}$$

In load forecasting, we build a predictor for $x_A(t)$ that outputs the predicted sequence $\hat{x}_A(t)$ and evaluate $CV(x_A, \hat{x}_A)$. Suppose such a forecaster is built for every such group of customers.
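The metrics in (1) and (2) and the aggregation in (3) translate directly into code. The following is a minimal Python sketch rather than the implementation used for the experiments; the function names are illustrative, and each series is assumed to be an equal-length list of hourly kWh values with strictly positive loads, since MAPE is undefined when $x(t) = 0$.

```python
import math


def aggregate(profiles):
    """Aggregate consumption x_A(t): sum of the member profiles, as in (3)."""
    return [sum(values) for values in zip(*profiles)]


def cv_percent(actual, forecast):
    """Coefficient of variation as in (1): RMSE over mean load, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    horizon = len(actual)
    squared = sum((x - x_hat) ** 2 for x, x_hat in zip(actual, forecast))
    rmse = math.sqrt(squared / horizon)
    mean_load = sum(actual) / horizon
    return 100.0 * rmse / mean_load


def mape_percent(actual, forecast):
    """Mean absolute percentage error as in (2), in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    horizon = len(actual)
    total = sum(abs((x - x_hat) / x) for x, x_hat in zip(actual, forecast))
    return 100.0 * total / horizon


# Toy example with two identical customers and a two-hour series.
x_a = aggregate([[1.0, 2.0], [1.0, 2.0]])   # [2.0, 4.0]
x_hat = [1.0, 5.0]
print(round(cv_percent(x_a, x_hat), 2))     # 33.33
print(mape_percent(x_a, x_hat))             # 37.5
```

Note that CV normalizes by the mean load of the whole series, while MAPE normalizes each hour by its own load, so hours with very small consumption can dominate MAPE.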
Under certain conditions on the behavior of $x(t)$ and the forecasting model, we then have the following:

Theorem 1. Consider all sets $A$ of consumers $x_A(t)$ with mean consumption $W_A = W$. The average coefficient of variation at the $W$ level of aggregation is given by

$$CV(W) = E\left[ CV(x_A, \hat{x}_A) \mid W_A = W \right] \tag{5}$$

$$= \sqrt{\frac{\alpha_0}{W} + \alpha_1} \tag{6}$$

for constants $\alpha_0$ and $\alpha_1$, where the expectation is taken over all sets $A$ of mean consumption $W$.

Proof. See Appendix D.1.

Theorem 1 gives a set of sufficient conditions under which the coefficient of variation will converge to (6). The conditions are quite standard and produce an intuitive scaling law. However, as will be shown, this law fits the experimental aggregation with some error. We therefore propose a modification in which the population average CV scales as a function of $W$ following

$$CV(W) = \sqrt{\frac{\alpha_0}{W^p} + \alpha_1} \tag{7}$$

to fit the AECs produced for many of the forecasting horizons. The parameter $p$ is used to provide flexibility in fitting experimental aggregation error curves. Note that an ideal aggregation occurs when $p = 1$ and $\alpha_1 = 0$. In this ideal case, we have the intuitive $1/\sqrt{W}$ improvement that is naively assumed to be true. Assuming $p = 1$, the proposed scaling law segments the forecasting problem into two regimes:

1. Scaling: When $\alpha_0 / W \gg \alpha_1$, relative error improves considerably due to aggregation. Equation (7) can be approximated as $CV(W) \approx \sqrt{\alpha_0 / W}$.
2. Saturation: When $\alpha_0 / W \ll \alpha_1$, there is no improvement in forecasting from aggregation and $CV(W) \approx \sqrt{\alpha_1}$.

In addition to CV, we extend our proposed model to MAPE as well. We model the population average MAPE as a function of $W$ according to

$$MAPE(W) = E\left[ MAPE(x_A, \hat{x}_A) \mid W_A = W \right] = \sqrt{\frac{\beta_0}{W^p} + \beta_1}. \tag{8}$$

For all of the experiments, both the CV and MAPE aggregation error curves were generated, but the CV based AECs are shown in the figures, since they correspond to the theoretical analysis in Appendix D.1.

4. Experiment Setup

4.1. Description of Data

[Figure 2: (a) energy consumption (kWh) by day of week; (b) number of users against mean user consumption (kWh); (c) energy consumption (kWh) by day of week; (d) number of users against mean user consumption (kWh).]

Figure 2. Seven days of consumption for a single (a) residential customer and (c) SMB customer. Histogram of mean load for all (b) residential and (d) SMB customers.

The data used for our study is provided by Pacific Gas and Electric Company (PG&E) and covers a customer base of residential (RES) customers and small to medium businesses (SMB). Residential customer data comes from most of northern California as 1 hour interval meter reads, which are used for billing. The entire dataset represents over 180 thousand users with year long consumption from 2010-08-01 to 2011-07-31. The data represents 408 zip codes around California out of a total of 2,597 possible zip codes.

The SMB data comprises 150 thousand consumer profiles. The data set represents a full year from 2010-08-01 to 2011-07-31, sampled every 15 minutes. The data represents a wide variety of commercial applications, in many zip codes and climate profiles. For this paper, the data is temporally aggregated to 1 hour intervals, like the residential data.

The mean consumption of the data is of importance for this work. Figure 2(a) shows a typical one week time series of a residential customer, while Figure 2(b) shows the consumption of each residential customer averaged over an entire year. The overall mean consumption is 1.05 kWh. Although there is some variation, the maximum mean consumption is less than 4 kWh.
The SMB data differs from the residential data in two ways: (1) the mean consumption is generally much larger than that of a residential customer; (2) the profiles are less variable in terms of intra-day variation. The mean consumption for the SMB data is 8.94 kWh, which is close to 9 times that of a residential customer. Figure 2(c) shows the consumption profile of a randomly chosen customer and Figure 2(d) shows the histogram of yearly means for all users.

4.2. Generating Aggregate Consumption

The following procedure is used to generate an Aggregation Error Curve for a given population and forecasting procedure.

1. A set of aggregate profiles is generated using (3) by randomly sampling the population of profiles. For a fixed array of aggregation sizes $\{N_1, \ldots, N_{max}\}$, we choose a size $N$, then sample without replacement a subset $A$ with $|A| = N$. This is done a fixed number of times $M$. Therefore, for any $N$ and $m \in \{1, \ldots, M\}$, a single aggregate $x_A$ is generated. For each aggregate $A$, the mean consumption is computed according to (4).
2. A forecasting procedure is used to compute $\hat{x}_A(t)$ for each aggregate signal and each time $t$.
3. Error metrics are computed with either $CV(x, \hat{x})$ or $MAPE(x, \hat{x})$.
4. For each $N$ and $m$, the tuple $(W_A, CV(x, \hat{x}))$ is recorded.

Residential aggregate consumption time series were generated by forming groups of randomly selected customers. Fifty six group sizes were chosen, ranging from one to 100,000 customers; the values of the aggregation sizes are given in Appendix B. Fifty random groups for each size were generated by uniformly selecting customers. The mean hourly consumption of these groups ranged from 1 kWh to 100 MWh. The largest mean hourly consumption for each size ranged from 3 kWh to 180 MWh. For each generated aggregate load, we generate a weighted average temperature time series for each zip code used in generating an aggregate.
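The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the experiments: `seasonal_naive` is a hypothetical stand-in forecaster (it is not one of the models studied in this paper), the function names are illustrative, and temperature covariates are ignored.

```python
import math
import random


def cv_percent(actual, forecast):
    """Coefficient of variation as in (1), in percent."""
    horizon = len(actual)
    squared = sum((x - x_hat) ** 2 for x, x_hat in zip(actual, forecast))
    return 100.0 * math.sqrt(squared / horizon) / (sum(actual) / horizon)


def seasonal_naive(series, lag=24):
    """Stand-in forecaster: predict each hour by the value `lag` hours earlier.

    Returns (actual, forecast) restricted to the hours that have a prediction.
    """
    return series[lag:], series[:-lag]


def aggregation_error_curve(profiles, group_sizes, repeats, forecaster, seed=0):
    """Return the recorded tuples (W_A, CV) for every size N and repetition m."""
    rng = random.Random(seed)
    horizon = len(profiles[0])
    points = []
    for size in group_sizes:
        for _ in range(repeats):
            # Step 1: sample |A| = N customers without replacement, apply (3).
            members = rng.sample(range(len(profiles)), size)
            x_a = [sum(profiles[n][t] for n in members) for t in range(horizon)]
            w_a = sum(x_a) / horizon  # mean load of the aggregate, equal to (4)
            # Step 2: forecast the aggregate signal.
            actual, forecast = forecaster(x_a)
            # Steps 3 and 4: compute the error metric and record the tuple.
            points.append((w_a, cv_percent(actual, forecast)))
    return points


# Synthetic population: a daily shape plus customer-specific noise (made up).
noise = random.Random(1)
population = [
    [1.0 + 0.5 * math.sin(2 * math.pi * t / 24) + noise.uniform(0.0, 1.0)
     for t in range(24 * 14)]
    for _ in range(200)
]
for w_a, cv in aggregation_error_curve(population, [1, 10, 100], 3, seasonal_naive):
    print(f"W_A = {w_a:8.2f} kWh, CV = {cv:6.2f} %")
```

Plotting the recorded pairs on log-log axes gives the Aggregation Error Curve. With independent noise, as in this synthetic population, no saturation regime should be expected; in the sketch, error can only stop improving when the member profiles share forecast errors.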
SMB aggregate consumption time series are generated in a similar way. Forty three group sizes were selected, ranging from one to 50,000 customers. The hourly group mean consumption ranges from 10 kWh to 400 MWh. The group with the largest consumption had a 670 MWh average hourly load.

4.3. Forecasting Models

Table 1. Models used in analysis

| Model | Description |
|---|---|
| M1 | SARMA$(1, 0) \times (1, 0)_{24}$ |
| M2 | SARMA$(2, 0) \times (1, 0)_{24}$ |
| M3 | SARMA$(3, 0) \times (1, 0)_{24}$ |
| M4 | SVR - Radial Basis Function |
| M5 | FFNN - Logistic Activation Function |
| M6 | Daily Total (SARMA) + Shape forecast |

The proposed scaling laws are studied using three commonly used methods for short term load forecasting: Seasonal Auto Regressive Moving Average (SARMA), Support Vector Regression (SVR) and Feed Forward Neural Networks (FFNN). For models M1 to M5, a one hour and a multiple hour ahead forecasting problem are studied. Model M6 is used for the full day ahead forecasting experiment. Each model and its training and testing procedures are described in detail in Appendix C.

5. Experimental Results

5.1. Empirical CV and Aggregation Level

First we investigate the performance of model M1 in the one hour ahead prediction task. For each time series, the corresponding mean load $W_A$, forecast $\hat{x}_A$ and performance $CV(x_A, \hat{x}_A)$ were computed. The Aggregation Error Curve is the plot of the pairs $(W_A, CV(x_A, \hat{x}_A))$ for all generated groups $A$, as shown in Figure 3. The scaling law in (7) was fit to this data using a non-linear least-squares procedure detailed in Appendix E. The fit (solid line) and the ideal aggregation scaling law are displayed in Figure 3. Fit parameters are shown in Table 2 for model M1 and the other models.

[Figure 3: log-log plot of CV (0.1% to 100%) against mean load (1 kWh to 100 MWh); legend: Error, Fit, Ideal.]

Figure 3. $(W_A, CV(x_A, \hat{x}_A))$ is shown in green markers for model M1. The best fit (solid line) has $p = 0.89$. The dashed line indicates error scaling with the ideal aggregation effect and no irreducible error ($p = 1$). The model leads to a critical load $W^* = 2179$ kWh and an irreducible error $\sqrt{\alpha_1} = 2.1$.

Figure 3 provides visual verification that the scaling law can be decomposed into scaling and saturation regimes. The transition point between regimes is defined as the load aggregation level $W^*$ where the regime approximations are equal. The critical load $W^*$ is the positive solution $W$ to

$$\sqrt{\frac{\beta_0}{W^p}} = \sqrt{\beta_1} \tag{9}$$

for MAPE, and to the same equation with $\alpha_0$ and $\alpha_1$ in place of $\beta_0$ and $\beta_1$ for CV. The critical load for model M1 is 2.2 MWh. The scaling regime extends from 1 kWh to 2.2 MWh aggregate loads, and the saturation regime extends from 2.2 MWh to 100 MWh. Note that any intermediate sized aggregate of loads will fall somewhere in the scaling regime of the AEC, where the error is sensitive to aggregation size. Because of this, understanding how aggregation leads to forecasting improvement is important in designing applications which require small to medium size aggregations of customers.

5.1.1. Comparison of Different Models

Table 2. Scaling law fit for CV

| M | CV: $p$ | CV: $\sqrt{\alpha_0}$ | CV: $\sqrt{\alpha_1}$ | CV: $W^*$ (kWh) | MAPE: $p$ | MAPE: $\sqrt{\beta_0}$ | MAPE: $\sqrt{\beta_1}$ | MAPE: $W^*$ (kWh) |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.89 (0.83 0.96) | 53.8 (50.0 60.7) | 2.13 (2.13 2.13) | 2179 | 0.88 (0.83 0.95) | 45.4 (50.0 60.7) | 1.52 (1.486 1.53) | 2250 |
| 2 | 0.92 (0.87 0.98) | 53.5 (48.5 58.8) | 1.25 (1.20 1.31) | 8925 | 0.96 (0.92 1.05) | 43.6 (38.2 51.9) | 0.89 (0.858 0.970) | 8621 |
| 3 | 0.91 (0.87 0.96) | 54.4 (45.9 57.5) | 1.19 (1.14 1.23) | 11615 | 0.95 (0.90 1.02) | 42.1 (39.6 50.6) | 0.81 (0.816 0.820) | 16856 |
| 4 | 0.92 (0.81 1.00) | 52.9 (48.6 53.8) | 1.96 (1.95 2.06) | 6089 | 1.07 (0.99 1.15) | 56.7 (41.1 55.4) | 1.21 (1.081 1.913) | 23127 |
| 5 | 0.92 (0.85 0.99) | 55.8 (49.3 59.9) | 1.33 (1.13 1.53) | 72218 | 0.94 (0.83 1.05) | 45.3 (37.0 53.6) | 1.42 (1.120 1.853) | 1539 |

In this section we validate that the scaling law holds for the models in Section 4.3. The scaling law parameters also provide a way to compare the performance of these different models.
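The scaling law (7) and the critical load defined by (9) are simple enough to check numerically. The sketch below is illustrative only. It does not reproduce the non-linear least-squares procedure of Appendix E; instead it fixes $p = 1$, which is an assumption made here so that $CV^2 = \alpha_0 / W + \alpha_1$ becomes linear in $1/W$ and can be fit by ordinary least squares. The parameter values in the example are made up and are not taken from Table 2.

```python
import math


def scaling_law(w, alpha0, alpha1, p=1.0):
    """Relative error predicted by (7) at mean load w."""
    return math.sqrt(alpha0 / w ** p + alpha1)


def critical_load(alpha0, alpha1, p=1.0):
    """Load W* at which alpha0 / W^p equals alpha1, as in (9)."""
    return (alpha0 / alpha1) ** (1.0 / p)


def fit_scaling_law_p1(points):
    """Least-squares fit of CV^2 = alpha0 * (1 / W) + alpha1, assuming p = 1.

    `points` is a list of (W, CV) pairs; returns (alpha0, alpha1).
    """
    us = [1.0 / w for w, _ in points]
    vs = [cv ** 2 for _, cv in points]
    n = len(points)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    s_uu = sum((u - u_bar) ** 2 for u in us)
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    alpha0 = s_uv / s_uu
    alpha1 = v_bar - alpha0 * u_bar
    return alpha0, alpha1


# Made-up parameters: sqrt(alpha0) = 50 and sqrt(alpha1) = 2.
loads = [1.0, 10.0, 100.0, 1000.0, 10000.0]
curve = [(w, scaling_law(w, 2500.0, 4.0)) for w in loads]
alpha0, alpha1 = fit_scaling_law_p1(curve)
w_star = critical_load(alpha0, alpha1)
print(f"alpha0 = {alpha0:.1f}, alpha1 = {alpha1:.2f}, W* = {w_star:.0f}")
```

At $W = W^*$ the two terms under the square root are equal, so the error is $\sqrt{2 \alpha_1}$, about 41% above the irreducible level $\sqrt{\alpha_1}$. Saturation is therefore gradual rather than abrupt at the critical load.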
For the models in Table 1, the resulting scaling law fits for MAPE and CV are given in Table 2. The relative variation of the parameters $\sqrt{\alpha_0}$ and $\sqrt{\beta_0}$ between different models is quite small. In contrast, the irreducible errors $\sqrt{\alpha_1}$ and $\sqrt{\beta_1}$ differ much more between the models. As is shown in Appendix D.1, the model implies that $\sqrt{\alpha_0}$ and $\sqrt{\beta_0}$ should be identical across models since they are associated with independent errors. On the other hand, $\sqrt{\alpha_1}$ and $\sqrt{\beta_1}$ capture the true forecasting performance. Combining these two observations leads to identifying the irreducible errors as a fundamental performance metric for model comparison.

Using this metric, we see that with sufficient training, the SVR and FFNN models perform quite well. These models show irreducible CV errors of 1.961% and 1.338%, respectively, and MAPE errors of 1.210% and 1.423%. We should note, however, that these models take considerably more training, since a number of running parameters are fit in the validation step before a single test sample is evaluated. Additionally, over a large sample population the linear models outperform these more computationally intensive models.

The critical load value $W^*$ can be compared between different models. It can be seen to depend almost exclusively on the irreducible error, since the reducible errors are close to each other. This observation leads to the conclusion that forecasters with low irreducible error benefit more from aggregation. For example, model M3 has a critical load of 16 MWh and its saturation regime is at the far right of the AEC.

Table 2 shows the fitted values as well as 95% confidence intervals, which are computed by bootstrap resampling the experimental points [32]. Two key points are shown by the confidence intervals:

1. The confidence intervals of $\sqrt{\alpha_0}$ and $\sqrt{\beta_0}$ are quite wide and intersect in the intervals $\sqrt{\alpha_0} \in [50.0, 53.8]$ and $\sqrt{\beta_0} \in [50.0, 50.6]$.
This means that, for example, a null hypothesis, halfway between the ranges, of $\sqrt{\alpha_0} = 52.4$ and $\sqrt{\beta_0} = 50.3$ being identical over all models would not be rejected. This validates the analytical model relating this reducible term to a parameter that is independent of the consumption profile. A somewhat controversial conclusion from this is that most forecasting methods modeling data at small aggregation levels are merely overfitting random noise.
2. The values of $p$ for each fit are interesting since, for an individual dataset, setting $p = 1$ leads to a poor fit. However, for half of the models, the 95% confidence interval contains the value $p = 1$. Therefore we conclude that, for a given dataset, generating an accurate fit of the aggregation error curve requires $p \neq 1$. However, we should keep in mind that there is no model basis for this parameter.

5.2. Linear Scale Observations

[Figure 4: (a) boxplot of CV (0 to 100) against group size (1, 5, 20, 100, 400, 800, 2000) on a linear scale; (b) the same data on log-log axes, log(CV) against log(group size); legend: Error, Fit, Theoretical.]

Figure 4. CV boxplot of randomly generated groups of $N$ customers in linear (a) and log scale (b).

Here we investigate the effect of aggregation on a linear scale and with small sample sizes to show how the proposed model deviates from the common understanding of aggregation smoothing.
For example, an analysis of 2000 customers was presented in [25] and one with 200 customers in [28]. Such limited group sizes are unable to identify the various regimes present in the scaling law, and thus give an incomplete understanding.

Most studies that have shown results similar to this work have relied on a few data points and have shown that relative error decreases as the aggregation size increases. All prior work has shown a linear scale decrease which fits an intuition of the "law of large numbers". It should be stressed that the analysis in the log scale, such as in Figure 3, provides a different intuition than a linear scale analysis. For example, the log scale analysis visually highlights the existence of a critical load and irreducible error. This is not noticeable in linear scale analysis with small sample sizes. The same linear CV plot is replotted in a log scale in Figure 4(b). Notice that even though a rough "diminishing returns" is suggested by Figure 4(a), it is not seen in Figure 4(b). The errors are still decreasing at a constant rate, which is noticeable only in a log-log plot and only for larger aggregation levels.

5.3. Day Ahead and Multiple Hour Forecasting

Here we present forecasting Aggregation Error Curves for both the multi-hour horizon and the complete day ahead forecasting problem. We show that the model of aggregation is consistent regardless of forecasting horizon if the same method is used over all time series.

[Figure 5: (a) log-log plot of CV (0.1% to 100%) against mean load (1 kWh to 1 MWh); legend: 1 Hour Ahead, 2 Hour Ahead, 3 Hour Ahead, 4 Hour Ahead, No Bias. (b) log-log plot of CV (5% to 100%) against mean load (1 kWh to 100 MWh).]

Figure 5. (a) AEC for multiple hours ahead using model M3. (b) AEC for day ahead forecaster M6.

5.3.1. Multiple Hour Forecasting

Figure 5(a) shows the model M3 used for various hour ahead forecasting problems. The scaling law is fit to the data, with fit parameters given in Table 3.
It is clear that the irreducible error increases with forecasting horizon, reducing the benefit from aggregation. This indicates that for more complex tasks such as day ahead forecasting, the forecasters need to be designed carefully to achieve low irreducible error.

Table 3. Scaling law fit for multiple horizons

| Horizon (hours ahead) | CV: $\sqrt{\alpha_0}$ | CV: $\sqrt{\alpha_1}$ | MAPE: $\sqrt{\beta_0}$ | MAPE: $\sqrt{\beta_1}$ |
|---|---|---|---|---|
| 1 | 72.5 | 1.28 | 56.3 | 1.06 |
| 2 | 85.2 | 5.30 | 75.1 | 3.26 |
| 3 | 50.2 | 8.92 | 86.6 | 7.35 |
| 4 | 68.3 | 10.94 | 78.6 | 8.11 |

5.3.2. Day Ahead Forecasting

The aggregation error curve for randomly generated loads using model M6 is shown in Figure 5(b). Using our proposed model, we have the following approximate model:

$$CV(W) = \sqrt{\frac{3562}{W} + 41.9}. \tag{10}$$

The fit exponent is $p = 1.01$, with the 95% confidence interval containing $p = 1$, while the reducible error is $\sqrt{3562} = 59.6$. This leads to an irreducible error of $\sqrt{41.9} = 6.47\%$ for a full day ahead forecaster and a critical load of $W^* = 3562 / 41.9 \approx 85$ kWh, or around 80 homes. This value may be misleading, since from Figure 5(b) it appears as though forecast performance flattens out after 1 MWh of load, corresponding to 1000 homes.

Here we can note that many forecasting methods shown in the literature clearly outperform the 6.47% value since they are finely tuned to a small dataset. When a static forecasting model is applied to a large number of randomly generated time series, the average performance is degraded.

5.4. Small and Medium Business (SMB) Data Analysis

The analysis in Section 5.1 is extended to the SMB dataset by computing the CV scaling law for model M3. The obtained parameters are $\sqrt{\alpha_0} = 46.67$, $\sqrt{\alpha_1} = 0.92$ and $p = 0.82$. The scaling law can be compared to the residential dataset scaling law for the same model. The AEC for model M3 is shown in Figure 6(a) and is very close to that obtained for the same model on the residential dataset.
Notice the scaling law is quite consistent despite the mean loads of residential and SMB data differing by an order of magnitude. This observation validates the choice of kWh average to drive the scaling rather than the number of customers. An intuitive interpretation of the similarity in the scaling law is that every building (residential or SMB) consumes electricity as a series of tasks of similar average sizes. Larger buildings can be thought of as an aggregation of smaller buildings, so the number of tasks scales linearly and so does average consumption. This leads to kWh providing the proper scaling for forecasting.

Finally, we note that for the SMB data the critical load is close to 10 MWh. Forecasting studies in commercial buildings (mean loads 100 kWh to 1 MWh) need to consider the improvements due to aggregation when compared to each other.

5.5. Subpopulation Comparison

[Figure 6: (a) log-log plot of CV (1% to 100%) against mean load (1 kWh to 400 MWh); legend: RES, SMB. (b) log-log plot of CV (1% to 100%) against mean load, up to 10 MWh. (c) log-log plot of CV (1% to 100%) against mean load, up to 80 MWh.]

Figure 6. Application of the aggregation benchmarking procedure for forecaster M3: (a) residential (RES) and SMB datasets, (b) coastal population, (c) inland population for the PG&E coverage territory.

The benchmarking design in Section 5.1 is performed on the PG&E climate zone populations. First, the total population is split between the inland and coastal customers. Then the aggregates are randomly generated from each climate subpopulation. The climate subpopulations are separated by the climate zone that each customer's zip code falls into. Following the California PG&E climate zone designations [33], we define "coastal populations" as those in climate zones 1, 2, 3 and 4, and "inland populations" as those in climate zones 11, 12, 13 and 14. The total mean consumption available for the coastal and inland climate zones is 50 MWh and 80 MWh respectively. This corresponds to 43,558 coastal customers and 72,558 inland customers.
We then generate aggregates ranging from single users to 40 thousand users for the coastal customers and 65 thousand for the inland customers. Figure 6 shows the aggregation error curve for $M_3$ applied to the inland and coastal users. The results show a similar aggregation error curve under sub-population breakdown.

Table 4. Aggregation Error Curve Analysis for Coastal Populations (C.P.) and Inland Populations (I.P.). Values in parentheses are interval bounds.

| Pop. | Model | $p$ | $\sqrt{\alpha_0}$ | $\sqrt{\alpha_1}$ |
|---|---|---|---|---|
| C.P. | $M_1$ | 0.91 (0.86, 1.05) | 63.76 (58.1, 68.6) | 2.48 (2.45, 2.50) |
| C.P. | $M_2$ | 0.96 (0.92, 1.02) | 61.14 (57.1, 66.5) | 1.43 (1.38, 1.49) |
| C.P. | $M_3$ | 0.90 (0.82, 0.93) | 61.90 (57.6, 66.8) | 1.39 (1.34, 1.46) |
| I.P. | $M_1$ | 0.97 (0.94, 1.01) | 47.06 (44.33, 50.17) | 1.94 (1.90, 1.99) |
| I.P. | $M_2$ | 0.95 (0.90, 1.03) | 46.32 (43.29, 50.09) | 1.35 (1.26, 1.45) |
| I.P. | $M_3$ | 0.94 (0.89, 0.98) | 46.47 (42.93, 49.95) | 1.31 (1.22, 1.40) |

Table 4 shows the three linear models applied to each set of aggregates. The linear models were chosen since they were much less computationally expensive to train and test. The results indicate that in both climate zones, model $M_3$ outperforms the other linear models. Also, the inland dataset has lower irreducible error than the coastal dataset. This is the case even though the coastal dataset has a higher maximum mean load. Since the critical loads are 4.8 MWh and 6.2 MWh for inland and coastal populations, there are enough samples in the irreducible regime.

5.6. Robustness of the Scaling Law

[Figure 7: forecast error (1% to 100%, log scale) versus mean load in Wh (1 k to 100 M, log scale), with curves for the MIN, 10%, 25%, 50%, 75%, 90% and MAX quantiles and the fitted CV curve.]

Figure 7. Comparing quantiles of forecast errors for each aggregation group.

The scaling law is robust to the mechanism utilized to generate groups. Here we compare the aggregation error curve for the aggregates with best performance and worst performance at each group level. The choice is determined by setting a quantile for CV error at each group size for residential data. Figure 7 displays the result.
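The quantile comparison can be sketched as follows. This is an illustrative example rather than the code behind Figure 7: it assumes the CV values are stored per group size (a hypothetical `cv_by_size` dictionary with 50 samples per size) and extracts the minimum, maximum and a set of percentiles at each size. The synthetic CV values are made up only so that the example runs.

```python
import math
import random
import statistics


def cv_quantile_curves(cv_by_size, probs=(0.10, 0.25, 0.50, 0.75, 0.90)):
    """For each group size, return min, max and the requested quantiles of CV."""
    curves = {}
    for size, cvs in sorted(cv_by_size.items()):
        ordered = sorted(cvs)
        # n=100 yields the 99 percentile cut points; index k-1 is percentile k.
        cuts = statistics.quantiles(ordered, n=100, method="inclusive")
        row = {"min": ordered[0], "max": ordered[-1]}
        for q in probs:
            row[q] = cuts[round(q * 100) - 1]
        curves[size] = row
    return curves


# Made-up CV samples (percent): a decaying trend plus a floor, 50 per size.
rng = random.Random(0)
cv_by_size = {
    size: [60.0 / math.sqrt(size) * math.exp(rng.gauss(0.0, 0.2)) + 5.0
           for _ in range(50)]
    for size in (1, 10, 100, 1000)
}

curves = cv_quantile_curves(cv_by_size)
for size, row in curves.items():
    print(size, round(row["min"], 2), round(row[0.50], 2), round(row["max"], 2))
```

Each quantile then defines its own error curve across group sizes, which can be fitted separately to see whether the same scaling shape appears.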
The scaling law is observed at the different performance quantiles, thus different aggregation mechanisms will obtain results similar to those reported in this paper. This indicates that for a general population, the best and the worst sub-groups will have a very close range of relative errors. For this reason, the level of aggregation is an important parameter for having an a priori understanding of the expected coefficient of variation.

6. Conclusions and Future Work

This paper introduces the idea of the effect of aggregation on load forecasting. We show that forecasting accuracy, as measured by relative error in terms of MAPE and CV, improves with larger mean load until a critical load. We verify this model with empirical experiments and provide sufficient conditions leading to the observed Aggregation Error Curves introduced in the paper. It is shown that for various time horizons and models, the proposed model fits the empirical AEC with high accuracy.

Various papers focus on new model formulations to describe individual electricity consumers (e.g. [34], [35]). These models can be utilized to justify a detailed understanding of how aggregate consumption patterns are formed and verified on higher resolution data. Moreover, novel ideas can be investigated for aggregate forecasting based on models induced by aggregating these individual consumption models. The aggregation phenomenon is also likely to be observed in other types of forecasting procedures, such as day ahead load forecasting, wind forecasting and electric vehicle availability. Determining the scaling parameters for these problems is an important task as it can lead to new concepts on the limitations of forecasting big and small aggregates.

Appendix A. Nomenclature

Table A. All terms used in the article, provided for ease of description.

| Symbol | Description |
|---|---|
| $x(t)$ | Consumption time series. |
| $\hat{x}(t)$ | Forecast time series. |
| CV | Coefficient of Variation. |
| MAPE | Mean Absolute Percentage Error. |
| $M_k$ | $k$th forecasting model. |
| $W$ | Mean load of aggregate. |
| $W^\star$ | Critical load for aggregation. |
| $p$ | Slope parameter for empirical Aggregation Error Curve. |
| $\alpha_0, \alpha_1$ | Model parameters for CV Aggregation Error Curve. |
| $\beta_0, \beta_1$ | Model parameters for MAPE Aggregation Error Curve. |
| RES / SMB | Residential Customer / Small and Medium Business Customer. |
| C.P., I.P. | Coastal / Inland Population. |

Appendix B. Aggregation Sizes

The following aggregation levels are used:

$N = \{1, 2, 5, 10, 20, 40, 50, 100, 200, 350, 400, 500, 600, 800, 900, 1.0 \times 10^3, 1.5 \times 10^3, 2.0 \times 10^3, 2.5 \times 10^3, 3 \times 10^3, 3.5 \times 10^3, 4 \times 10^3, 4.5 \times 10^3, 5 \times 10^3, 5.5 \times 10^3, 6 \times 10^3, 6.5 \times 10^3, 7 \times 10^3, 7.5 \times 10^3, 8 \times 10^3, 8.5 \times 10^3, 9 \times 10^3, 9.5 \times 10^3, 10 \times 10^3, 10.5 \times 10^3, 15 \times 10^3, 20 \times 10^3, 25 \times 10^3, 30 \times 10^3, 35 \times 10^3, 40 \times 10^3, 45 \times 10^3, 50 \times 10^3, 55 \times 10^3, 60 \times 10^3, 65 \times 10^3, 70 \times 10^3, 75 \times 10^3, 80 \times 10^3, 85 \times 10^3, 90 \times 10^3, 95 \times 10^3, 100 \times 10^3, 105 \times 10^3, 110 \times 10^3, 115 \times 10^3\}$.

Additionally, the number of samples per aggregation level is $M = 50$.

Appendix C. Forecasting Models

Appendix C.1. Seasonal Auto Regressive Moving Average (SARMA): $M_1 - M_3$

SARMA [36] predicts the electricity consumption in the next time step as a linear function of prior consumption values and forecast errors. Seasonality is considered by including additional predictors at a fixed prior period. A model SARMA$(p, q) \times (P, Q)_s$ has autoregressive (AR) order $p$ and moving average (MA) order $q$. It uses a seasonal component with a cycle of $s$ time steps, with AR order $P$ and MA order $Q$. This work considers a restricted class with no MA component, so $q = 0$ and $Q = 0$. The resulting model for the time series $y[t]$ is

$$y[t] = \sum_{k=1}^{p} \theta_k\, y[t-k] + \sum_{k=1}^{P} \phi_k\, y[t-sk] + \epsilon(t).$$
(C.1)

It is usual to assume that $\epsilon(t) \sim \mathcal{N}(0, \sigma^2)$ is an independent and identically distributed normal variable. The seasonality is set to $s = 24$ hours and the seasonal AR order to $P = 1$. The SARMA model is applied at each time step by learning the linear model using a pre-set model size: the parameters $\theta$ and $\phi$ are relearned, while $p$, $P$ and $s$ are kept fixed. This constitutes an adaptive SARMA model.

Appendix C.2. Support Vector Regression: $M_4$

Support Vector Regression (SVR) works by building a non-linear learning method for a training dataset $\{(x_1, y_1), \ldots, (x_N, y_N)\}$. The training set comprises $N$ pairs of response $y_i$ and predictor $x_i$. The SVR data fitting method solves the following optimization:

$$
\begin{aligned}
\min_{w,\, b,\, \zeta,\, \zeta^\star} \quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \left(\zeta_i + \zeta_i^\star\right) \\
\text{s.t.} \quad & y_i - w^T \Phi(x_i) - b \le \epsilon + \zeta_i, && i = 1, \ldots, N \\
& w^T \Phi(x_i) + b - y_i \le \epsilon + \zeta_i^\star, && i = 1, \ldots, N \\
& \zeta_i^\star \ge 0,\ \zeta_i \ge 0, && i = 1, \ldots, N
\end{aligned}
$$

Given training predictor $x_i$, there is a given set of kernel functions $\Phi(x_i)$ which map $x_i$ to a high dimensional space. The kernel function is fixed and predictions are computed by $\hat{y}_i = w^T \Phi(x_i) + b$. The variables $w$, $\Phi$ and $b$ are used to map predictors to response. However, fitting the training data only generates the vector $w$ and scalar $b$ subject to a set of constraints. The SVR solves for $w$ such that it minimizes the sum of the norm term $\frac{1}{2}\|w\|^2$ and a fitting error $C \sum_{i=1}^{N} (\zeta_i + \zeta_i^\star)$. The variables $\epsilon$, $\zeta^\star$, $\zeta$ quantify the fitting error. Any deviation $|y_i - (w^T \Phi(x_i) + b)| \le \epsilon$ incurs no penalty. However, any deviation outside this dead band, measured by $(\zeta^\star, \zeta)$, incurs a linear cost, yielding $C \sum_{i=1}^{N} (\zeta_i + \zeta_i^\star)$. Under this model, the values of $C$, $\epsilon$, the kernel function $\Phi$ and additional parameters of $\Phi$ must be specified. The support vectors, as well as the constants $C$ and $\epsilon$, are learned adaptively in each training round.
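The dead band described above corresponds to the standard $\epsilon$-insensitive loss: at the optimum, $\zeta_i + \zeta_i^\star = \max(0, |y_i - \hat{y}_i| - \epsilon)$. A minimal sketch of that loss and of the resulting objective value is given below. It only evaluates the objective for given predictions and weights and does not solve the optimization; the function names and numbers are illustrative.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample slack: zero inside the dead band, linear outside it."""
    return [max(0.0, abs(yt - yp) - epsilon) for yt, yp in zip(y_true, y_pred)]


def svr_objective(w, slacks, C):
    """0.5 * ||w||^2 + C * sum of slacks, for given weights and slacks."""
    return 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)


# Illustrative numbers only.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.05, 2.5, 2.0]
slacks = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
objective = svr_objective(w=[3.0, 4.0], slacks=slacks, C=2.0)
print(slacks, objective)
```

A larger $C$ penalizes deviations outside the dead band more heavily relative to the norm of $w$, which is why $C$ and $\epsilon$ have to be chosen by validation.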
In the training data, we use 3/4 of the data for training and 1/4 for validation of the support vectors and constants. We should note that this method proves computationally expensive but usually outperforms statically trained models and SARMA models. For a given kernel function, cross validation is performed to determine the parameters specific to the kernel. Then, using a moving window, we train the SVR and forecast one sample ahead as done in the SARMA model.

Appendix C.3. Feed Forward Neural Network (FFNN): $M_5$

FFNNs (e.g. [23]) provide a popular alternative to define the nonlinear map between $x(t)$ and $y_k(t-1)$ used in the SVR model description. They are a subset of artificial neural networks where neurons with a chosen activation function connect to each other in layers without feedback. The number of neurons, the number of layers, the choice of activation function and the network parameters are learnt from the training set. In this model, the training data is used with the 3/4, 1/4 split to learn model parameters as in the SVR case.

The following subsection presents work (not included in the original publication) for full day ahead forecasting. Let $x_d \in \mathbb{R}^{24}$ be the aggregate daily consumption for days $d = \{1, \ldots, D\}$. Given previous consumption information $X_d = \{x_1, \ldots, x_d\}$ and daily temperature forecasts $T_d = \{t_1, \ldots, t_{d+1}\}$, the forecaster will output the next day's consumption profile $\hat{x}_{d+1}$.

Appendix C.4. Day Ahead Forecaster: $M_6$

The forecaster works by predicting the daily total consumption $\hat{p}_d \in \mathbb{R}$ and the normalized daily shape pattern $\hat{u}_d \in \mathbb{R}^{24}$ separately. The final prediction $\hat{x}_{d+1} = \hat{p}_{d+1}\, \hat{u}_{d+1}$ is the product of the two individual forecasts.
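The total/shape split can be illustrated with a short sketch. It assumes that "normalized" means the 24 hourly values of the shape sum to one, which the text does not state explicitly; the profile values and function names are made up for the example.

```python
def split_total_and_shape(day_profile):
    """Split a 24-value daily profile into its total and a shape that sums to one."""
    total = sum(day_profile)
    shape = [value / total for value in day_profile]
    return total, shape


def combine(total_forecast, shape_forecast):
    """Rebuild an hourly profile as (scalar total) times (shape vector)."""
    return [total_forecast * u for u in shape_forecast]


# 24 made-up hourly kWh values: night, daytime, evening peak, late evening.
day = [0.4] * 7 + [0.9] * 10 + [1.5] * 5 + [0.6] * 2
total, shape = split_total_and_shape(day)
rebuilt = combine(total, shape)
print(total, sum(shape))
```

Under this assumption the total and the shape can be forecast by separate models and multiplied afterwards, as in the final prediction above.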
Total Power Forecaster: An autoregressive moving average with exogenous input (ARMAX) model is used to forecast the total consumption, which is of the form

$$\hat{p}_{d+1} = \sum_{k=d+1-K}^{d} a_k\, p_k + \sum_{r=d+1-K}^{d+1} b_r\, t_r \tag{C.2}$$

The exogenous input $t_r \in \mathbb{R}$ is the daily mean temperature. The parameters $a_k, b_r \in \mathbb{R}$ are determined by least squares regression. A cross validation stage is used to estimate the proper model size $K$.

Shape Forecaster: A vector ARMAX method is used to forecast the daily shape profile. The model is of the form

$$\hat{u}_{d+1} = \sum_{k=d+1-K}^{d} C_k\, u_k + \sum_{r=d+1-K}^{d+1} h_r\, t_r \tag{C.3}$$

The exogenous input $t_r \in \mathbb{R}^{24}$ contains the mean temperature for each hour. The parameters $C_k, h_r \in \mathbb{R}^{24 \times 24}$ are real matrices. These parameters are determined by linear regression given the training data using a least squares formulation. The model size $K$ is determined in a cross validation stage.

Appendix D. Analytic Model of Aggregation

Appendix D.1. Individual Consumption Profile and Forecaster

We use the following notation: a complete time series vector $x$ of length $T$, a daily profile vector $x(d)$ of length 24, and an individual element $x(t)$. Electricity consumption for individual $n$ is the daily profile vector $x_n(d)$, decomposed as $x_n(d) = p_n(d) + \epsilon_n(d)$ with the following components.

• $p_n(d)$ is the daily profile shape for an individual, drawn from a distribution of all load shapes. This is based on [30], which showed that individual AMI consumption data can be clustered into a dictionary of daily shapes.

• $\epsilon_n(d)$ is an additive error. We make the following assumptions on the first and second order statistics: (1) zero mean, $E[\epsilon_n(t)] = 0$; (2) zero correlation in time, $E[\epsilon_n(t)\, \epsilon_n(t+1)] = 0$; (3) finite population correlation, $E[\epsilon_n(t)\, \epsilon_{n'}(t)] = \gamma$; (4) constant variance, $E[\epsilon_n(t)\, \epsilon_n(t)] = \sigma^2$.
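Assumptions (3) and (4) determine the variance of the summed error across $N$ individuals. As a small worked check, the sketch below sums every entry of the $N \times N$ covariance matrix ($\sigma^2$ on the diagonal, $\gamma$ elsewhere), which gives $N\sigma^2 + N(N-1)\gamma$ exactly, and compares it with the leading-order form $N\sigma^2 + N^2\gamma$, which it approaches for large $N$. The numerical values of $\sigma^2$ and $\gamma$ are hypothetical.

```python
def aggregate_noise_variance(n_customers, sigma2, gamma):
    """Variance of the summed noise: sum of all entries of the covariance matrix."""
    total = 0.0
    for i in range(n_customers):
        for j in range(n_customers):
            total += sigma2 if i == j else gamma
    return total


def leading_order_variance(n_customers, sigma2, gamma):
    """Large-N form with a linear and a quadratic term in N."""
    return n_customers * sigma2 + n_customers ** 2 * gamma


sigma2, gamma = 1.0, 0.25  # hypothetical values, for illustration only
for n in (1, 10, 100, 1000):
    exact = aggregate_noise_variance(n, sigma2, gamma)
    approx = leading_order_variance(n, sigma2, gamma)
    print(n, exact, approx, approx / exact)
```

With $\gamma = 0$ only the linear term survives, so the noise relative to the aggregate mean shrinks with $N$; any $\gamma > 0$ leaves a quadratic term that does not average out.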
The individual chooses a daily profile for each day, $p_n(d) \in \mathbb{R}^T$, and deviates from it according to $\epsilon_n$. Therefore a dataset spanning many days is composed of two different stochastic processes: an individual-dependent unique shape generation process and a random deviation stochastic process $\epsilon_n$. We do not make assumptions on the shape generating process, since little work has been done on empirically investigating such a stochastic process.

We use the shorthand $\hat{x}(t+1) = f(x, M)$ for the forecast, which takes as an input the underlying time series to forecast the future horizon and is indexed by the model. Additionally, we use the shorthand $f_N(M) = f\left(\sum_n^N x, M\right)$ to indicate the aggregate forecast under model $M$. In reality, the function will take in all elements up to time $t$.

Now we can compute $CV(W)$ by first computing $CV(N)$ for a finite number of aggregate sizes.

$$CV(N) = \lim_{T \to \infty} CV(N, T) \tag{D.1}$$

$$= \lim_{T \to \infty} E_x\left[\left(\frac{\frac{1}{T}\sum_t^T \left(\sum_n^N x_n(t+1) - f_N(M)\right)^2}{\left(\frac{1}{T}\sum_t^T \sum_n^N x_n(t)\right)^2}\right)^{\frac{1}{2}}\right] \tag{D.2, D.3}$$

$$= \lim_{T \to \infty} E_x\left[\left(\frac{\frac{1}{T}\sum_t^T \left(\sum_n^N \left(p_n(t+1) + \epsilon_n(t+1)\right) - f_N(M)\right)^2}{\left(\frac{1}{T}\sum_t^T \sum_n^N x_n(t)\right)^2}\right)^{\frac{1}{2}}\right] \tag{D.4}$$

$$= \lim_{T \to \infty} E_x\left[\left(\frac{\frac{1}{T}\sum_t^T \left(\sum_n^N p_n(t+1) - f_N(M)\right)^2 + \frac{2}{T}\sum_t^T \left(\sum_n^N \epsilon_n(t+1)\right)\left(\sum_n^N p_n(t+1) - f_N(M)\right) + \frac{1}{T}\sum_t^T \left(\sum_n^N \epsilon_n(t+1)\right)^2}{\left(\frac{1}{T}\sum_t^T \sum_n^N x_n(t)\right)^2}\right)^{\frac{1}{2}}\right] \tag{D.5}$$

Here we merely expand the mean squared error term using the decomposition of the time series into the shape profile and additive error term.
We can now analyze each term of the numerator and denominator separately. The denominator term is simply:

$$\lim_{T \to \infty} \frac{1}{T}\sum_t^T \sum_n^N x_n(t) = \lim_{T \to \infty} \mathcal{N}\left(\mu N, \frac{\omega N}{T}\right) \xrightarrow{a.s.} \mu N \tag{D.6}$$

This uses the fact that each mean consumption vector is drawn from the distribution given in Figure 2(b) (with variance $\omega$), so the average can be approximated by a normal distribution, together with the Law of Large Numbers (LLN). The last term in the numerator is the following:

$$\lim_{T \to \infty} \frac{1}{T}\sum_t^T \left(\sum_n^N \epsilon_n(t+1)\right)^2 = \lim_{T \to \infty} \frac{1}{T}\sum_t^T \left(\mathcal{N}\left(0, N\sigma^2 + N^2\gamma\right)\right)^2 \tag{D.7}$$

$$= \lim_{T \to \infty} \frac{1}{T}\sum_t^T \left(N\sigma^2 + N^2\gamma\right)\chi^2 \tag{D.8}$$

$$= \lim_{T \to \infty} \mathcal{N}\left(N\sigma^2 + N^2\gamma, \frac{N\sigma^2 + N^2\gamma}{T}\right) \tag{D.9}$$

$$\xrightarrow{a.s.} N\sigma^2 + N^2\gamma \tag{D.10}$$

We should note that the sum of the error terms results in a variance with both a linear and a quadratic term in $N$. If we assume no correlation across the population, this term will only grow linearly; however, as we will show, the irreducible error from the AEC must come from some non-zero quadratic term. The second term of the numerator can be eliminated, since the two factors are uncorrelated and therefore taking $T \to \infty$ leads to

$$\lim_{T \to \infty} \frac{2}{T}\sum_t^T \left(\sum_n^N \epsilon_n(t+1)\right)\left(\sum_n^N p_n(t+1) - f_N(M)\right) = 0 \quad a.s. \tag{D.11}$$

The first term of the numerator is the following:

$$\lim_{T \to \infty} \frac{1}{T}\sum_t^T \left(\sum_n^N p_n(t+1) - f_N(M)\right)^2 = N^2 \underbrace{\lim_{T \to \infty} \frac{1}{T}\sum_t^T \left(\frac{1}{N}\sum_n^N p_n(t+1) - \frac{1}{N} f_N(M)\right)^2}_{\text{Population Bias } \delta(p, M)^2} \tag{D.12}$$

This represents how well a model can match the average population profile of a particular size given a large enough time period. We refer to it as a bias term because, in a large aggregate, it represents how well the normalized profile fits the average population profile.
Analyzing this via the shape generating process leads to:

$$\delta(p, M)^2 = \lim_{T \to \infty} \frac{1}{T}\sum_t^T \left(\frac{1}{N}\sum_n^N p_n(t+1) - \frac{1}{N} f_N(M)\right)^2 \tag{D.13}$$

$$= \lim_{D \to \infty} \frac{1}{D}\sum_d^D \left(\frac{1}{N}\sum_n^N p_n(d) - \frac{1}{N} f_N(M)\right)^2 \tag{D.14}$$

$$= \lim_{D \to \infty} \frac{1}{D}\sum_d^D \left(\frac{1}{N}\sum_n^N p_n(d) - E_p\left[\frac{1}{N}\sum_n^N p_n(d)\right]\right)^2 + \lim_{D \to \infty} \frac{1}{D}\sum_d^D \left(E_p\left[\frac{1}{N}\sum_n^N p_n(d)\right] - \frac{1}{N} f_N(M)\right)^2 \tag{D.15}$$

The first term in (D.15) represents the variation of the shape profiles around their average values. This is a vector of length 24 representing the full day long shape. We can simplify this to:

$$\lim_{D \to \infty} \frac{1}{D}\sum_d^D \left(\frac{1}{N}\sum_n^N p_n(d) - E_p\left[\frac{1}{N}\sum_n^N p_n(d)\right]\right)^2 \xrightarrow{a.s.} \frac{1}{N^2}\,\mathrm{trace}\left(\mathrm{cov}\left(\sum_n^N p_n\right)\right) \tag{D.16}$$

$$= \frac{\kappa}{N} \tag{D.17}$$

The trace term should generally scale linearly with aggregation size, since it represents how shapes vary around their mean shape. In [30], the authors report on similar metrics showing how much variation a single customer has in choice of load shapes. If we assume this is independent across individuals, then the population variation will decrease in relative terms, leading to $\kappa / N$.

The second term in (D.15) represents the squared bias between the mean forecast and the mean population shape. If the model has no population bias, $E_x\left[\frac{1}{N} f_N(M)\right] = \frac{1}{N}\sum_n^N p_n(d)$, and the overall term will have $O\left(\frac{1}{N}\right)$ behavior just as before. This might happen (although it is not shown here) if each consumer chooses load shapes according to a steady state distribution; it is then imaginable that an unbiased estimator can be constructed given previous information, since it will be the weighted sum of load shapes.
However, if no such model exists which captures the shape generating process of a customer,

$$E_x\left[\frac{1}{N} f_N(M)\right] \ne \frac{1}{N}\sum_n^N p_n(d), \tag{D.18}$$

then (D.15) will always be some strictly positive value. An interpretation of this term is that it captures how well a model can fit the profile generating process of the population. This also captures why some forecasters perform better than others. Better forecasters capture the shape generating process better than simpler models, which will have some large population bias. If we assume the bias term is forecasting model specific, then $\delta(p, M) = \delta(M)$.

Combining these terms we have the following:

$$CV(N) = \lim_{T \to \infty} CV(N, T) = E_p\left[\sqrt{\frac{N^2\delta(M) + N(\kappa + \sigma^2) + N^2\gamma}{N^2\mu^2}}\right] = E_p\left[\sqrt{\frac{\delta(M) + \gamma}{\mu^2} + \frac{\kappa + \sigma^2}{\mu^2 N}}\right].$$

If we assume $\delta(p, M) = \delta(M)$ given a large $p$, where the population bias depends only on the ability of the forecasting method to learn the shape process, we have the result:

$$CV(N) = \sqrt{\frac{\delta(M) + \gamma}{\mu^2} + \frac{\kappa + \sigma^2}{\mu^2 N}} \tag{D.19}$$

Finally, from our large sample limit, $W = \mu N$, leading to the desired result.

Appendix D.2. Discussion

Notice that in the previous section, much of what we did was use an intuitive load shape model for loads, and impose mild conditions on the generating process and additive error such that they lead to the linear and quadratic growth. Given only this model formulation, a priori, it is difficult to arrive at any satisfactory conclusion of whether any of the terms should actually be non-zero. However, given that the empirical AECs show a non-zero irreducible error, we are forced to accept one of three possibilities: (1) there is non-zero correlation between additive errors; (2) there is a finite bias term between a forecasting procedure and the mean load shape; (3) both (1) and (2) are true.
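The three possibilities can be explored numerically. The sketch below evaluates the un-simplified ratio inside the square root, with numerator $N^2\delta(M) + N(\kappa + \sigma^2) + N^2\gamma$ and denominator $N^2\mu^2$, for increasing $N$. All parameter values are hypothetical and chosen only to make the behavior visible: with $\delta(M) = \gamma = 0$ the curve decays towards zero, while a non-zero bias or correlation leaves a floor at $\sqrt{\delta(M) + \gamma}\,/\,\mu$.

```python
import math


def cv_from_components(n, mu, delta, kappa, sigma2, gamma):
    """CV for an aggregate of n individuals, from the un-simplified ratio."""
    numerator = n ** 2 * delta + n * (kappa + sigma2) + n ** 2 * gamma
    denominator = (n * mu) ** 2
    return math.sqrt(numerator / denominator)


# Hypothetical parameters (illustration only).
mu, kappa, sigma2 = 1.0, 0.2, 0.3
sizes = [1, 10, 100, 1000, 10 ** 4, 10 ** 6, 10 ** 8]

with_floor = [cv_from_components(n, mu, 0.002, kappa, sigma2, 0.002) for n in sizes]
no_floor = [cv_from_components(n, mu, 0.0, kappa, sigma2, 0.0) for n in sizes]
floor = math.sqrt(0.002 + 0.002) / mu

for n, a, b in zip(sizes, with_floor, no_floor):
    print(n, round(a, 5), round(b, 5))
```

Comparing the two columns shows that the reducible part is identical in both cases; only the floor depends on the bias and correlation terms.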
Under this model, the term $\alpha_0$ is proportional to $\kappa + \sigma^2$, with a constant that depends only on the mean load $\mu$, and is therefore independent of the forecasting procedure. This is somewhat controversial, since most forecasting work at small aggregates is basically swamped by this noise.

Appendix E. Scaling Law Parameter Estimation

To estimate the parameters of model (6), we assume the following observation model:

$$CV(W) = \sqrt{\frac{\alpha_0}{W^p} + \alpha_1} + \epsilon \tag{E.1}$$

where $\epsilon$ is a zero mean perturbation. The parameters are recovered from the AEC by transforming the observations $CV(W) \to CV^2(W)$ and performing a linear regression on:

$$CV^2(W) = \frac{\alpha_0}{W^p} + \alpha_1 + \epsilon' \tag{E.2}$$

where now $\epsilon'$ is a transformed error term. Given the regression solution and the estimated $\mathrm{MSE}(p)$, the exponent fit is estimated via $p^\star \in \arg\min_{p \in [\underline{p}, \overline{p}]} \mathrm{MSE}(p)$.

An important issue is that of possible bias or high variance in estimating the parameters $p$, $\alpha_0$ and $\alpha_1$. In simulation with known values of $p$ and the estimation procedure (E.2), an example of $\mathrm{MSE}(p)$, $\hat{p}$ and $p_{\text{true}}$ is shown in Figure 8(a). The results show that even for a large sample size of $50 \times 56 = 2800$ there is some error in the estimate. Regardless, over 100 test runs under a single $p_{\text{true}}$, the estimator converges to the correct value for all $p_{\text{true}}$, as shown in Figure 8(b). Likewise, in estimating the regression coefficients, with $\alpha_{0,\text{true}} = 50$ and $\alpha_{1,\text{true}} = 1.1$, the non-linear regression procedure recovers the underlying parameters without significant bias, as shown in Figures 8(c) and 8(d).

[Figure E.8, panels (a) to (d): (a) fit error MSE versus exponent, with p-true and p-hat marked; (b) p-hat versus p-true; (c) alpha-0 versus p-true; (d) alpha-1 versus p-true.]

Figure E.8. (a) Example of fitting the load exponent $p$ in a simulated dataset, with $p_{\text{true}} = 0.94$ and $\hat{p} = 0.968$. Boxplot of estimated (b) $\hat{p}$, (c) $\alpha_0$ and (d) $\alpha_1$ values for each Monte Carlo test, along with the overall average over the range of $p_{\text{true}}$.

References

[1] T. Hong, S. Fan, Probabilistic electric load forecasting: A tutorial review, International Journal of Forecasting 32 (3) (2016) 914–938. doi:10.1016/j.ijforecast.2015.11.011.
[2] T. Hong, P. Pinson, S. Fan, Global energy forecasting competition 2012, International Journal of Forecasting 30 (2) (2014) 357–363. doi:10.1016/j.ijforecast.2013.07.001. URL http://www.sciencedirect.com/science/article/pii/S0169207013000745
[3] T. Hong, Short term electric load forecasting, Ph.D. thesis, North Carolina State Univ. (2010).
[4] G. Gross, G. Francisco, Short-term load forecasting, Proceedings of the IEEE 75 (12) (1987) 1558–1573.
[5] R. Weron, Modeling and forecasting electricity loads and prices: A statistical approach, Vol. 403, Wiley, 2007.
[6] C. Harris, Electricity markets: pricing, structures and economics, Vol. 565, Wiley Publishing, 2011.
[7] A. Papalexopoulos, T. Hesterberg, A regression-based approach to short-term system load forecasting, IEEE Transactions on Power Systems 5 (4) (1990) 1535–1547.
[8] M. Espinoza, C. Joye, R. Belmans, B. D. Moor, Short-term load forecasting, profile identification, and customer segmentation: A methodology based on periodic time series, IEEE Transactions on Power Systems 20 (3) (2005) 1622–1630. doi:10.1109/TPWRS.2005.852123.
[9] K. Lee, Y. Cha, J. Park, Short-term load forecasting using an artificial neural network, IEEE Transactions on Power Systems 7 (1) (1992) 124–132.
[10] G. Zhang, B. Eddy Patuwo, M. Y. Hu, Forecasting with artificial neural networks: The state of the art, International Journal of Forecasting 14 (1) (1998) 35–62.
[11] A. Bakirtzis, V. Petridis, S. Kiartzis, M. Alexiadis, A.
Maissis, A neural network short term load forecasting model for the Greek power system, IEEE Transactions on Power Systems 11 (2) (1996) 858–863.
[12] C. Lu, H.-T. Wu, S. Vemuri, Neural network based short term load forecasting, IEEE Transactions on Power Systems 8 (1) (1993) 336–342.
[13] H. Hippert, C. Pedreira, R. Souza, Neural networks for short-term load forecasting: a review and evaluation, IEEE Transactions on Power Systems 16 (1) (2001) 44–55. doi:10.1109/59.910780.
[14] P. Pai, W. Hong, Forecasting regional electricity load based on recurrent support vector machines with genetic algorithms, Electric Power Systems Research 74 (3) (2005) 417–425. doi:10.1016/j.epsr.2005.01.006. URL http://www.sciencedirect.com/science/article/pii/S0378779605000702
[15] P. Pai, W. Hong, Support vector machines with simulated annealing algorithms in electricity load forecasting, Energy Conversion and Management 46 (17) (2005) 2669–2688. doi:10.1016/j.enconman.2005.02.004. URL http://www.sciencedirect.com/science/article/pii/S0196890405000506
[16] R. E. Edwards, J. New, L. E. Parker, Predicting future hourly residential electrical consumption: A machine learning case study, Energy and Buildings 49 (2012) 591–603.
[17] H. Ziekow, C. Goebel, J. Strüker, H. A. Jacobsen, The potential of smart home sensors in forecasting household electricity demand, in: 2013 IEEE International Conference on Smart Grid Communications (SmartGridComm), 2013, pp. 229–234.
[18] R. P. Singh, P. X. Gao, D. J. Lizotte, On hourly home peak load prediction, in: 2012 IEEE International Conference on Smart Grid Communications (SmartGridComm), 2012, pp. 163–168.
[19] M. Ghofrani, M. Hassanzadeh, M. Etezadi-Amoli, M. Fadali, Smart meter based short-term load forecasting for residential customers, in: North American Power Symposium (NAPS), 2011, IEEE, 2011, pp. 1–5.
[20] G. Adepoju, S. Ogunjuyigbe, K.
Alawode, Application of neural network to load forecasting in Nigerian electrical power system, The Pacific Journal of Science and Technology 8 (1) (2007) 68–72.
[21] D. Benaouda, F. Murtagh, J. Starck, O. Renaud, Wavelet-based nonlinear multiscale decomposition model for electricity load forecasting, Neurocomputing 70 (1) (2006) 139–154.
[22] T. Senjyu, H. Takara, K. Uezato, T. Funabashi, One-hour-ahead load forecasting using neural network, IEEE Transactions on Power Systems 17 (1) (2002) 113–118.
[23] A. Al-Shareef, E. Mohamed, E. Al-Judaibi, One hour ahead load forecasting using artificial neural network for the western area of Saudi Arabia, International Journal of Electrical Systems Science and Engineering 1 (1) (2008) 35–40.
[24] I. Drezga, S. Rahman, Short-term load forecasting with local ANN predictors, IEEE Transactions on Power Systems 14 (3) (1999) 844–850.
[25] R. Sevlian, R. Rajagopal, Value of aggregation in smart grids, in: 2013 IEEE International Conference on Smart Grid Communications (SmartGridComm), 2013, pp. 714–719. doi:10.1109/SmartGridComm.2013.6688043.
[26] P. Mirowski, S. Chen, T. Kam Ho, C.-N. Yu, Demand forecasting in smart grids, Bell Labs Technical Journal 18 (4) (2014) 135–158. doi:10.1002/bltj.21650. URL http://dx.doi.org/10.1002/bltj.21650
[27] S. Humeau, T. Wijaya, M. Vasirani, K. Aberer, Electricity load forecasting for residential customers: Exploiting aggregation and correlation between households, in: Sustainable Internet and ICT for Sustainability (SustainIT), 2013, 2013, pp. 1–6. doi:10.1109/SustainIT.2013.6685208.
[28] P. D. Silva, D. Ilic, S. Karnouskos, The impact of smart grid prosumer grouping on forecasting accuracy and its benefits for local electricity market trading, IEEE Transactions on Smart Grid 5 (1) (2014) 402–410. doi:10.1109/TSG.2013.2278868.
[29] F. L. Quilumba, W. J. Lee, H. Huang, D. Y. Wang, R. L.
Szabados, Using smart meter data to improve the accuracy of intraday load forecasting considering customer behavior similarities, IEEE Transactions on Smart Grid 6 (2) (2015) 911–918.
[30] J. Kwac, J. Flora, R. Rajagopal, Household energy consumption segmentation using hourly data, IEEE Transactions on Smart Grid 5 (1) (2014) 420–430. doi:10.1109/TSG.2013.2278477.
[31] S. Haben, J. Ward, D. V. Greetham, C. Singleton, P. Grindrod, A new error measure for forecasts of household-level, high resolution electrical energy consumption, International Journal of Forecasting 30 (2) (2014) 246–256. doi:10.1016/j.ijforecast.2013.08.002. URL http://www.sciencedirect.com/science/article/pii/S0169207013001386
[32] B. Efron, The jackknife, the bootstrap and other resampling plans, SIAM, 1982.
[33] Pacific Gas and Electric, Pacific Energy Center's guide to: California climate zones (2012).
[34] O. Ardakanian, S. Keshav, C. Rosenberg, Markovian models for home electricity consumption, in: Proc. ACM SIGCOMM Green Networking Workshop, 2011, pp. 31–36.
[35] Z. J. Kolter, T. Jaakkola, Approximate inference in additive factorial HMMs with application to energy disaggregation, in: International Conference on Artificial Intelligence and Statistics, 2012, pp. 1472–1482.
[36] G. Box, G. M. Jenkins, G. C. Reinsel, Time series analysis: forecasting and control, Wiley, 2013.
