Using complex surveys to estimate the $L_1$-median of a functional variable: application to electricity load curves
Mean profiles are widely used as indicators of the electricity consumption habits of customers. Currently, in 'Electricit'e De France (EDF), class load profiles are estimated using point-wise mean function. Unfortunately, it is well known that the mean is highly sensitive to the presence of outliers, such as one or more consumers with unusually high-levels of consumption. In this paper, we propose an alternative to the mean profile: the $L_1$-median profile which is more robust. When dealing with large datasets of functional data (load curves for example), survey sampling approaches are useful for estimating the median profile avoiding storing the whole data. We propose here estimators of the median trajectory using several sampling strategies and estimators. A comparison between them is illustrated by means of a test population. We develop a stratification based on the linearized variable which substantially improves the accuracy of the estimator compared to simple random sampling without replacement. We suggest also an improved estimator that takes into account auxiliary information. Some potential areas for future research are also highlighted.
💡 Research Summary
The paper addresses a critical limitation of using point‑wise mean load curves in electricity consumption analysis: the extreme sensitivity of the mean to outliers such as customers with unusually high usage. To overcome this, the authors propose estimating the functional L₁‑median (also known as the spatial median) of load curves, which minimizes the sum of absolute distances in the function space and therefore offers robust central tendency. Because modern smart‑meter datasets contain millions of high‑frequency curves, storing and processing the entire population is impractical. The authors therefore adopt survey‑sampling techniques, treating the full set of curves as a finite population from which a sample is drawn.
Four sampling designs are examined: simple random sampling without replacement (SRSWOR), probability‑proportional‑to‑size (PPS) sampling, and stratified sampling based on auxiliary information. For each design, a Horvitz–Thompson type estimator of the L₁‑median is constructed. Since the L₁‑median is a non‑linear functional, direct variance estimation is not feasible. The authors derive a linearized influence function (the functional analogue of the score function) that approximates the effect of each curve on the median. This influence function serves two purposes: (1) it provides a plug‑in estimator of the variance under complex designs, and (2) it supplies a natural stratification variable. By stratifying on the mean of the linearized variable, within‑stratum variability is reduced, leading to substantial gains in efficiency.
The paper further incorporates auxiliary variables—such as contract capacity, geographic region, and seasonal indicators—through regression‑type estimators (difference and regression estimators). These exploit known relationships between the auxiliary data and the median trajectory, yielding a calibrated estimator that dramatically lowers mean squared error (MSE) compared with the raw Horvitz–Thompson estimator. In simulation with a synthetic test population, the stratified‑regression estimator achieves an MSE reduction of over 30 % relative to SRSWOR, even when the sample size is only 5 % of the population.
A real‑world case study uses EDF’s smart‑meter data, comprising several hundred thousand 15‑minute load curves. The authors compare the point‑wise mean, the naïve sample L₁‑median, and the proposed stratified‑regression median. The mean curve is visibly distorted by a few high‑consumption customers, whereas the L₁‑median remains stable and reflects the typical consumption pattern. The stratified‑regression estimator reproduces the full‑population median with a maximum absolute deviation of 0.02 (in normalized units) while requiring only a fraction of the storage and computational resources.
The study concludes that robust functional central tendency can be obtained efficiently by marrying the L₁‑median with complex survey sampling. The linearization approach not only enables variance estimation under arbitrary designs but also guides optimal stratification. The incorporation of auxiliary information further sharpens accuracy without increasing sample size. The authors suggest future work on extending the methodology to functional regression, online streaming contexts, and other domains such as traffic flow or environmental monitoring where massive functional datasets are common. Overall, the paper provides a practical, theoretically sound framework for robust, scalable estimation of electricity load profiles.
Comments & Academic Discussion
Loading comments...
Leave a Comment