Predictive Modeling of Power Outages during Extreme Events: Integrating Weather and Socio-Economic Factors


This paper presents a novel learning-based framework for predicting power outages caused by extreme events. The proposed approach targets low-probability, high-consequence outage scenarios and leverages a comprehensive set of features derived from publicly available data sources. We integrate EAGLE-I outage records from 2014 to 2024 with weather, socioeconomic, infrastructure, and seasonal event data. Incorporating social and demographic indicators reveals patterns of community vulnerability and improves understanding of outage risk during extreme conditions. Four machine learning models are evaluated: Random Forest (RF), Graph Neural Network (GNN), Adaptive Boosting (AdaBoost), and Long Short-Term Memory (LSTM). Experimental validation is performed on a large-scale dataset covering counties in the lower peninsula of Michigan. Among all models tested, the LSTM network achieves the highest accuracy.


💡 Research Summary

The paper introduces a comprehensive learning‑based framework for forecasting power outages caused by extreme weather events, with a particular focus on high‑impact, low‑probability (HILP) scenarios. Using the publicly available EAGLE‑I outage dataset spanning November 2014 to December 2024 for Michigan’s lower peninsula, the authors integrate four major data streams: (1) hourly outage counts per county, (2) hourly weather variables from the Open‑Meteo API (temperature, precipitation, wind speed, wind gust, short‑wave radiation, relative humidity, cloud cover, surface pressure), (3) socio‑economic indicators from the 2021 American Community Survey (average household income, unemployment rate, age distribution of residential structures), and (4) power‑infrastructure counts (poles, towers, substations, transformers, lines) extracted from OpenStreetMap.
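In practice, fusing these streams amounts to joining the two hourly series on (county, timestamp) and then attaching the static per-county attributes. The sketch below illustrates this with pandas; all column names and values are illustrative assumptions, not the paper's actual schema.

```python
import pandas as pd

# Hypothetical per-county-hour outage records (EAGLE-I style)
outages = pd.DataFrame({
    "county": ["Wayne", "Wayne", "Kent"],
    "timestamp": pd.to_datetime(
        ["2023-08-24 14:00", "2023-08-24 15:00", "2023-08-24 14:00"]),
    "customers_out": [1200, 3400, 150],
})

# Hypothetical hourly weather variables (e.g. from the Open-Meteo API)
weather = pd.DataFrame({
    "county": ["Wayne", "Wayne", "Kent"],
    "timestamp": pd.to_datetime(
        ["2023-08-24 14:00", "2023-08-24 15:00", "2023-08-24 14:00"]),
    "wind_gust": [88.0, 95.0, 40.0],
    "precipitation": [6.2, 8.1, 0.4],
})

# Static per-county features (ACS socio-economics, OSM infrastructure counts)
static = pd.DataFrame({
    "county": ["Wayne", "Kent"],
    "avg_income": [52000, 61000],
    "substations": [42, 18],
})

# Join the dynamic streams on (county, hour), then attach static features
features = (outages
            .merge(weather, on=["county", "timestamp"], how="inner")
            .merge(static, on="county", how="left"))
print(features.shape)  # (3, 7)
```

The inner join keeps only county-hours present in both dynamic streams, while the left join replicates each county's static attributes across all of its hourly rows.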

Data preprocessing addresses two chronic challenges in outage modeling: missing values and severe class imbalance. Missing weather observations, especially during extreme events, are imputed using a K‑Nearest Neighbors (KNN) approach that averages the five geographically closest counties. To mitigate the dominance of low‑impact outages, the authors adapt the Synthetic Minority Over‑sampling for Regression (SMOGN) technique. High‑impact cases (top 30 % of outage magnitudes during extreme weather) are oversampled by generating synthetic samples through SMOTER‑style interpolation between a case and its nearest neighbors, while low‑impact cases are randomly undersampled.
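Both preprocessing steps can be sketched in a few lines of NumPy. The distance metric, county coordinates, and data values below are illustrative assumptions; the paper's exact implementation details (e.g. its distance computation and SMOGN parameters) may differ.

```python
import numpy as np

def knn_impute(values, coords, k=5):
    """Fill missing per-county values with the mean of the k geographically
    nearest counties that have observations (a sketch of the paper's KNN
    imputation; distance here is plain Euclidean on toy coordinates)."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    filled = values.copy()
    for i in np.flatnonzero(np.isnan(values)):
        dists = np.linalg.norm(coords - coords[i], axis=1)
        neighbors = [j for j in np.argsort(dists)
                     if j != i and not np.isnan(values[j])][:k]
        filled[i] = np.mean(values[neighbors])
    return filled

def smoter_interpolate(x, neighbor, rng):
    """SMOTER-style synthetic high-impact sample: a random point on the
    segment between a case and one of its nearest neighbors."""
    lam = rng.random()
    return x + lam * (neighbor - x)

# Illustrative data: 7 counties, two with missing temperature readings
temps = [20.0, np.nan, 22.0, 21.0, 19.0, np.nan, 23.0]
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0)]
filled = knn_impute(temps, coords, k=5)
```

Interpolating between a high-impact case and its neighbors generates synthetic samples that stay inside the observed extreme-weather feature region, which is what lets SMOGN rebalance a regression target rather than discrete classes.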

The framework also expands the training set by selecting “analog” periods that are meteorologically similar to each HILP seed event. Seasonal consistency is enforced by limiting analogs to the same month ± 1, and similarity is quantified using Euclidean distance in a standardized eight‑dimensional weather feature space.
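The analog search described above reduces to a filtered nearest-neighbor query. The sketch below uses two weather features instead of eight for readability, and the candidate values and months are invented for illustration.

```python
import numpy as np

def find_analogs(seed_vec, candidates, cand_months, seed_month, n=3):
    """Rank candidate periods by Euclidean distance to a HILP seed event
    in standardized weather-feature space, keeping only candidates whose
    month is within +/-1 of the seed month (a sketch of the paper's
    analog-selection step)."""
    X = np.asarray(candidates, dtype=float)
    # Standardize each feature across candidates (z-score)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0
    Z = (X - mu) / sigma
    z_seed = (np.asarray(seed_vec, dtype=float) - mu) / sigma
    # Seasonal consistency: same month +/- 1, wrapping over the year end
    month_ok = np.array([min(abs(m - seed_month), 12 - abs(m - seed_month)) <= 1
                         for m in cand_months])
    dists = np.linalg.norm(Z - z_seed, axis=1)
    dists[~month_ok] = np.inf   # exclude out-of-season candidates
    return np.argsort(dists)[:n]

# Toy example: 5 candidate periods described by (wind gust, precipitation)
candidates = [[30, 10], [31, 12], [30, 10], [10, 0], [29, 11]]
cand_months = [7, 8, 1, 9, 8]
analogs = find_analogs([30, 11], candidates, cand_months, seed_month=8, n=3)
```

Note that candidate 2 is meteorologically identical to candidate 0 but falls in January, so the seasonal filter removes it regardless of its distance.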

Four predictive models are evaluated: Random Forest (RF), Graph Neural Network (GNN), Adaptive Boosting (AdaBoost), and Long Short‑Term Memory (LSTM) neural network. RF and AdaBoost are traditional ensemble regressors that provide variable importance but lack explicit temporal modeling. GNN treats counties as nodes linked by inferred infrastructure or socio‑economic similarity, yet its performance is constrained by the coarse granularity of publicly available grid topology. LSTM, a recurrent deep learning architecture, directly ingests the time‑ordered feature vectors, thereby capturing dynamic interactions between weather evolution and outage response.
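To make the LSTM's temporal mechanism concrete, here is a single LSTM cell step written in plain NumPy, processing one day of hourly feature vectors. This is a generic textbook cell with random weights, not the paper's trained architecture; the dimensions (8 features, matching the eight weather variables) are chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step: the recurrent state (h, c) carries information
    across hours, which is how the model captures interactions between
    weather evolution and outage response. W, U, b stack the pre-activations
    of the input, forget, candidate, and output gates."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[:n])            # input gate
    f = sigmoid(z[n:2*n])         # forget gate
    g = np.tanh(z[2*n:3*n])       # candidate cell update
    o = sigmoid(z[3*n:])          # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
n_feat, n_hidden = 8, 4           # e.g. 8 hourly weather features
W = rng.normal(size=(4 * n_hidden, n_feat))
U = rng.normal(size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for t in range(24):               # one day of hourly feature vectors
    x = rng.normal(size=n_feat)
    h, c = lstm_step(x, h, c, W, U, b)
```

The final hidden state h summarizes the whole 24-hour window and would feed a regression head predicting the outage magnitude; the tree ensembles, by contrast, see each hour as an independent row.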

Experimental results, based on a 70/15/15 train‑validation‑test split across 20 counties, show that LSTM outperforms the other models on all standard metrics: lower Mean Absolute Error (MAE), lower Root Mean Squared Error (RMSE), and higher coefficient of determination (R²). The advantage is most pronounced for the high‑impact subset, where LSTM reduces prediction error by more than 30 % relative to the next best model.
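The three evaluation metrics follow their standard definitions. The sketch below computes them directly; the sample predictions are invented for illustration and are not results from the paper.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, RMSE, and R^2 in their standard forms, as used to compare
    the four models (illustrative implementation)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Toy outage counts vs. predictions (not the paper's data)
mae, rmse, r2 = regression_metrics([100, 250, 900, 40], [120, 230, 870, 60])
```

Because RMSE squares the errors before averaging, it penalizes the rare large misses that dominate HILP events, which is why the high-impact subset is where the models separate most clearly.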

Key contributions of the study include: (1) a multi‑modal data integration pipeline that fuses real‑time weather, historical outage, socio‑economic, and infrastructure information; (2) the novel application of SMOGN to regression‑oriented outage prediction, addressing severe data imbalance; (3) a systematic comparative analysis of four machine‑learning approaches on a unified dataset; and (4) a large‑scale empirical validation on Michigan county‑level data, demonstrating the framework’s practical relevance.

Limitations are acknowledged: the lack of high‑resolution transmission‑grid topology hampers the full potential of graph‑based models; synthetic SMOGN samples may not fully reflect physical outage mechanisms, risking over‑fitting; and model interpretability is limited, as the study does not incorporate post‑hoc explanation tools such as SHAP or LIME. Future work is proposed to incorporate detailed GIS grid data, real‑time SCADA streams, and explainable AI techniques, as well as to test the framework across diverse geographic regions and climate‑change scenarios.

