PowerModelsGAT-AI: Physics-Informed Graph Attention for Multi-System Power Flow with Continual Learning



Digital Object Identifier

CHIDOZIE EZEAKUNNE 1,2, JOSE E. TABAREZ 1, REEJU POKHAREL 1, AND ANUP PANDEY 1
1 Los Alamos National Laboratory, Los Alamos, NM, USA
2 Department of Physics, University of Central Florida, Orlando, FL 32816 USA

Corresponding author: Chidozie Ezeakunne (e-mail: cezeakunne@lanl.gov).

Research presented in this work was supported by the Laboratory Directed Research and Development program of Los Alamos National Laboratory under project number 20250854ECR.

VOLUME 11, 2023

ABSTRACT Solving the alternating current power flow equations in real time is essential for secure grid operation, yet classical Newton-Raphson solvers can be slow under stressed conditions. Existing graph neural networks for power flow are typically trained on a single system and often degrade on different systems. We present PowerModelsGAT-AI, a physics-informed graph attention network that predicts bus voltages and generator injections. The model uses bus-type-aware masking to handle different bus types and balances multiple loss terms, including a power-mismatch penalty, using learned weights. We evaluate the model on 14 benchmark systems (4 to 6,470 buses) and train a unified model on 13 of these under N-2 (two-branch outage) conditions, achieving an average normalized mean absolute error of 0.89% for voltage magnitudes and R^2 > 0.99 for voltage angles. We also show continual learning: when adapting a base model to a new 1,354-bus system, standard fine-tuning causes severe forgetting with error increases exceeding 1000% on base systems, while our experience replay and elastic weight consolidation strategy keeps error increases below 2% and in some cases improves base-system performance. Interpretability analysis shows that learned attention weights correlate with physical branch parameters (susceptance: r = 0.38; thermal limits: r = 0.22), and feature importance analysis supports that the model captures established power flow relationships.

INDEX TERMS alternating current power flow, continual learning, graph attention networks, physics-informed machine learning.

I. INTRODUCTION

The alternating current (AC) power flow problem is a foundational computational tool for secure and economic power system operation and planning. The AC power flow equations describe the nonlinear relationship between bus voltages, power injections, and network admittances, and are fundamental to standalone power flow analysis, optimal power flow (OPF), and security-constrained formulations [1]-[3]. The goal is to determine unknown bus voltages and power injections from specified inputs, which vary by bus type (Table 1).

For a system with N buses, let V_i = V_{m,i} e^{j \delta_i} be the complex voltage at bus i, where V_{m,i} is the voltage magnitude, \delta_i is the voltage angle, and j = \sqrt{-1}. Given the bus-admittance matrix \mathbf{Y} \in \mathbb{C}^{N \times N}, the net complex power injection S^{inj}_i at each non-slack bus i satisfies

  S^{inj}_i = P_{g,i} - P_{d,i} + j (Q_{g,i} - Q_{d,i}) = V_i \left( \sum_{k=1}^{N} Y_{ik} V_k \right)^*,   (1)

where P_{g,i} and Q_{g,i} are the active and reactive power generation at bus i, P_{d,i} and Q_{d,i} are the active and reactive power demand, and (\cdot)^* is the complex conjugate.

TABLE 1. Bus-Type Unknown Targets and Supervision Masks.

  Bus Type        | Known in y_i  | Unknown Targets | Mask [V_m, \delta, P_g, Q_g]
  ----------------|---------------|-----------------|-----------------------------
  PQ (Load)       | P_g, Q_g      | V_m, \delta     | [1, 1, 0, 0]
  PV (Generator)  | P_g, V_m      | \delta, Q_g     | [0, 1, 0, 1]
  Slack           | V_m, \delta   | P_g, Q_g        | [0, 0, 1, 1]

Decomposing
the admittance Y_{ik} = G_{ik} + j B_{ik} into its conductance G_{ik} and susceptance B_{ik}, (1) yields the familiar pair of nonlinear equations per non-slack bus:

  P_{g,i} - P_{d,i} = V_{m,i} \sum_{k=1}^{N} V_{m,k} [ G_{ik} \cos(\delta_i - \delta_k) + B_{ik} \sin(\delta_i - \delta_k) ],   (2)

  Q_{g,i} - Q_{d,i} = V_{m,i} \sum_{k=1}^{N} V_{m,k} [ G_{ik} \sin(\delta_i - \delta_k) - B_{ik} \cos(\delta_i - \delta_k) ].   (3)

In standard AC power flow analysis, one solves for the unknown state variables given specified power injections and voltage setpoints. OPF augments these equations with an economic objective and additional constraints [4], [5], but OPF is not the focus of this work. We explicitly treat the complete per-bus state

  y_i = [V_{m,i}, \delta_i, P_{g,i}, Q_{g,i}],   (4)

and design PMGAT-AI to infer the unknown components of y_i for each bus type (PQ, PV, and Slack) as summarized in Table 1. Thus, a single model predicts bus voltages (V_m, \delta at PQ buses) and generator injections (P_g, Q_g at slack buses) within a unified AC power flow formulation.

Historically, operators executed AC power flow primarily for offline planning, day-ahead scheduling, and periodic security assessment, for which the computational cost of iterative Newton-type solvers was acceptable [2], [6]. This assumption is increasingly strained. High penetrations of inverter-based renewable energy sources and distributed energy resources introduce faster fluctuations, bidirectional flows, and operating points closer to security boundaries [7]-[9]. As a result, AC power flow analysis is becoming a near real-time operational requirement, with operators requiring frequent solutions for contingency analysis and corrective actions [7], [9]. Under these stressed or ill-conditioned conditions, convergence and robustness of Newton-Raphson solvers can degrade [10], [11].
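The nodal balance (1) and its polar decomposition (2)-(3) can be checked numerically on a toy network. The following is an illustrative NumPy sketch, not the paper's code; the 2-bus network and its impedance value are hypothetical, chosen only to make the equations concrete.

```python
# Evaluate the nodal power balance (1) and verify it matches the polar
# form (2) on a hypothetical 2-bus, single-branch network with no shunts.
import numpy as np

z = 0.01 + 0.10j                 # series impedance of the branch (per unit)
y = 1.0 / z
Y = np.array([[y, -y],
              [-y, y]])          # bus-admittance matrix

Vm = np.array([1.00, 0.95])      # voltage magnitudes (p.u.)
delta = np.array([0.0, -0.10])   # voltage angles (rad)
V = Vm * np.exp(1j * delta)      # complex voltages V_i = V_m,i e^{j delta_i}

# Net complex injections S_i = V_i * (sum_k Y_ik V_k)^*, as in (1)
S = V * np.conj(Y @ V)
P, Q = S.real, S.imag

# Cross-check against the polar form (2):
# P_i = V_m,i * sum_k V_m,k [G_ik cos(d_i - d_k) + B_ik sin(d_i - d_k)]
G, B = Y.real, Y.imag
dd = delta[:, None] - delta[None, :]
P_check = Vm * ((G * np.cos(dd) + B * np.sin(dd)) @ Vm)
```

For this lossless-shunt network, the real injections sum to the resistive branch losses, and the rectangular and polar forms agree term by term.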
Due to the trade-off between computational speed and solution reliability, there has been significant research interest in data-driven, machine-learning-based alternatives for predicting AC power flow, with some extensions to OPF and state estimation [12]-[22]. Earlier ML architectures modeled the problem using engineered features and either multilayer perceptrons or convolutional neural networks, with the downside that they used little or no explicit graph-structure information from power systems [5], [23]. This limitation often constrained those models to specific topologies, with limited generalization beyond training conditions and across systems [23], [24].

Graph neural networks (GNNs) have since emerged as an effective framework for power grids, with buses as nodes and physical branches (lines and transformers) represented as edges [25], [26]. Message passing in a GNN mirrors the physical dependence of each bus on its electrical neighbors, and existing works have applied GNNs to AC power flow prediction [14], [15], [27], [28], OPF [21], [29], contingency analysis [30], and power system state estimation [22]. Meanwhile, physics-informed GNNs for power systems incorporate AC power flow residuals or related constraints as differentiable penalties in the loss function, encouraging physically consistent predictions [22], [31]-[34]. However, four gaps remain:

1) System-specific models and limited transferability. Most existing GNNs for power flow are trained on a single, fixed system with varied operating conditions, requiring a separate specialized model for each system and contingency regime [12]-[14]. These models often fail to generalize under topology changes and are impractical for deployment where operators must maintain multiple models across systems and conditions [23].

2) Computational bottleneck in hybrid GNN-solver frameworks.
Another line of work combines GNNs with iterative solvers, where the GNN provides a warm start or predictions validated by a physics-based criterion [15], [20], [21], [35]. However, the speed advantage depends on the robustness of the trained model; frequent solver fallback on stressed or contingency cases diminishes the gain. A highly robust standalone model is therefore a prerequisite for effective hybrid deployment, and this work focuses on improving standalone predictive performance.

3) Catastrophic forgetting in continual learning. Real-world grids are nonstationary, and several works advocate on-the-fly fine-tuning where difficult cases are solved with a trusted AC solver and used to update the model [5], [15]. However, naive fine-tuning can lead to catastrophic forgetting, where performance on previously learned tasks collapses [36], [37]. This challenge is recognized as open for deep graph networks [37], yet explicit forgetting mitigation remains limited in published AC power flow GNN studies [15].

4) Limited physical interpretability. Although interpretability has received some attention in GNN-based power system modeling [14], [38], systematic validation that learned representations reflect known electrical principles remains limited. Established explainability methods such as GNNExplainer [39] and integrated gradients [40] are available for graph learning, yet without such validation it remains unclear whether predictions rely on physically meaningful features or spurious correlations.

To address these gaps, we introduce PowerModelsGAT-AI (hereafter PMGAT-AI), a physics-informed model for solving AC power flow equations.

A. CONTRIBUTIONS

Our contributions are:

1) Unified multi-system learning for AC power flow.
We evaluate PMGAT-AI on 14 standard benchmarks (4 to 6,470 buses), including system-specific baselines across all systems and a unified model trained across 13 systems. This establishes a single framework for unified learning across heterogeneous grid topologies and operating conditions.

2) Bus-type-aware prediction formulation. We formulate AC power flow prediction at the bus level as learning the unknown subset of y_i = [V_m, \delta, P_g, Q_g] under a bus-type-aware supervision mask (Table 1). This enables one model to jointly infer bus voltages and generator injections without leaking specified inputs into supervised losses.

3) Physics-informed graph-attention training objective. PMGAT-AI combines edge-aware graph attention with a differentiable power-mismatch constraint derived from (1). We optimize the four supervised targets and the physics term jointly using graph-normalized losses with homoscedastic uncertainty weighting, avoiding manual loss-weight tuning.

4) Continual learning with forgetting mitigation and physics-informed alignment. For adaptation to new systems or operating conditions, we integrate experience replay with elastic weight consolidation (EWC) to mitigate catastrophic forgetting. Separately, we analyze learned attention and feature attributions, showing alignment with known electrical relationships.

We organize the rest of the paper as follows. Sec. II describes dataset generation and graph construction. Sec. III presents the model architecture, loss design, and optimization. Sec. IV describes the continual learning framework. Secs. V and VI report the evaluation setup and results. Sec. VII presents the interpretability analysis, and we conclude in Sec. VIII. The appendices provide feature definitions, contingency statistics, detailed per-system results, and continual learning analysis.

II.
DATASET GENERATION AND PREPROCESSING

We generated a dataset from 14 standard power systems [1] spanning different scales using pandapower for AC power flow simulations [41]. These systems enable cross-system training and evaluation. For each system, we extract bus-state vectors y_i = [V_m, \delta, P_g, Q_g], per-bus input features x_i, and branch-level edge features e_ij (feature definitions in Appendix A); bus-type masks define which components are supervised. A full list of the systems considered and contingency statistics are provided in Appendix B [42]-[44].

Footnote 1: All system names follow the pandapower naming conventions [41], [42]; variations in size and naming (e.g., case4gs, case14, case_illinois200) reflect the original system identifiers.

A. SYSTEMS AND CONTINGENCIES

We generate scenarios by sampling diverse operating conditions and topologies. To prevent overfitting to global load correlations, we use a hierarchical randomization scheme for active power loads. The load at bus i is scaled by \rho_i = \gamma \cdot R_{r(i)} \cdot J_i, where:

• \gamma \sim U[0.7, 1.3] is a global scaling factor.
• R_{r(i)} \sim U[0.75, 1.25] is a regional factor shared by buses with similar nominal voltage.
• J_i \sim U[0.95, 1.05] is a local jitter factor.

Generator active power setpoints are scaled globally by factors sampled from U[0.8, 1.2]. Generators are modeled with sufficient reactive capability to maintain scheduled voltage setpoints.

To ensure the model learns generalized power flow physics rather than memorizing specific network impedances, we apply random jitter to the branch parameters. For every scenario, the electrical properties (resistance R, reactance X, and susceptance B) of all lines and transformers are independently scaled by a factor sampled from U[0.9, 1.1] (i.e., +/-10% variation). Finally, to ensure robustness against topology changes, we generate N-k contingencies by applying random branch outages.
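The hierarchical load randomization above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the region labels passed in are hypothetical stand-ins for the nominal-voltage grouping the paper describes.

```python
# Sketch of the hierarchical load-randomization scheme:
# rho_i = gamma * R_r(i) * J_i with global, regional, and per-bus factors.
import numpy as np

rng = np.random.default_rng(0)

def sample_load_scales(regions):
    """Return one load scale factor rho_i per bus for a single scenario.

    regions: region label per bus (buses with the same label share R_r)."""
    regions = np.asarray(regions)
    gamma = rng.uniform(0.7, 1.3)                        # global factor
    regional = {r: rng.uniform(0.75, 1.25) for r in set(regions.tolist())}
    R = np.array([regional[r] for r in regions])         # shared within region
    J = rng.uniform(0.95, 1.05, size=regions.shape)      # local jitter
    return gamma * R * J

# 5 buses grouped into 2 hypothetical voltage regions
scales = sample_load_scales([0, 0, 1, 1, 1])
```

Buses in the same region move together up to the small jitter J_i, while the global factor shifts the whole system, which is what decorrelates individual loads from the system-wide load level.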
We consider up to two branch outages (N-2) when generating contingency scenarios.

B. GRAPH CONSTRUCTION

We construct a directed graph G = (V, E) where each bus is a node and each physical branch (line, transformer, series impedance, switch) is represented by reciprocal edges (i -> j) and (j -> i). This directed representation preserves direction-dependent branch parameters (Sec. II-C). Preprocessing yields node features x_i and edge features e_ij. We also add a self-loop (i -> i) for every node; its edge attributes carry diagonal network-admittance terms, while bus shunt admittance (G_sh, B_sh) is retained as node-level input.

C. NODE AND EDGE FEATURES

Node features. For each bus i, we provide a feature vector x_i \in R^23 that combines operational and topological information, together with a one-hot bus-type indicator:

• Nominal Values and Setpoints: Nominal voltage (V_n), voltage setpoint (V_m^set), and active power setpoint (P_g^set).
• Known Injections: Active and reactive load (P_d, Q_d).
• Physical Limits: Voltage and reactive power limits (V_m^min, V_m^max, Q_g^min, Q_g^max).
• Physics Parameters: System base power (S_base), frequency (f), and bus shunt admittance (G_sh, B_sh).
• Topological Features: 1st- and 2nd-hop degree, weighted distance to slack, electrical betweenness, and aggregate neighbor net injection.
• Neighbor Indicators: Binary flags for 1st- and 2nd-hop neighbors being generators or slack buses.

The aggregate neighbor net injection feature is computed from generator setpoints and load inputs only, not from solved output targets.
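The graph construction of Sec. II-B can be sketched as follows. This is an illustrative helper under simplifying assumptions (integer bus indices, branches given as index pairs), not the paper's preprocessing code.

```python
# Build the directed edge index: each physical branch (i, j) contributes
# reciprocal edges (i -> j) and (j -> i), and every node gets a self-loop
# (i -> i) whose attributes would carry diagonal admittance terms.
import numpy as np

def build_edge_index(num_buses, branches):
    """branches: list of (from_bus, to_bus) pairs. Returns a (2, E) array."""
    src, dst = [], []
    for i, j in branches:            # reciprocal directed edges preserve
        src += [i, j]                # direction-dependent parameters
        dst += [j, i]                # (e.g., transformer tap side)
    for i in range(num_buses):       # explicit self-loops
        src.append(i)
        dst.append(i)
    return np.array([src, dst])

# A hypothetical 4-bus radial network with 3 branches
edge_index = build_edge_index(4, [(0, 1), (1, 2), (2, 3)])
```

The resulting array has 2 x (number of branches) branch edges plus one self-loop per bus, matching the edge-count bookkeeping used when attaching the 7-dimensional edge features.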
Edge features. Each directed branch edge (i -> j) has a 7-dimensional attribute vector obtained by concatenating series admittance (Y^ser_ij), branch type (t_ij), and thermal limit (I^max_ij):

  e_{ij} = [ \underbrace{\Re(Y^{ser}_{ij}), \Im(Y^{ser}_{ij})}_{\text{admittance}}, \underbrace{t_{ij}}_{\text{type (one-hot)} \in \mathbb{R}^4}, \underbrace{I^{max}_{ij}}_{\text{thermal limit}} ] \in \mathbb{R}^7.

Here, \Re(\cdot) and \Im(\cdot) are the real and imaginary parts. In implementation, Y^ser_ij is the directed off-diagonal branch-admittance edge term (including tap effects where applicable), so \Re(Y^ser_ij) and \Im(Y^ser_ij) are its conductance and susceptance components. I^max_ij is a per-unit thermal-limit proxy computed as the branch rating (MVA) normalized by system base power. For self-loops, the first two edge features use diagonal admittance terms, while type and rating features are set to zero. The nodal-balance terms Y_ik in (1) are entries of the bus-admittance matrix Y assembled from network elements (including tap and shunt effects). Complete feature definitions are provided in Appendix A.

D. BUS-TYPE MASKS AND DATA VALIDATION

We define a binary training mask, M \in {0, 1}^{N x 4}, to identify the learning targets for each bus i based on its type. A value of 1 indicates an unknown target to be predicted. Table 1 defines supervision over the model state vector y_i = [V_m, \delta, P_g, Q_g].

Sampling balance. We use a weighted random sampler with probability inversely proportional to the number of samples for each system to ensure balanced exposure across all systems (e.g., case14 and case_illinois200) during training.

Feature and target scaling. Node features are standardized (zero mean, unit variance) using statistics computed from the training set, and the same transformation is applied to the four target variables (V_m, \delta, P_g, Q_g). Edge features are kept in physical units where appropriate.
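The bus-type supervision masks of Table 1 can be sketched directly. This is a minimal illustration of the mask construction; the string bus-type labels are hypothetical conveniences, not the paper's data format.

```python
# Bus-type-aware supervision masks over y_i = [V_m, delta, P_g, Q_g];
# a 1 marks an unknown target to be predicted and supervised (Table 1).
import numpy as np

MASKS = {
    "PQ":    np.array([1, 1, 0, 0]),  # predict V_m, delta
    "PV":    np.array([0, 1, 0, 1]),  # predict delta, Q_g
    "slack": np.array([0, 0, 1, 1]),  # predict P_g, Q_g
}

def supervision_mask(bus_types):
    """Stack per-bus masks into the (N, 4) binary matrix M."""
    return np.stack([MASKS[t] for t in bus_types])

# A hypothetical 4-bus system: one slack, one PV, two PQ buses
M = supervision_mask(["slack", "PV", "PQ", "PQ"])
```

Multiplying errors by M before reduction keeps specified inputs (e.g., the PV-bus voltage setpoint) out of the supervised losses, as described above.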
Predictions are inverse-transformed before evaluation and physics-loss computation, ensuring absolute-error metrics are reported in physical units, while normalized metrics are unitless.

III. MODEL ARCHITECTURE AND TRAINING

We design PMGAT-AI (Fig. 1) as a deep GNN composed of stacked pre-norm residual blocks, followed by a multi-target output head.

A. PRE-NORM RESIDUAL GNN ENCODER

The model maps input node features x_i to a hidden dimension d_hidden using a linear projection. The resulting embeddings h^(0) are processed by L pre-norm residual blocks (where normalization precedes each layer transformation). For the intermediate layers \ell \in {1, ..., L-1}, the update rule includes a non-linearity and dropout to deepen the representation:

  h^{(norm)} = GraphNorm(h^{(\ell-1)}, b)   (5)
  m^{(\ell)} = GNN-Layer(h^{(norm)}, E, e)   (6)
  h^{(\ell)} = h^{(\ell-1)} + Dropout(ELU(W^{(\ell)}_{head} m^{(\ell)}))   (7)

where GraphNorm(\cdot, b) normalizes node features within each graph (identified by batch vector b) with a learnable shift [45], GNN-Layer is the GATv2 convolution defined in Sec. III-B, and W^{(\ell)}_{head} projects the concatenated attention heads back to d_hidden. The final layer (\ell = L) prepares features for the readout head, aggregating attention heads by averaging rather than concatenation, and omitting the final activation:

  h^{(L)} = Proj_res(h^{(L-1)}) + GNN-Layer(GraphNorm(h^{(L-1)}), E, e)   (8)

where Proj_res aligns dimensions for the residual connection if necessary.

B. GATV2 WITH EDGE FEATURES

The GNN-Layer uses GATv2 [46] with 7-dimensional edge features e_ij integrated into the edge-aware attention mechanism. For each of H attention heads (indexed by m), the computation is:

  z^{(m)}_{ij} = W^{(m)}_t h_i + W^{(m)}_s h_j + W^{(m)}_e e_{ij},
  \tilde{e}^{(m)}_{ij} = a^{(m)\top} LeakyReLU(z^{(m)}_{ij}),
  \alpha^{(m)}_{ij} = softmax_j(\tilde{e}^{(m)}_{ij}),
  m^{(m)}_i = \sum_{j \in N(i)} \alpha^{(m)}_{ij} W^{(m)}_s h_j.   (9)
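A single edge-aware attention head of the form in (9) can be sketched in NumPy. This is an illustrative reimplementation under stated assumptions (random placeholder weights, a dense per-node softmax loop), not the paper's GATv2 layer, which in practice would come from a GNN library.

```python
# One GATv2-style attention head with edge features: joint linear transform,
# LeakyReLU scoring, softmax over each target node's incoming edges.
import numpy as np

rng = np.random.default_rng(1)
d, d_e = 8, 7                                        # hidden / edge-feature dims
Wt, Ws = rng.normal(size=(d, d)), rng.normal(size=(d, d))
We, a = rng.normal(size=(d, d_e)), rng.normal(size=d)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_head(h, edge_index, e):
    """h: (N, d) node states; edge_index: (2, E); e: (E, d_e) edge features."""
    src, dst = edge_index
    z = h[dst] @ Wt.T + h[src] @ Ws.T + e @ We.T     # z_ij per edge
    score = leaky_relu(z) @ a                        # unnormalized e~_ij
    alpha = np.zeros_like(score)
    msg = np.zeros_like(h)
    for i in np.unique(dst):                         # softmax per target node
        idx = np.where(dst == i)[0]
        w = np.exp(score[idx] - score[idx].max())
        alpha[idx] = w / w.sum()
        msg[i] = (alpha[idx, None] * (h[src[idx]] @ Ws.T)).sum(axis=0)
    return alpha, msg

h = rng.normal(size=(3, d))
edge_index = np.array([[0, 1, 2, 0],                 # three edges into node 1,
                       [1, 1, 1, 2]])                # one edge into node 2
e = rng.normal(size=(4, d_e))
alpha, msg = attention_head(h, edge_index, e)
```

Because the score in GATv2 is computed after the nonlinearity of the joint transform, the attention ranking can change with the query node (dynamic attention), unlike the original GAT.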
Configuration. We set the hidden dimension d_hidden = 128 and employ L = 4 GATv2 layers with H = 4 heads. Intermediate layer heads are concatenated, while the final layer heads are averaged. Physics-informed self-loops are explicitly included in the graph topology during construction (Sec. II-C) rather than added implicitly by the layer.

C. MULTI-TARGET OUTPUT HEAD

A multi-head architecture maps the final node embedding h^{(L)}_i to the four-dimensional output vector \hat{y}_i. A shared multilayer perceptron (MLP) trunk first extracts a latent representation \xi_i:

  \xi_i = ELU(W_trunk h^{(L)}_i + b_trunk)   (10)

This representation is projected by four separate linear layers to produce the state variable predictions:

  \hat{V}_{m,i} = w^\top_{V_m} \xi_i + b_{V_m},
  \hat{\delta}_i = w^\top_{\delta} \xi_i + b_{\delta},
  \hat{P}_{g,i} = w^\top_{P_g} \xi_i + b_{P_g},
  \hat{Q}_{g,i} = w^\top_{Q_g} \xi_i + b_{Q_g}.   (11)

[Figure 1: architecture diagram]
FIGURE 1. PowerModelsGAT-AI Architecture. The framework comprises (1) a bus-type-aware input stage with a supervision mask, (2) an encoder stack of 4 pre-norm residual GATv2 blocks, (3) a shared decoding trunk, (4) multi-head outputs for V_m, \delta, P_g, Q_g, and (5) a physics-informed loss L_phy. Red dashed lines indicate how predicted outputs and the supervision mask feed into the physics-informed loss L_phy.

This design uses a shared learned feature space while producing task-specific outputs.

D.
LOSS FUNCTIONS AND PHYSICS CONSTRAINTS

We formulate the training objective as a multi-task learning problem. Let the state at bus i be y_i = [V_{m,i}, \delta_i, P_{g,i}, Q_{g,i}]. We define five loss components L_\tau for \tau \in C = {V_m, \delta, P_g, Q_g, phy}, where phy is the physics-mismatch penalty, which are dynamically balanced (Sec. III-E).

A naive mean-squared error (MSE) over all buses implicitly biases training toward more frequent bus types (typically PQ buses) and larger systems. To reduce this bias from bus-type frequency and graph size, we compute the loss per graph using masks that include only unknown targets for each bus type (Table 1), then average across graphs. Let M_{i,\tau} \in {0, 1} be the binary mask indicating whether variable \tau is an unknown target for bus i. For a batch of N_G graphs, the loss for state variable \tau is computed by first normalizing the error within each graph g, and then averaging across graphs:

  L_\tau = \frac{1}{N_G} \sum_{g=1}^{N_G} \left[ \frac{\sum_{i \in V_g} M_{i,\tau} \, f_\tau(\hat{y}_{i,\tau}, y_{i,\tau})}{\sum_{i \in V_g} M_{i,\tau} + \varepsilon} \right]   (12)

where V_g is the set of nodes in graph g and \varepsilon is a stability term. This formulation ensures that a small 14-bus system contributes equally to the gradient update as a 200-bus system, preventing bias toward larger systems.

Node-wise error functions f_\tau. For voltage magnitudes and power injections (\tau \in {V_m, P_g, Q_g}), f_\tau is the standard squared error on standardized targets:

  f_\tau(\hat{y}, y) = (\hat{y} - y)^2   (13)

For voltage angles, standard MSE is not suitable because angles are periodic. To address this, we optimize the wrapped residual in degrees; in our experiments, this improved convergence stability and final predictive performance relative to a radian-domain loss. Let \Delta_i = \hat{\delta}^{(rad)}_i - \delta^{(rad)}_i be the angular difference in radians.
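The graph-normalized masked reduction (12) can be sketched as follows. This is an illustrative NumPy version, not the paper's batched implementation; the error values are placeholders.

```python
# Graph-normalized masked loss (12): per-graph mean over unknown targets,
# then an unweighted average across graphs, so every graph contributes
# equally to the gradient regardless of its size.
import numpy as np

def graph_normalized_loss(err, mask, graph_id, eps=1e-8):
    """err: (N,) per-node errors f_tau; mask: (N,) binary M_{i,tau};
    graph_id: (N,) graph membership of each node."""
    losses = []
    for g in np.unique(graph_id):
        sel = graph_id == g
        losses.append((mask[sel] * err[sel]).sum() / (mask[sel].sum() + eps))
    return float(np.mean(losses))

# Two graphs of very different size with identical per-node error
err = np.ones(33)
mask = np.ones(33)
gid = np.concatenate([np.zeros(3), np.ones(30)])     # 3-node and 30-node graphs
loss = graph_normalized_loss(err, mask, gid)

# If the small graph is solved perfectly, the batch loss is 0.5 even though
# the small graph holds under 10% of the nodes
err_half = err.copy()
err_half[:3] = 0.0
loss_half = graph_normalized_loss(err_half, mask, gid)
```

A plain node-level mean would instead weight the 30-node graph ten times more heavily, which is exactly the bias the per-graph normalization removes.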
The wrapped loss is

  f_\delta(\hat{\delta}_i, \delta_i) = \left( \frac{180}{\pi} \, \mathrm{atan2}(\sin \Delta_i, \cos \Delta_i) \right)^2   (14)

where atan2(sin \Delta, cos \Delta) returns the principal value in (-\pi, \pi], so the error reflects the shortest arc.

Physics-Informed Power Mismatch. We enforce physical consistency by minimizing bus-level power-mismatch errors derived from the nodal power balance in (1). Specifically, we penalize the deviation between the hybrid net injection S^{inj,hyb}_i and the calculated flow \hat{S}^{flow}_i. To prevent information leakage, we define S^{inj,hyb}_i by mixing model predictions with reference data according to the bus mask:

  S^{inj,hyb}_i = (P^{hyb}_{g,i} - P_{d,i}) + j (Q^{hyb}_{g,i} - Q_{d,i})   (15)

where the hybrid generation terms combine the predicted (\hat{P}_g, \hat{Q}_g) and known (P_g, Q_g) values:

  P^{hyb}_{g,i} = M_{i,P_g} \hat{P}_{g,i} + (1 - M_{i,P_g}) P_{g,i},   (16)
  Q^{hyb}_{g,i} = M_{i,Q_g} \hat{Q}_{g,i} + (1 - M_{i,Q_g}) Q_{g,i}.   (17)

Here, P_{d,i} and Q_{d,i} are ZIP-equivalent loads evaluated at \hat{V}_{m,i}. The calculated flow is the nodal power balance (1) evaluated at predicted voltages, where \hat{V}_i = \hat{V}_{m,i} e^{j \hat{\delta}_i} is the complex voltage constructed from the predicted magnitude and angle:

  \hat{S}^{flow}_i = \hat{V}_i \sum_{k \in N(i)} Y^*_{ik} \hat{V}^*_k   (18)

The physics loss is the mean squared modulus of the mismatch, computed in 64-bit precision for numerical stability and averaged per graph as in (12):

  L_phy = \frac{1}{N_G} \sum_{g=1}^{N_G} \left[ \frac{1}{|V_g|} \sum_{i \in V_g} \left| S^{inj,hyb}_i - \hat{S}^{flow}_i \right|^2 \right]   (19)

E. MULTI-TASK OPTIMIZATION

The training objective involves minimizing five distinct loss components with different scales and optimization difficulty. Naive summation with fixed weights requires extensive hyperparameter tuning and can cause one loss to dominate the gradient updates over the others.
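The wrapped angle residual (14) is easy to demonstrate numerically. The following is a minimal sketch of the loss function only; the example angles are hypothetical.

```python
# Wrapped angle loss (14): map the residual to the shortest arc via atan2
# before squaring, so errors near the +/-180 degree seam are not inflated.
import numpy as np

def wrapped_angle_sq_deg(pred_rad, true_rad):
    d = pred_rad - true_rad
    wrapped = np.degrees(np.arctan2(np.sin(d), np.cos(d)))  # in (-180, 180]
    return wrapped ** 2

# 350 deg and -10 deg are the same angle: a naive squared error would report
# (360)^2, while the wrapped loss is ~0.
seam_loss = wrapped_angle_sq_deg(np.deg2rad(350.0), np.deg2rad(-10.0))

# An honest 20-degree error is preserved: (20 deg)^2 = 400.
plain_loss = wrapped_angle_sq_deg(np.deg2rad(10.0), np.deg2rad(-10.0))
```

The atan2 composition is differentiable away from the seam, so it can serve directly as a training loss.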
To address this, we treat the problem as a multi-task learning challenge and employ homoscedastic uncertainty weighting [47]. This method interprets the relative weight of each task as a learnable uncertainty parameter \sigma_\tau, derived from maximizing the Gaussian likelihood of the multi-task objective. Let C = {V_m, \delta, P_g, Q_g, phy} be the set of task indices defined in Sec. III-D, comprising the four supervised state variables and the physics mismatch. The unified training objective L_total(\theta, \sigma) is:

  L_total(\theta, \sigma) = \sum_{\tau \in C} \frac{1}{2 \sigma^2_\tau} L_\tau(\theta) + \sum_{\tau \in C} \log \sigma_\tau   (20)

where \theta represents the model parameters and \sigma = {\sigma_\tau}_{\tau \in C} are learnable noise scalars optimized simultaneously with the model parameters. The term \log \sigma_\tau acts as a regularizer to prevent the trivial solution \sigma_\tau \to \infty. This approach dynamically balances the supervised losses and the physics-informed constraint without manual tuning, with learned weights that adapt to reflect the relative difficulty of each prediction task. In implementation, uncertainty weighting is activated after an initial static-loss warmup period (50 epochs).

F. OPTIMIZATION AND EARLY STOPPING

We minimize (20) using the AdamW optimizer with an initial learning rate \eta_0 = 10^{-3} and weight decay 10^{-5}. The learning rate follows a schedule comprising a linear warmup for the first E_w epochs, followed by cosine annealing decay [48] to \eta_min = 10^{-5} for the remainder of the training budget E. We trained on an NVIDIA RTX 4000 Ada GPU (20 GB VRAM). We used a baseline batch size of 128, reducing this value for large-scale systems and during multi-system learning to accommodate memory constraints. The implementation is based on PyTorch and PyTorch Geometric, with NumPy, SciPy, NetworkX, Matplotlib, and scikit-learn for data processing, graph utilities, visualization, and evaluation [49]-[55].

We use a two-phase early stopping strategy.
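The weighted objective (20) can be sketched with fixed \sigma values. This is an illustration of the formula only: in the paper the \sigma_\tau are learnable parameters updated by gradient descent, and the task-loss values below are hypothetical.

```python
# Homoscedastic uncertainty weighting (20): each task loss L_tau is scaled
# by 1/(2 sigma_tau^2), with a log(sigma_tau) term that prevents the trivial
# solution sigma -> infinity.
import numpy as np

def total_loss(task_losses, sigmas):
    L = np.asarray(task_losses, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    return float(np.sum(L / (2.0 * s ** 2)) + np.sum(np.log(s)))

# Five tasks: V_m, delta, P_g, Q_g, and the physics mismatch (values made up)
L_tasks = [0.4, 0.2, 0.1, 0.1, 0.2]

# With all sigma = 1 the objective reduces to sum(L)/2
base = total_loss(L_tasks, [1.0] * 5)

# Raising sigma for one noisy task down-weights it, at a log-penalty cost
noisy = total_loss([1.0, 0.0, 0.0, 0.0, 0.0], [2.0, 1.0, 1.0, 1.0, 1.0])
```

When \sigma is learned, a task whose loss stays stubbornly large drives its \sigma_\tau up, shrinking its gradient share, while the log term keeps easy tasks from being ignored.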
After the main validation metric plateaus, we keep a checkpoint only if it reduces the physics mismatch L_phy and does not reduce overall performance by more than 5%. This gives a final model that stays physically consistent while maintaining low supervised loss.

IV. CONTINUAL LEARNING FOR ON-THE-FLY ADAPTATION

Operational power grids are dynamic; topological changes (switching, outages) or extreme loading conditions can effectively alter the data distribution. To adapt to new scenarios and new systems without full retraining, we combine experience replay and EWC within a continual-learning framework to mitigate catastrophic forgetting.

Experience Replay. To enable the model to learn from both old and new scenarios or systems, we employ experience replay. We maintain a replay buffer D_rep \subset D_old of scenarios from the systems on which the model was originally trained, sampled uniformly at random. During adaptation, the new data D_new may consist of stressed scenarios from existing systems or scenarios from a new system. Because shifts are larger when adapting across structurally different systems, we mix D_new with replay data D_rep to reduce forgetting while learning the new target systems. We construct a mixed training set D_mix = D_new \cup D_rep, ensuring that the model retains performance on base systems while adapting to the new systems or operating conditions.

Elastic Weight Consolidation (EWC). While experience replay helps mitigate data distribution shifts, it does not prevent important model parameters from changing. To mitigate this, we augment the loss with EWC [36], which constrains parameters that are critical for performance on base systems. Let \theta^* be the parameters learned during initial training. We compute the diagonal Fisher Information Matrix F, where each element F_j approximates the importance of parameter \theta_j for maintaining performance on base systems.
The regularization penalty is:

  L_EWC(\theta) = \lambda_{ewc} \sum_j F_j (\theta_j - \theta^*_j)^2   (21)

This regularization constrains parameters with high F_j (critical for power flow solutions on base systems), while allowing low-sensitivity parameters to adapt to new systems or operating conditions. The final on-the-fly objective unifies the learning signal from the mixed data batch B \sim D_mix with the EWC penalty:

  L_online = L_total(\theta, \sigma, B) + L_EWC(\theta)   (22)

where L_total is the multi-task objective from (20).

V. EVALUATION SETUP

A. DATASET AND EVALUATION REGIMES

PMGAT-AI is evaluated on 14 benchmark systems ranging from small-scale systems (e.g., case4gs, case14) to large-scale systems (e.g., case1354pegase, case6470rte). We define two distinct evaluation regimes:

1) Regime 1: System-Specific Baselines. We establish baselines by training independent PMGAT-AI models for each of the 14 systems, using 10,000 scenarios per system. We evaluate two operating regimes: (1) Normal operation (N-0), where models are trained and tested on randomized load and generation profiles with fixed topology; and (2) Contingency (N-2), where models are retrained on datasets with randomized load and generation profiles and up to two random branch outages per scenario to improve robustness (Appendix B, Table 7).

2) Regime 2: Multi-System and Continual Learning. This regime evaluates multi-system learning under N-2 contingency conditions. Because batching multiple large graphs is memory-intensive, case6470rte is excluded from unified training. We sample 2,000 scenarios per system, resulting in 26,000 scenarios for the unified model trained on 13 systems and 24,000 for the base model trained on 12 systems. We first train the unified model on all 13 systems (Table 11). This model is also used for interpretability analysis in Sec. VII.
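The EWC penalty (21) can be sketched in a few lines. This is an illustrative scalarized version with hypothetical parameter and Fisher values, not the paper's per-tensor implementation.

```python
# EWC penalty (21): a quadratic pull toward the base-task parameters theta*,
# weighted by the diagonal Fisher information F (per-parameter importance).
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    theta, theta_star, fisher = map(np.asarray, (theta, theta_star, fisher))
    return float(lam * np.sum(fisher * (theta - theta_star) ** 2))

theta_star = np.array([0.5, -1.0, 2.0])    # parameters after base training
fisher = np.array([10.0, 0.1, 1.0])        # hypothetical importance estimates

# No penalty at the base optimum; moving an important parameter (high F_j)
# costs far more than moving an unimportant one by the same amount.
at_optimum = ewc_penalty(theta_star, theta_star, fisher)
move_important = ewc_penalty(theta_star + [0.1, 0.0, 0.0], theta_star, fisher)
move_unimportant = ewc_penalty(theta_star + [0.0, 0.1, 0.0], theta_star, fisher)
```

During adaptation this penalty is simply added to the multi-task objective on the mixed batch, as in (22), so gradient descent trades new-system fit against base-system retention.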
Then, we evaluate continual learning by pre-training a base model on 12 systems (excluding case1354pegase) and adapting it to case1354pegase using the EWC+Replay strategy in Sec. IV. Detailed contingency statistics are provided in Appendix B, Table 8. For very small systems (notably case4gs and case9), feasible N-2 samples are absent due to islanding or failure of AC power flow to converge; these systems contribute only N-0 and N-1 samples.

B. EVALUATION PROTOCOL

We randomly split all the datasets into training (80%), validation (10%), and test (10%) sets, stratified by system, then evaluate all models on the held-out test sets. Metrics are computed only for unknown state variables as defined by each bus type (Table 1). This prevents known fixed setpoints from artificially inflating performance metrics. We use the following metrics to evaluate different aspects of model performance:

• Absolute and Squared Error: We report root mean squared error (RMSE) and mean absolute error (MAE). For voltage angles, errors are wrapped to [-180°, 180°] before computation.
• Scale-Normalized Error: We report normalized mean absolute error (NMAE), defined as MAE divided by the range of target values for each target.
• Coefficient of Determination (R^2): We report R^2. For voltage angles, R^2 is computed with wrapped residuals and a circular-mean reference so that it is consistent with the angle MAE and RMSE definitions.
• Knowledge Loss: We quantify forgetting with Knowledge Loss (K). For state variable \tau \in {V_m, \delta, P_g, Q_g}, error metric \mu (e.g., NMAE, MAE), and previously learned system s:

  K^\mu_\tau(s) = Err^{post}_{\tau,\mu}(s) - Err^{pre}_{\tau,\mu}(s)   (23)

where Err^pre and Err^post are errors before and after fine-tuning on the new task. Positive K indicates forgetting; negative K indicates improved post-adaptation performance. Across base systems S, we report:

  \bar{K}^\mu_\tau = \frac{1}{|S|} \sum_{s \in S} K^\mu_\tau(s)   (24)

VI.
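The Knowledge Loss metric (23)-(24) can be sketched directly. This is a minimal illustration; the NMAE values below are hypothetical, not results from the paper.

```python
# Knowledge Loss (23)-(24): per-system error change after adaptation,
# averaged over the base systems. Positive K means forgetting; negative K
# means the base system actually improved after adaptation.
def knowledge_loss(err_pre, err_post):
    """err_pre / err_post: dicts mapping base-system name -> error value
    (e.g., NMAE in percent) before / after fine-tuning on the new task."""
    K = {s: err_post[s] - err_pre[s] for s in err_pre}
    K_bar = sum(K.values()) / len(K)
    return K, K_bar

# Hypothetical voltage-NMAE values (%) on three base systems
pre = {"case14": 0.76, "case30": 0.80, "case118": 0.90}
post = {"case14": 0.78, "case30": 0.79, "case118": 0.95}
K, K_bar = knowledge_loss(pre, post)
```

The sign convention makes the metric easy to read off a results table: any negative entry is a base system that benefited from the adaptation step.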
VI. RESULTS
Results are organized by the two regimes defined in Sec. V.

A. REGIME 1: BASELINE SYSTEM PERFORMANCE (N−0)
The detailed per-system metrics in Appendix C, Table 9 establish baseline performance under normal N−0 conditions. For voltage magnitudes (V_m), MAE on standard IEEE benchmark systems (e.g., case14, case30) is on the order of 10⁻⁴ p.u., corresponding to an NMAE of approximately 0.2%–0.3%. For the larger case6470rte system, NMAE is approximately 0.98%, and R² remains above 0.94, showing that the model captures system-level voltage variation. We observe that voltage angle (δ) error increases with system size (MAE ≈ 0.05° for case14 versus ≈ 5.0° for case6470rte). A likely contributor is reference-angle drift in larger networks [56]. Since nodal power balance depends on angle differences between connected buses ((2), (3)), reference-angle drift can increase absolute angle MAE without degrading the inter-bus differences that govern power flow. Even so, angle R² remains high (≈ 0.99 for case6470rte). Active (P_g) and reactive (Q_g) predictions at slack and PV buses also show low error, with P_g R² consistently above 0.99 across systems. Overall, these metrics show that the model preserves physically consistent power flow relationships across the evaluated systems and scenarios.

B. REGIME 1: ROBUSTNESS TO CONTINGENCIES (N−2)
To assess robustness to topology changes within a system, Regime 1 also evaluates each system under N−2 contingencies, where random branch outages redistribute branch flows and bus-voltage profiles, creating many topology combinations and a harder prediction task. Complete per-system results are provided in Appendix C, Table 10. Compared with the N−0 baseline, we observe marginal error increases (e.g., in case14, voltage-magnitude MAE increases from 3.3×10⁻⁴ to 7.2×10⁻⁴ p.u.), but errors remain small. Voltage NMAE stays below 0.5% for most systems, and angle R² stays near 0.99. Even in the challenging case6470rte system under N−2 contingency, angle R² is ≈ 0.99, close to its N−0 value. This is consistent with the GATv2 dynamic attention mechanism reweighting branch importance as connectivity, loading, and branch attributes change.

FIGURE 2. PowerModelsGAT-AI Performance across 13 Power Systems. Parity plots compare model predictions (y-axis) against the Newton–Raphson reference solution (x-axis) for all unknown variables in the unified 13-system training and test sets; color intensity represents point density. (a) Training Performance. (b) Test Performance.

C. REGIME 2: MULTI-SYSTEM AND CONTINUAL LEARNING
We evaluate PMGAT-AI across multiple trained systems and for adaptation to new systems. We first assess a unified model trained on 13 systems, then evaluate continual adaptation by fine-tuning a base model trained on 12 systems to case1354pegase.

1) Unified 13-System Performance
We analyze the unified model trained on 13 systems. Detailed results are provided in Appendix D, Tables 11 and 12. Compared with system-specific models, the unified model can be used across multiple systems, with a small increase in per-system error. For example, on case14, voltage NMAE increases from about 0.15% for the system-specific model (Regime 1) to about 0.76% for the unified model. Across the 13 trained systems, the average voltage NMAE is 0.89%, and active-power R² exceeds 0.97 (Fig. 2), with Fig. 2a and Fig. 2b showing similar prediction versus reference trends across all predicted variables. Per-system plots are provided in the Supplementary Material (Figs. S1–S13). At the bus level, prediction error is typically lowest in small-scale systems such as case4gs and case30, where predictions closely match the reference values.
In larger systems such as case1354pegase, target variables (especially δ, P_g, and Q_g) span wider ranges across buses and scenarios. As a result, absolute errors can be larger even when NMAE remains relatively low. Branch outages further widen these ranges and increase operating stress, which increases angle and power-injection deviations. Additional higher-error examples are provided in the Supplementary Material (Fig. S14).

TABLE 2. Target System (case1354pegase) Performance After Fine-Tuning.

        Base (Unadapted)     Naive            EWC+Replay
Target  NMAE%    R²          NMAE%   R²       NMAE%   R²
V_m     12.98    −2.36       1.44    0.95     1.90    0.91
δ       11.96    −0.01       0.61    0.99     0.68    0.99
P_g     20.79    0.01        0.42    0.99     0.45    0.99
Q_g     1.61     0.02        0.15    0.99     0.17    0.99

Target-system performance is shown here; base-system forgetting is summarized in Table 3.

2) Continual Learning: Adaptation Without Forgetting
While unified training works well, in practice new systems may need to be incorporated without full retraining. Prior work highlights persistent generalization and scalability challenges under changes in system size and topology [23], [24]. To quantify this, we evaluated a base model trained on 12 systems against the held-out case1354pegase system. As shown in Table 2, the unadapted base model performs poorly on the target system, with voltage NMAE of 12.98%, angle NMAE of 11.96%, and negative R² values, meaning predictions worse than the mean. These gaps make clear that cross-system generalization is not automatic and requires explicit adaptation strategies.

We evaluated all models on identical test sets and used λ_ewc = 0.5 with a replay ratio of 0.3 to balance adaptation and forgetting. We compared this setting against a naive baseline (λ_ewc = 0, no replay). On case1354pegase, both strategies reach R² > 0.99 for voltage angles and power injections (Table 2). Naive fine-tuning gives slightly better target V_m NMAE (1.44% versus 1.90%), but it causes severe forgetting on base systems (K̄^NMAE_{P_g} ≈ 1892%, K̄^NMAE_δ ≈ 138%; Table 3). Naive fine-tuning is therefore not suitable for long-term cross-system adaptation.

TABLE 3. Average Knowledge Loss (K̄^NMAE_τ) on Base Systems.

State Variable              Naive      EWC+Replay
Voltage Magnitude (V_m)     12.00      0.26
Voltage Angle (δ)           138.23     0.12
Active Power (P_g)          1891.82    1.25
Reactive Power (Q_g)        103.79     1.54

Values are in percentage points of NMAE change (K̄^NMAE_τ), averaged across 12 base systems. Lower is better; K̄ ≈ 0 indicates minimal forgetting on base systems.

Our EWC+Replay strategy addresses this problem. Although target performance is slightly lower than naive fine-tuning, it maintains low knowledge loss across all state variables (K̄^NMAE_{V_m} = 0.26%, K̄^NMAE_δ = 0.12%, K̄^NMAE_{P_g} = 1.25%, K̄^NMAE_{Q_g} = 1.54%). The same trend appears with MAE (K̄^MAE_δ = 0.16° versus 74° for naive; see Appendix E). Taken together, these results confirm that the model can adapt to new systems while minimizing forgetting on base systems. Visual summaries are provided in the Supplementary Material (Figs. S15 and S16).

VII. INTERPRETABILITY ANALYSIS
To assess PMGAT-AI's ability to capture established physical relationships from learned representations rather than relying on spurious statistical correlations, we analyze the model from two perspectives: branch importance, quantified using learned attention weights, and bus feature sensitivity, assessed using integrated gradients. Attention weights are useful for interpretation, but they do not by themselves establish causal influence [57], [58]. However, consistent correlations with known electrical parameters provide supporting evidence of alignment with electrical relationships.
A. IDENTIFICATION OF CRITICAL TOPOLOGICAL STRUCTURES
To quantify topological influence, we use the learned attention coefficients from the final GATv2 layer. For each branch (i, j), we define a branch importance score as the mean attention weight

ᾱ_ij = (1/|D|) Σ_{c∈D} ( (1/H) Σ_{m=1}^{H} α_ij^{(m,c)} ),    (25)

averaged across all H attention heads and all test scenarios c ∈ D. We use these scores for topology visualization (Fig. 3). For correlation analysis (Fig. 4a), we compute feature–attention correlations from branch-level attention within each system and report the equal-weight average across systems.

Fig. 3 illustrates the learned importance scores for two representative systems, case14 (Fig. 3a) and case57 (Fig. 3b). The model does not attend uniformly to all branches. Instead, it assigns higher importance to a subset of key branches. In case14, the model assigns higher importance to key branches (dark red). Transformer branches show mixed importance: several receive low scores, whereas others are among the most influential connections. The model responds to system-specific coupling rather than simply down-weighting transformers.

B. PHYSICAL DRIVERS OF BRANCH IMPORTANCE
To interpret why some branches receive higher importance, we compute Pearson correlations (r) between branch-importance scores and raw electrical branch parameters (Fig. 4a). Correlations are computed within each system and then averaged with equal system weight. Three patterns hold across systems:
1) Susceptance (ℑ(Y_ser)): Susceptance has the strongest positive correlation (r ≈ 0.38). This reflects AC power flow relationships, where real-power flow is strongly tied to branch reactance and susceptance and shared among parallel branches according to their impedances [6], [59].
2) Thermal Limit (I_max): Thermal limit is also positively correlated (r ≈ 0.22).
Although thermal rating is not a direct admittance term, higher-rated branches are often electrically influential, so this association is physically consistent. The effect is system-dependent, and in some systems with near-constant branch ratings the thermal-limit correlation within the system is weak, as expected.
3) Conductance (ℜ(Y_ser)): Conductance shows a negative correlation (r ≈ −0.27). In this dataset, branches with larger effective resistive components tend to receive lower branch-importance scores.

To separate the individual contributions of each branch property, we also fit a multivariate regression with conductance, susceptance, thermal limit, and branch type as joint predictors of branch-importance score. Thermal limit has the largest standardized coefficient (β_std ≈ 0.36), followed by susceptance (β_std ≈ 0.15), while conductance has a near-zero independent effect. This pattern is clearer in lines, where resistance and reactance are less correlated than in transformers.

C. BRANCH IMPORTANCE DISTRIBUTION BY SYSTEM
Fig. 4b compares branch-importance distributions across systems. Larger systems show stronger concentration near low branch-importance scores. For case1354pegase, the median importance is 0.147, and 46.24% of branches have scores at or below 0.1. In case14 and case30, the medians are higher (0.297 and 0.298), and the fractions of branches with scores at or below 0.1 are smaller (11.50% and 15.34%). For high branch-importance scores, case1354pegase still has a non-negligible subset: 15.87% of branches are at or above 0.9, compared with 2.55% in case14 and 3.29% in case30. Intermediate systems such as case118 and case300 also show moderate low-score concentration, with 26.56% and 24.62% of branches, respectively, at or below 0.1. Overall, this points to sparser branch weighting in larger networks.
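The branch-importance score of Eq. (25) is a simple double average over heads and scenarios. It can be sketched as follows; this is a minimal NumPy sketch with an illustrative tensor layout, not the paper's implementation, which would read the attention coefficients out of the trained GATv2 layer.

```python
import numpy as np

def branch_importance(alpha):
    """Branch importance score, Eq. (25): mean attention weight per branch,
    averaged over all H attention heads and all test scenarios in D.
    alpha: array of shape (num_scenarios, num_heads, num_branches)."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha.mean(axis=(0, 1))   # -> shape (num_branches,)

# Toy example: 2 scenarios, 2 heads, 3 branches.
alpha = np.array([[[0.9, 0.1, 0.5],
                   [0.7, 0.3, 0.5]],
                  [[0.8, 0.2, 0.5],
                   [0.6, 0.4, 0.5]]])
print(branch_importance(alpha))  # per-branch means: 0.75, 0.25, 0.5
```

The resulting per-branch scores are the quantities correlated against susceptance, conductance, and thermal limit in Fig. 4a.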
FIGURE 3. Learned Branch Importance Maps. Darker edges indicate higher importance. (a) case14: the model assigns higher importance to key branches. (b) case57: importance scores vary across the system.

FIGURE 4. Analysis of Branch Importance Scores. (a) Correlation between branch importance and physical branch parameters. (b) Distribution of branch importance scores across benchmark systems.

case5 is an outlier among the smaller systems, with a split distribution across branches (30.34% at or below 0.1 and 22.76% at or above 0.9). In this system, two of six branches have series susceptance roughly three to five times the median, while the remaining four are tightly clustered near it. With only six branches, each one represents about 17% of the distribution, so this impedance contrast produces a pronounced split. In comparison, case4gs has similarly few branches but nearly uniform susceptance, and case14 has comparable impedance spread but 20 branches, so neither shows the same bimodality.

D. BUS FEATURE SENSITIVITY AND PHYSICS ALIGNMENT
Having analyzed edge-level importance, we now examine node-level feature sensitivity. We use integrated gradients (IG) [40] to quantify how input bus features influence each predicted state variable. Using the dataset mean as the baseline, we define the feature-importance score for feature ϕ as

I_ϕ = (1/|S|) Σ_{s∈S} [ (1/N_s) Σ_{n=1}^{N_s} |IG_ϕ(x_n^{(s)})| ],    (26)

where N_s is the total number of node samples from test scenarios in system s. We then normalize these scores as Ī_ϕ = I_ϕ / Σ_{ϕ′} I_ϕ′, so the relative feature importances sum to unity. Fig. 5 shows these normalized sensitivities across all 13 systems. At the median level, the top contributors are V_m^set for V_m (≈ 0.15), P_g^set for δ (≈ 0.17), P_g^set for P_g (≈ 0.22), and Bus Type for Q_g (≈ 0.10).
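The aggregation and normalization of Eq. (26) can be sketched as follows. This is a minimal NumPy sketch with illustrative names; computing the IG attributions themselves requires the trained model and is outside this fragment, so random placeholder arrays stand in for them here.

```python
import numpy as np

def normalized_feature_importance(ig_per_system):
    """Feature-importance score, Eq. (26), followed by normalization.
    ig_per_system: list of arrays, one per system, each of shape
    (N_s, num_features), holding integrated-gradients attributions for
    that system's node samples. Systems are weighted equally regardless
    of size, matching the equal-weight average over S in Eq. (26)."""
    per_system = [np.abs(np.asarray(ig, float)).mean(axis=0)  # mean |IG| over nodes
                  for ig in ig_per_system]
    scores = np.mean(per_system, axis=0)                      # equal-weight mean over systems
    return scores / scores.sum()                              # relative importances sum to 1

# Toy example: two "systems" of different sizes, three bus features.
ig_a = np.array([[1.0, -1.0, 0.0],
                 [1.0,  1.0, 0.0]])   # 2 node samples
ig_b = np.array([[2.0,  0.0, 2.0]])   # 1 node sample
print(normalized_feature_importance([ig_a, ig_b]))  # sums to 1.0
```

Note that the per-system mean is taken before the cross-system mean, so a 1,354-bus system does not swamp a 14-bus system in the aggregate score.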
These results show three patterns expected from power system physics:

FIGURE 5. Global Stability of Bus Feature Sensitivity. Boxplots show the distribution of normalized bus feature sensitivity scores across 13 systems for (a) V_m, (b) δ, (c) P_g, and (d) Q_g.

1) P–δ Coupling
For predicting Voltage Angle (δ) and Active Power Generation (P_g) (Figs. 5b and 5c), the primary drivers are Active Generation Setpoint (P_g^set) and Active Load (P_d). For δ, their median normalized sensitivities are ≈ 0.17 and ≈ 0.14. For P_g, they are ≈ 0.22 and ≈ 0.16. This ranking reflects the P–δ coupling in AC power flow, where real-power balance strongly influences voltage angles [4].

2) Q–V Coupling
For Voltage Magnitude (V_m) prediction (Fig. 5a), the Voltage Setpoint (V_m^set) is the dominant driver (median sensitivity ≈ 0.15), followed by Bus Type and Reactive Load (Q_d). The observed order is expected from the Q–V relationship: voltage magnitudes at PV buses are directly controlled by their setpoints, while at PQ buses they respond to local reactive injections [60]. The prominence of Bus Type suggests that the model distinguishes between bus categories when inferring voltage profiles. For Reactive Power Generation (Q_g) prediction (Fig. 5d), Bus Type (≈ 0.10) and Voltage Setpoint (V_m^set, ≈ 0.08) are the strongest median contributors, followed by Active Generation Setpoint (P_g^set, ≈ 0.07). Since Q_g is predicted only at PV and Slack buses (Table 1), this result matches bus-type-specific reactive-power constraints.

3) Cross-System Feature Sensitivity
The stability of these rankings across systems ranging from 14 to 1,354 buses supports cross-system consistency.
In particular, Bus Type remains stable across targets, with compact IQR values (≈ 0.05 for V_m, ≈ 0.04 for δ, and ≈ 0.04 for Q_g), reflecting consistent use of PQ, PV, and Slack bus-type structure across systems. Together, the branch-importance maps and feature attributions provide a practical diagnostic view of which branches and bus features drive model outputs under different operating conditions.

VIII. CONCLUSION
This work presented PowerModelsGAT-AI, a unified framework that solves AC power flow across the 13 systems in the trained set using physics-informed graph attention. The key contributions include: (i) system-specific baselines across 14 power systems (4 to 6,470 buses) and a unified multi-system model trained on 13 of these; (ii) a mask-aware formulation that predicts bus voltages and generator injections for all bus types; (iii) a physics-informed loss incorporating power mismatch constraints; and (iv) a continual-learning strategy (EWC+Replay) that mitigates catastrophic forgetting during adaptation to new systems. Experimental results show that PMGAT-AI achieves strong predictive performance, with an average voltage magnitude NMAE of 0.89% across the unified 13-system benchmark set under the N−2 evaluation regime. PMGAT-AI maintains robust angle prediction, with R² > 0.99 on the largest transmission systems tested. Analysis of learned attention weights reveals physically meaningful patterns, including stronger concentration of branch importance in larger systems (Sec. VII). Potential directions for future research include extending the contingency dataset and training to higher-order outages (up to N−4), incorporating system security metrics such as the overall performance index (OPI) for contingency ranking and analysis, and adapting the framework to solve the OPF problem.
Another direction is to scale unified training to larger systems using gradient checkpointing or distributed training.

APPENDIX A
FEATURE DEFINITIONS
This appendix details the node and edge feature sets used in PMGAT-AI. Complete node and edge feature definitions are provided in Table 4 and Table 5, respectively.

TABLE 4. Node Feature Definitions (x_i ∈ R²³ + one-hot bus type). Complete specification of bus-level input features used by PowerModelsGAT-AI. See Sec. II-C for details.

Feature | Symbol | Description | Unit
Electrical Properties
Nominal Voltage | V_n | Rated voltage level of the bus | kV
Active Load | P_d | Active power demand at the bus | p.u.
Reactive Load | Q_d | Reactive power demand at the bus | p.u.
Active Generation Setpoint | P_g^set | Scheduled active power generation | p.u.
Voltage Setpoint | V_m^set | Voltage magnitude setpoint (PV and Slack buses) | p.u.
Shunt Conductance | G_sh | Shunt conductance at the bus | p.u.
Shunt Susceptance | B_sh | Shunt susceptance at the bus | p.u.
Physical Limits
Min Reactive Limit | Q_g^min | Minimum reactive power generation capacity | p.u.
Max Reactive Limit | Q_g^max | Maximum reactive power generation capacity | p.u.
Min Voltage Limit | V_m^min | Minimum allowable voltage magnitude | p.u.
Max Voltage Limit | V_m^max | Maximum allowable voltage magnitude | p.u.
System Parameters
Base Power | S_base | System base power for per-unit conversion | MVA
Frequency | f | System nominal frequency | Hz
Topological Features
Node Degree (1-hop) | — | Number of directly connected buses | count
Node Degree (2-hop) | — | Number of buses within two hops | count
Electrical Distance to Slack | — | Shortest impedance-weighted path length to a slack bus | a.u.
Electrical Betweenness | — | Betweenness centrality (impedance-weighted) | normalized
Aggregate Neighbor Net Injection | — | Sum of neighboring net injections | p.u.
Total Adjacent Admittance | — | Sum of inverse impedance-based edge weights over incident branches | p.u.
Neighbor Indicators (Binary)
Generator Neighbor (1-hop) | — | Any directly connected bus has a generator | {0,1}
Generator Neighbor (2-hop) | — | Any 2-hop neighbor has a generator | {0,1}
Slack Neighbor (1-hop) | — | Any directly connected bus is a slack bus | {0,1}
Slack Neighbor (2-hop) | — | Any 2-hop neighbor is a slack bus | {0,1}
Categorical
Bus Type | — | Power flow bus classification (one-hot encoded) | PQ, PV, Slack

TABLE 5. Edge Feature Definitions (e_ij ∈ R⁷). Branch-level attributes for each directed branch edge, encoding series admittance (implemented as directed off-diagonal branch-admittance edge terms), one-hot branch type, and thermal limit.

Feature | Symbol | Description | Unit
Series Admittance (Edge Terms)
Conductance | G | Real part of the series-admittance edge term | p.u.
Susceptance | B | Imaginary part of the series-admittance edge term | p.u.
Branch Type (One-Hot)
Line | — | Transmission line | {0,1}
Transformer | — | Power transformer | {0,1}
Impedance | — | Series impedance element | {0,1}
Switch | — | Switching device | {0,1}
Thermal Limit
Thermal Limit | I_max | Per-unit thermal-limit proxy computed as branch rating (MVA) normalized by system base power | p.u.

APPENDIX B
DATASET CONTINGENCY STATISTICS
This appendix reports contingency statistics of the dataset. Table 6 lists the static bus-type composition per system, and Tables 7 and 8 summarize the training and test contingency distributions. The N−k label gives the number of removed branches per scenario: N−0 (normal), N−1 (single-branch outage), and N−2 (double-branch outage). For case4gs and case9, N−2 samples are absent (see Sec. V).

TABLE 6. Bus Type Distribution by System. Breakdown of bus counts per system, classified by bus type (PQ, PV, and Slack).
System | Total | PQ (Load) | PV (Gen) | Slack
case4gs | 4 | 2 | 1 | 1
case5 | 5 | 1 | 3 | 1
case6ww | 6 | 3 | 2 | 1
case9 | 9 | 6 | 2 | 1
case14 | 14 | 9 | 4 | 1
case30 | 30 | 24 | 5 | 1
case_ieee30 | 30 | 24 | 5 | 1
case39 | 39 | 29 | 9 | 1
case57 | 57 | 50 | 6 | 1
case118 | 118 | 64 | 53 | 1
case_illinois200 | 200 | 162 | 37 | 1
case300 | 300 | 231 | 68 | 1
case1354pegase | 1354 | 1094 | 259 | 1
case6470rte | 6470 | 6017 | 452 | 1

TABLE 7. Contingency Distribution for System-Specific Models (N−2 Training and Test Sets).

System | N-0 Train/Test (%) | N-1 Train/Test (%) | N-2 Train/Test (%)
case4gs | 49.5 / 50.1 | 50.5 / 49.9 | — / —
case5 | 36.1 / 33.7 | 35.7 / 34.4 | 28.2 / 31.9
case6ww | 35.0 / 32.1 | 33.1 / 33.2 | 31.9 / 34.7
case9 | 60.5 / 60.0 | 39.5 / 40.0 | — / —
case14 | 36.3 / 35.4 | 33.7 / 34.4 | 30.0 / 30.2
case30 | 36.3 / 38.5 | 34.1 / 33.2 | 29.6 / 28.3
case_ieee30 | 37.6 / 37.4 | 32.9 / 34.0 | 29.5 / 28.6
case39 | 44.8 / 42.7 | 33.3 / 32.5 | 21.9 / 24.8
case57 | 36.3 / 35.3 | 33.0 / 33.9 | 30.7 / 30.8
case118 | 34.2 / 34.9 | 34.1 / 32.7 | 31.7 / 32.4
case_illinois200 | 46.2 / 45.5 | 31.7 / 31.6 | 22.1 / 22.9
case300 | 43.6 / 39.9 | 32.0 / 36.0 | 24.4 / 24.1
case1354pegase | 44.0 / 45.7 | 32.3 / 30.6 | 23.7 / 23.7
case6470rte | 44.6 / 43.3 | 33.0 / 31.0 | 22.4 / 25.7

TABLE 8. Contingency Distribution for Unified Models (N−2 Training and Test Sets). Each system contributes 2,000 scenarios. Comparisons show the stability of the contingency distribution across training and test sets.

Model | N-0 Train/Test (%) | N-1 Train/Test (%) | N-2 Train/Test (%)
Unified (13 Systems) | 40.3 / 41.8 | 36.2 / 35.1 | 23.5 / 23.1
Base (12 Systems) | 39.5 / 41.5 | 37.3 / 35.6 | 23.2 / 22.8

APPENDIX C
SYSTEM-SPECIFIC BASELINE RESULTS
This appendix provides detailed baseline performance metrics for each system under normal (N−0) and contingency (N−2) conditions.

TABLE 9. System-Specific Baseline Performance (N−0). Per-system metrics for system-specific models described in Sec. VI-A. Each model is trained and evaluated on its respective system under normal operating conditions (no element outages).
System | Target | MSE | RMSE | MAE | R² | NMAE%
case4gs | V_m [p.u.] | 1.13e-7 | 0.0003 | 0.0003 | 0.9996 | 0.2652
case4gs | δ [°] | 4.12e-4 | 0.0203 | 0.0148 | 0.9999 | 0.1274
case4gs | P_g [p.u.] | 6.43e-4 | 0.0254 | 0.0185 | 0.9996 | 0.2794
case4gs | Q_g [p.u.] | 5.65e-4 | 0.0238 | 0.0183 | 0.9995 | 0.3571
case5 | V_m [p.u.] | 1.29e-7 | 0.0004 | 0.0003 | 0.9996 | 0.3619
case5 | δ [°] | 1.16e-3 | 0.0341 | 0.0263 | 0.9998 | 0.2556
case5 | P_g [p.u.] | 2.35e-3 | 0.0484 | 0.0384 | 0.9994 | 0.3747
case5 | Q_g [p.u.] | 1.55e-2 | 0.1245 | 0.0915 | 0.9993 | 0.2829
case6ww | V_m [p.u.] | 2.94e-7 | 0.0005 | 0.0004 | 0.9995 | 0.2722
case6ww | δ [°] | 9.69e-4 | 0.0311 | 0.0238 | 0.9999 | 0.1208
case6ww | P_g [p.u.] | 3.15e-5 | 0.0056 | 0.0042 | 0.9999 | 0.1475
case6ww | Q_g [p.u.] | 1.15e-4 | 0.0107 | 0.0082 | 0.9995 | 0.2266
case9 | V_m [p.u.] | 1.40e-6 | 0.0012 | 0.0009 | 0.9983 | 0.5361
case9 | δ [°] | 2.58e-3 | 0.0508 | 0.0390 | 0.9999 | 0.0887
case9 | P_g [p.u.] | 1.26e-4 | 0.0112 | 0.0081 | 0.9998 | 0.2153
case9 | Q_g [p.u.] | 2.51e-4 | 0.0159 | 0.0128 | 0.9943 | 0.9679
case14 | V_m [p.u.] | 1.75e-7 | 0.0004 | 0.0003 | 0.9996 | 0.2398
case14 | δ [°] | 2.91e-3 | 0.0539 | 0.0414 | 0.9999 | 0.1623
case14 | P_g [p.u.] | 5.69e-5 | 0.0075 | 0.0056 | 0.9998 | 0.1823
case14 | Q_g [p.u.] | 1.91e-4 | 0.0138 | 0.0097 | 0.9993 | 0.2751
case30 | V_m [p.u.] | 2.74e-7 | 0.0005 | 0.0004 | 0.9992 | 0.3105
case30 | δ [°] | 3.28e-3 | 0.0572 | 0.0419 | 0.9997 | 0.1779
case30 | P_g [p.u.] | 4.12e-5 | 0.0064 | 0.0048 | 0.9998 | 0.2230
case30 | Q_g [p.u.] | 2.02e-4 | 0.0142 | 0.0091 | 0.9995 | 0.2045
case_ieee30 | V_m [p.u.] | 4.82e-7 | 0.0007 | 0.0005 | 0.9990 | 0.3162
case_ieee30 | δ [°] | 4.55e-2 | 0.2134 | 0.1668 | 0.9979 | 0.5688
case_ieee30 | P_g [p.u.] | 1.86e-4 | 0.0136 | 0.0108 | 0.9996 | 0.3300
case_ieee30 | Q_g [p.u.] | 4.97e-4 | 0.0223 | 0.0153 | 0.9984 | 0.4112
case39 | V_m [p.u.] | 2.34e-6 | 0.0015 | 0.0011 | 0.9980 | 0.3135
case39 | δ [°] | 1.85e-1 | 0.4299 | 0.2590 | 0.9998 | 0.1456
case39 | P_g [p.u.] | 1.39e-2 | 0.1177 | 0.0907 | 0.9999 | 0.1930
case39 | Q_g [p.u.] | 1.08e-2 | 0.1039 | 0.0683 | 0.9975 | 0.2756
case57 | V_m [p.u.] | 2.63e-6 | 0.0016 | 0.0011 | 0.9996 | 0.2048
case57 | δ [°] | 3.03e-2 | 0.1739 | 0.1273 | 0.9996 | 0.1835
case57 | P_g [p.u.] | 1.38e-3 | 0.0371 | 0.0295 | 0.9999 | 0.1948
case57 | Q_g [p.u.] | 2.12e-3 | 0.0461 | 0.0362 | 0.9982 | 0.5355
case118 | V_m [p.u.] | 5.94e-7 | 0.0008 | 0.0006 | 0.9984 | 0.3972
case118 | δ [°] | 1.80e-1 | 0.4239 | 0.3186 | 0.9995 | 0.2163
case118 | P_g [p.u.] | 4.67e-2 | 0.2161 | 0.1912 | 0.9995 | 0.4364
case118 | Q_g [p.u.] | 8.41e-3 | 0.0917 | 0.0664 | 0.9976 | 0.2370
case_illinois200 | V_m [p.u.] | 7.41e-7 | 0.0009 | 0.0007 | 0.9984 | 0.4159
case_illinois200 | δ [°] | 9.20e-2 | 0.3033 | 0.2262 | 0.9990 | 0.3216
case_illinois200 | P_g [p.u.] | 3.80e-3 | 0.0616 | 0.0538 | 0.9999 | 0.2185
case_illinois200 | Q_g [p.u.] | 2.26e-4 | 0.0150 | 0.0092 | 0.9985 | 0.1211
case300 | V_m [p.u.] | 6.68e-5 | 0.0082 | 0.0040 | 0.9747 | 0.6270
case300 | δ [°] | 4.03 | 2.0070 | 1.4780 | 0.9978 | 0.4153
case300 | P_g [p.u.] | 4.39e-2 | 0.2096 | 0.1579 | 0.9991 | 0.5945
case300 | Q_g [p.u.] | 1.59e-2 | 0.1262 | 0.0853 | 0.9979 | 0.1818
case1354pegase | V_m [p.u.] | 1.80e-5 | 0.0042 | 0.0028 | 0.9725 | 0.8564
case1354pegase | δ [°] | 6.86 | 2.6190 | 1.9550 | 0.9972 | 0.6158
case1354pegase | P_g [p.u.] | 8.63 | 2.9370 | 2.5700 | 0.9994 | 0.5451
case1354pegase | Q_g [p.u.] | 3.98 | 1.9960 | 1.0980 | 0.9966 | 0.1077
case6470rte | V_m [p.u.] | 7.38e-5 | 0.0086 | 0.0065 | 0.9441 | 0.9846
case6470rte | δ [°] | 5.33e1 | 7.2980 | 4.9680 | 0.9892 | 1.3800
case6470rte | P_g [p.u.] | 6.07 | 2.4640 | 2.0020 | 0.9981 | 0.9568
case6470rte | Q_g [p.u.] | 1.26e-1 | 0.3548 | 0.1721 | 0.9909 | 0.1483

TABLE 10. System-Specific Contingency Performance (N−2). Per-system metrics for system-specific models described in Sec. VI-B. Each model is trained and evaluated on scenarios with up to two simultaneous branch outages, testing robustness under severe (N−2) contingencies.
System | Target | MSE | RMSE | MAE | R² | NMAE%
case4gs | V_m [p.u.] | 2.60e-7 | 0.0005 | 0.0004 | 0.9997 | 0.1353
case4gs | δ [°] | 2.27e-3 | 0.0476 | 0.0328 | 0.9999 | 0.0763
case4gs | P_g [p.u.] | 3.52e-4 | 0.0188 | 0.0152 | 0.9998 | 0.2552
case4gs | Q_g [p.u.] | 3.34e-4 | 0.0183 | 0.0144 | 0.9997 | 0.2031
case5 | V_m [p.u.] | 2.12e-7 | 0.0005 | 0.0003 | 0.9996 | 0.2614
case5 | δ [°] | 3.16e-2 | 0.1778 | 0.0984 | 0.9987 | 0.2410
case5 | P_g [p.u.] | 2.95e-3 | 0.0543 | 0.0447 | 0.9993 | 0.4343
case5 | Q_g [p.u.] | 1.23e-2 | 0.1108 | 0.0814 | 0.9993 | 0.2711
case6ww | V_m [p.u.] | 4.50e-6 | 0.0021 | 0.0008 | 0.9977 | 0.1698
case6ww | δ [°] | 1.00e-1 | 0.3162 | 0.0810 | 0.9965 | 0.1325
case6ww | P_g [p.u.] | 2.26e-4 | 0.0151 | 0.0094 | 0.9993 | 0.2903
case6ww | Q_g [p.u.] | 3.51e-4 | 0.0187 | 0.0110 | 0.9990 | 0.2135
case9 | V_m [p.u.] | 2.58e-6 | 0.0016 | 0.0011 | 0.9982 | 0.2795
case9 | δ [°] | 1.78e-2 | 0.1336 | 0.0772 | 0.9998 | 0.0592
case9 | P_g [p.u.] | 1.15e-4 | 0.0107 | 0.0088 | 0.9998 | 0.2245
case9 | Q_g [p.u.] | 3.32e-4 | 0.0182 | 0.0139 | 0.9955 | 0.6693
case14 | V_m [p.u.] | 2.79e-6 | 0.0017 | 0.0007 | 0.9955 | 0.1475
case14 | δ [°] | 1.17e-1 | 0.3424 | 0.1739 | 0.9977 | 0.2519
case14 | P_g [p.u.] | 3.38e-4 | 0.0184 | 0.0131 | 0.9991 | 0.4077
case14 | Q_g [p.u.] | 2.85e-4 | 0.0169 | 0.0114 | 0.9989 | 0.2854
case30 | V_m [p.u.] | 7.61e-7 | 0.0009 | 0.0006 | 0.9982 | 0.2340
case30 | δ [°] | 1.80e-2 | 0.1342 | 0.0786 | 0.9988 | 0.2108
case30 | P_g [p.u.] | 6.47e-5 | 0.0080 | 0.0056 | 0.9997 | 0.2447
case30 | Q_g [p.u.] | 3.21e-4 | 0.0179 | 0.0117 | 0.9992 | 0.2546
case_ieee30 | V_m [p.u.] | 2.87e-6 | 0.0017 | 0.0011 | 0.9960 | 0.3040
case_ieee30 | δ [°] | 2.74e-1 | 0.5231 | 0.2767 | 0.9933 | 0.3963
case_ieee30 | P_g [p.u.] | 3.70e-4 | 0.0192 | 0.0123 | 0.9992 | 0.3618
case_ieee30 | Q_g [p.u.] | 6.80e-4 | 0.0261 | 0.0167 | 0.9978 | 0.4454
case39 | V_m [p.u.] | 7.33e-6 | 0.0027 | 0.0016 | 0.9943 | 0.4375
case39 | δ [°] | 3.58 | 1.8910 | 0.5171 | 0.9958 | 0.2216
case39 | P_g [p.u.] | 7.93e-3 | 0.0891 | 0.0649 | 0.9999 | 0.1374
case39 | Q_g [p.u.] | 2.50e-2 | 0.1582 | 0.0831 | 0.9942 | 0.3605
case57 | V_m [p.u.] | 1.24e-5 | 0.0035 | 0.0019 | 0.9982 | 0.3096
case57 | δ [°] | 1.08e-1 | 0.3281 | 0.2052 | 0.9988 | 0.1984
case57 | P_g [p.u.] | 6.12e-3 | 0.0782 | 0.0586 | 0.9993 | 0.4110
case57 | Q_g [p.u.] | 4.13e-3 | 0.0643 | 0.0509 | 0.9965 | 0.7165
case118 | V_m [p.u.] | 9.19e-7 | 0.0010 | 0.0006 | 0.9976 | 0.3425
case118 | δ [°] | 4.87e-1 | 0.6978 | 0.4666 | 0.9989 | 0.2883
case118 | P_g [p.u.] | 4.80e-2 | 0.2191 | 0.2020 | 0.9995 | 0.4449
case118 | Q_g [p.u.] | 8.38e-3 | 0.0916 | 0.0678 | 0.9976 | 0.2423
case_illinois200 | V_m [p.u.] | 2.81e-6 | 0.0017 | 0.0011 | 0.9945 | 0.4120
case_illinois200 | δ [°] | 2.15e-1 | 0.4640 | 0.2957 | 0.9979 | 0.3041
case_illinois200 | P_g [p.u.] | 2.04e-3 | 0.0452 | 0.0330 | 0.9999 | 0.1307
case_illinois200 | Q_g [p.u.] | 3.28e-4 | 0.0181 | 0.0102 | 0.9979 | 0.1157
case300 | V_m [p.u.] | 5.13e-5 | 0.0072 | 0.0038 | 0.9811 | 0.5010
case300 | δ [°] | 8.37 | 2.8930 | 1.9520 | 0.9956 | 0.5713
case300 | P_g [p.u.] | 6.04e-2 | 0.2458 | 0.1758 | 0.9988 | 0.6712
case300 | Q_g [p.u.] | 2.65e-2 | 0.1627 | 0.1013 | 0.9965 | 0.2337
case1354pegase | V_m [p.u.] | 1.35e-5 | 0.0037 | 0.0025 | 0.9792 | 0.5358
case1354pegase | δ [°] | 4.62 | 2.1480 | 1.6000 | 0.9980 | 0.5098
case1354pegase | P_g [p.u.] | 2.47 | 1.5720 | 1.3580 | 0.9998 | 0.2931
case1354pegase | Q_g [p.u.] | 5.02 | 2.2390 | 1.1400 | 0.9957 | 0.1166
case6470rte | V_m [p.u.] | 9.47e-5 | 0.0097 | 0.0073 | 0.9268 | 1.1110
case6470rte | δ [°] | 3.62e1 | 6.0130 | 4.3580 | 0.9918 | 1.2110
case6470rte | P_g [p.u.] | 2.90 | 1.7020 | 1.3450 | 0.9990 | 0.6357
case6470rte | Q_g [p.u.] | 1.45e-1 | 0.3807 | 0.2012 | 0.9893 | 0.1641

APPENDIX D
UNIFIED MODEL PER-SYSTEM RESULTS
This appendix reports the per-system performance of the unified model, which uses a single set of shared weights to solve the AC power flow across all 13 systems.

TABLE 11.
Unified PowerModelsGAT-AI Performance Across All 13 Systems (N−2 Contingencies). Per-system breakdown for results discussed in Sec. VI-C.

System | Target | MSE | RMSE | MAE | R² | NMAE%
case4gs | V_m [p.u.] | 2.15e-6 | 0.0015 | 0.0010 | 0.9978 | 0.5636
case4gs | δ [°] | 1.48e-1 | 0.3852 | 0.2606 | 0.9938 | 0.6885
case4gs | P_g [p.u.] | 6.86e-3 | 0.0829 | 0.0600 | 0.9953 | 1.0250
case4gs | Q_g [p.u.] | 9.69e-3 | 0.0984 | 0.0765 | 0.9914 | 1.2720
case5 | V_m [p.u.] | 2.28e-6 | 0.0015 | 0.0011 | 0.9946 | 1.1010
case5 | δ [°] | 4.48e-1 | 0.6692 | 0.4484 | 0.9832 | 1.0560
case5 | P_g [p.u.] | 3.88e-2 | 0.1971 | 0.1236 | 0.9919 | 1.2330
case5 | Q_g [p.u.] | 1.45e-1 | 0.3807 | 0.2781 | 0.9911 | 0.9824
case6ww | V_m [p.u.] | 5.26e-6 | 0.0023 | 0.0016 | 0.9964 | 0.5615
case6ww | δ [°] | 3.09e-1 | 0.5554 | 0.3798 | 0.9826 | 1.1990
case6ww | P_g [p.u.] | 3.71e-3 | 0.0609 | 0.0462 | 0.9868 | 1.9880
case6ww | Q_g [p.u.] | 9.00e-3 | 0.0949 | 0.0690 | 0.9687 | 1.9090
case9 | V_m [p.u.] | 6.17e-6 | 0.0025 | 0.0017 | 0.9956 | 0.5519
case9 | δ [°] | 6.07e-1 | 0.7789 | 0.5323 | 0.9943 | 0.5708
case9 | P_g [p.u.] | 4.11e-3 | 0.0641 | 0.0518 | 0.9925 | 1.3800
case9 | Q_g [p.u.] | 4.37e-3 | 0.0661 | 0.0507 | 0.9364 | 2.8470
case14 | V_m [p.u.] | 1.18e-5 | 0.0034 | 0.0019 | 0.9794 | 0.7643
case14 | δ [°] | 6.96e-1 | 0.8345 | 0.5249 | 0.9878 | 0.8566
case14 | P_g [p.u.] | 7.33e-3 | 0.0856 | 0.0634 | 0.9816 | 2.2490
case14 | Q_g [p.u.] | 1.06e-2 | 0.1029 | 0.0739 | 0.9604 | 2.1170
case30 | V_m [p.u.] | 4.06e-6 | 0.0020 | 0.0014 | 0.9901 | 0.5899
case30 | δ [°] | 4.12e-1 | 0.6420 | 0.4702 | 0.9665 | 1.6820
case30 | P_g [p.u.] | 3.13e-3 | 0.0559 | 0.0435 | 0.9848 | 2.0790
case30 | Q_g [p.u.] | 4.69e-3 | 0.0685 | 0.0501 | 0.9894 | 1.1460
case_ieee30 | V_m [p.u.] | 1.70e-5 | 0.0041 | 0.0023 | 0.9782 | 0.5411
case_ieee30 | δ [°] | 1.23 | 1.1100 | 0.5259 | 0.9781 | 0.7243
case_ieee30 | P_g [p.u.] | 1.33e-2 | 0.1154 | 0.0531 | 0.9728 | 1.5310
case_ieee30 | Q_g [p.u.] | 1.12e-2 | 0.1056 | 0.0701 | 0.9657 | 1.8230
case39 | V_m [p.u.] | 4.84e-5 | 0.0070 | 0.0040 | 0.9666 | 1.1300
case39 | δ [°] | 7.10 | 2.6640 | 1.6600 | 0.9924 | 0.7471
case39 | P_g [p.u.] | 4.84e-2 | 0.2200 | 0.1571 | 0.9996 | 0.3408
case39 | Q_g [p.u.] | 1.31e-1 | 0.3625 | 0.2275 | 0.9729 | 0.9847
case57 | V_m [p.u.] | 2.84e-5 | 0.0053 | 0.0032 | 0.9957 | 0.5885
case57 | δ [°] | 7.16e-1 | 0.8462 | 0.5790 | 0.9913 | 0.8674
case57 | P_g [p.u.] | 2.94e-2 | 0.1714 | 0.1481 | 0.9963 | 1.1820
case57 | Q_g [p.u.] | 2.28e-2 | 0.1509 | 0.1159 | 0.9815 | 1.8360
case118 | V_m [p.u.] | 4.98e-6 | 0.0022 | 0.0016 | 0.9872 | 1.1450
case118 | δ [°] | 2.49 | 1.5780 | 1.1340 | 0.9943 | 0.8487
case118 | P_g [p.u.] | 4.80e-2 | 0.2191 | 0.1647 | 0.9995 | 0.3796
case118 | Q_g [p.u.] | 7.31e-2 | 0.2703 | 0.1914 | 0.9791 | 0.7890
case_illinois200 | V_m [p.u.] | 9.89e-6 | 0.0031 | 0.0022 | 0.9791 | 1.1470
case_illinois200 | δ [°] | 8.37e-1 | 0.9148 | 0.6255 | 0.9906 | 0.9506
case_illinois200 | P_g [p.u.] | 1.97e-2 | 0.1403 | 0.0979 | 0.9992 | 0.4183
case_illinois200 | Q_g [p.u.] | 7.70e-3 | 0.0877 | 0.0614 | 0.9444 | 0.8882
case300 | V_m [p.u.] | 1.06e-4 | 0.0103 | 0.0056 | 0.9580 | 0.9966
case300 | δ [°] | 1.53e1 | 3.9100 | 2.7850 | 0.9905 | 0.9322
case300 | P_g [p.u.] | 1.46e-1 | 0.3826 | 0.2788 | 0.9968 | 1.0730
case300 | Q_g [p.u.] | 1.72e-1 | 0.4145 | 0.2747 | 0.9765 | 0.6868
case1354pegase | V_m [p.u.] | 8.19e-5 | 0.0091 | 0.0067 | 0.8724 | 2.2770
case1354pegase | δ [°] | 1.29e1 | 3.5850 | 2.7370 | 0.9944 | 0.8632
case1354pegase | P_g [p.u.] | 1.82 | 1.3480 | 1.0880 | 0.9999 | 0.2359
case1354pegase | Q_g [p.u.] | 6.96 | 2.6390 | 1.4070 | 0.9943 | 0.1620
All Systems (Pooled) | V_m [p.u.] | 7.04e-5 | 0.0084 | 0.0056 | 0.9586 | 0.8852
All Systems (Pooled) | δ [°] | 1.06e1 | 3.2540 | 2.2920 | 0.9943 | 0.7108
All Systems (Pooled) | P_g [p.u.] | 1.68e-1 | 0.4102 | 0.1828 | 0.9998 | 0.0396
All Systems (Pooled) | Q_g [p.u.] | 3.92 | 1.9790 | 0.8633 | 0.9942 | 0.0994

TABLE 12. Unified PowerModelsGAT-AI Performance Across Systems under N−0, N−1, and N−2 Contingencies.
Comparison of Mean Squared Error (MSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²) for voltage magnitude, voltage angle, and power generation.

System | Target | N−0 MSE | N−0 MAE | N−0 R² | N−1 MSE | N−1 MAE | N−1 R² | N−2 MSE | N−2 MAE | N−2 R²
case4gs | Vm [p.u.] | 8.44×10⁻⁷ | 0.0007 | 0.9972 | 3.30×10⁻⁶ | 0.0013 | 0.9975 | – | – | –
case4gs | δ [°] | 5.53×10⁻² | 0.1771 | 0.9873 | 2.33×10⁻¹ | 0.3360 | 0.9944 | – | – | –
case4gs | Pg [p.u.] | 6.17×10⁻³ | 0.0570 | 0.9955 | 7.50×10⁻³ | 0.0626 | 0.9951 | – | – | –
case4gs | Qg [p.u.] | 7.49×10⁻³ | 0.0697 | 0.9927 | 1.17×10⁻² | 0.0826 | 0.9902 | – | – | –
case5 | Vm [p.u.] | 1.12×10⁻⁶ | 0.0008 | 0.9964 | 2.46×10⁻⁶ | 0.0012 | 0.9940 | 3.64×10⁻⁶ | 0.0015 | 0.9930
case5 | δ [°] | 1.17×10⁻¹ | 0.2547 | 0.9823 | 3.96×10⁻¹ | 0.4570 | 0.9798 | 9.51×10⁻¹ | 0.6975 | 0.9838
case5 | Pg [p.u.] | 4.02×10⁻² | 0.1095 | 0.9929 | 3.26×10⁻² | 0.1112 | 0.9917 | 4.45×10⁻² | 0.1570 | 0.9891
case5 | Qg [p.u.] | 1.46×10⁻¹ | 0.2553 | 0.9933 | 1.54×10⁻¹ | 0.3050 | 0.9898 | 1.32×10⁻¹ | 0.2783 | 0.9873
case6ww | Vm [p.u.] | 1.77×10⁻⁶ | 0.0010 | 0.9975 | 4.54×10⁻⁶ | 0.0016 | 0.9968 | 1.09×10⁻⁵ | 0.0025 | 0.9952
case6ww | δ [°] | 9.57×10⁻² | 0.2436 | 0.9900 | 2.00×10⁻¹ | 0.3328 | 0.9883 | 7.14×10⁻¹ | 0.6143 | 0.9733
case6ww | Pg [p.u.] | 1.93×10⁻³ | 0.0341 | 0.9943 | 3.57×10⁻³ | 0.0457 | 0.9886 | 6.31×10⁻³ | 0.0632 | 0.9620
case6ww | Qg [p.u.] | 3.98×10⁻³ | 0.0468 | 0.9828 | 7.94×10⁻³ | 0.0676 | 0.9746 | 1.71×10⁻² | 0.1012 | 0.9492
case9 | Vm [p.u.] | 2.59×10⁻⁶ | 0.0013 | 0.9967 | 1.10×10⁻⁵ | 0.0024 | 0.9945 | – | – | –
case9 | δ [°] | 2.64×10⁻¹ | 0.3435 | 0.9943 | 1.07 | 0.7866 | 0.9943 | – | – | –
case9 | Pg [p.u.] | 3.36×10⁻³ | 0.0471 | 0.9938 | 5.02×10⁻³ | 0.0571 | 0.9910 | – | – | –
case9 | Qg [p.u.] | 2.82×10⁻³ | 0.0420 | 0.9288 | 6.45×10⁻³ | 0.0623 | 0.9245 | – | – | –
case14 | Vm [p.u.] | 1.97×10⁻⁶ | 0.0011 | 0.9944 | 5.30×10⁻⁶ | 0.0017 | 0.9901 | 3.04×10⁻⁵ | 0.0030 | 0.9637
case14 | δ [°] | 1.50×10⁻¹ | 0.3026 | 0.9933 | 4.98×10⁻¹ | 0.4913 | 0.9908 | 1.54 | 0.8111 | 0.9825
case14 | Pg [p.u.] | 2.69×10⁻³ | 0.0405 | 0.9935 | 5.80×10⁻³ | 0.0583 | 0.9866 | 1.43×10⁻² | 0.0949 | 0.9561
case14 | Qg [p.u.] | 6.67×10⁻³ | 0.0618 | 0.9768 | 7.81×10⁻³ | 0.0671 | 0.9669 | 1.82×10⁻² | 0.0953 | 0.9354
case30 | Vm [p.u.] | 2.01×10⁻⁶ | 0.0011 | 0.9941 | 3.90×10⁻⁶ | 0.0014 | 0.9899 | 6.27×10⁻⁶ | 0.0018 | 0.9872
case30 | δ [°] | 2.15×10⁻¹ | 0.3685 | 0.9816 | 4.00×10⁻¹ | 0.4755 | 0.9668 | 6.23×10⁻¹ | 0.5650 | 0.9528
case30 | Pg [p.u.] | 2.00×10⁻³ | 0.0361 | 0.9913 | 3.60×10⁻³ | 0.0459 | 0.9827 | 3.94×10⁻³ | 0.0505 | 0.9774
case30 | Qg [p.u.] | 4.49×10⁻³ | 0.0482 | 0.9918 | 4.33×10⁻³ | 0.0483 | 0.9897 | 5.29×10⁻³ | 0.0542 | 0.9857
case_ieee30 | Vm [p.u.] | 3.45×10⁻⁶ | 0.0015 | 0.9927 | 2.04×10⁻⁵ | 0.0026 | 0.9716 | 3.50×10⁻⁵ | 0.0035 | 0.9732
case_ieee30 | δ [°] | 1.56×10⁻¹ | 0.3125 | 0.9925 | 2.65 | 0.6511 | 0.9707 | 1.41 | 0.7250 | 0.9795
case_ieee30 | Pg [p.u.] | 1.61×10⁻³ | 0.0306 | 0.9964 | 3.72×10⁻² | 0.0790 | 0.9400 | 6.46×10⁻³ | 0.0631 | 0.9829
case_ieee30 | Qg [p.u.] | 5.97×10⁻³ | 0.0591 | 0.9818 | 1.83×10⁻² | 0.0818 | 0.9473 | 1.17×10⁻² | 0.0754 | 0.9604
case39 | Vm [p.u.] | 2.41×10⁻⁵ | 0.0030 | 0.9809 | 3.57×10⁻⁵ | 0.0036 | 0.9700 | 9.78×10⁻⁵ | 0.0057 | 0.9502
case39 | δ [°] | 4.25 | 1.3440 | 0.9949 | 6.53 | 1.5790 | 0.9924 | 1.18×10¹ | 2.2080 | 0.9892
case39 | Pg [p.u.] | 3.42×10⁻² | 0.1301 | 0.9997 | 3.83×10⁻² | 0.1466 | 0.9997 | 7.88×10⁻² | 0.2030 | 0.9994
case39 | Qg [p.u.] | 7.50×10⁻² | 0.1850 | 0.9858 | 1.35×10⁻¹ | 0.2255 | 0.9661 | 2.08×10⁻¹ | 0.2901 | 0.9602
case57 | Vm [p.u.] | 1.06×10⁻⁵ | 0.0024 | 0.9982 | 4.35×10⁻⁵ | 0.0037 | 0.9939 | 4.07×10⁻⁵ | 0.0042 | 0.9944
case57 | δ [°] | 3.43×10⁻¹ | 0.4542 | 0.9953 | 7.81×10⁻¹ | 0.6362 | 0.9907 | 1.34 | 0.7445 | 0.9863
case57 | Pg [p.u.] | 2.74×10⁻² | 0.1514 | 0.9964 | 2.92×10⁻² | 0.1491 | 0.9964 | 3.98×10⁻² | 0.1614 | 0.9951
case57 | Qg [p.u.] | 1.84×10⁻² | 0.1038 | 0.9852 | 2.42×10⁻² | 0.1200 | 0.9797 | 2.90×10⁻² | 0.1329 | 0.9772
case118 | Vm [p.u.] | 3.83×10⁻⁶ | 0.0015 | 0.9899 | 5.19×10⁻⁶ | 0.0017 | 0.9870 | 6.00×10⁻⁶ | 0.0017 | 0.9846
case118 | δ [°] | 1.43 | 0.9167 | 0.9964 | 2.89 | 1.2130 | 0.9935 | 3.19 | 1.2770 | 0.9928
case118 | Pg [p.u.] | 2.44×10⁻² | 0.1213 | 0.9997 | 6.08×10⁻² | 0.1857 | 0.9994 | 6.20×10⁻² | 0.1950 | 0.9994
case118 | Qg [p.u.] | 6.69×10⁻² | 0.1858 | 0.9806 | 7.33×10⁻² | 0.1921 | 0.9802 | 7.92×10⁻² | 0.1963 | 0.9761
case_illinois200 | Vm [p.u.] | 6.54×10⁻⁶ | 0.0020 | 0.9878 | 1.33×10⁻⁵ | 0.0024 | 0.9673 | 1.19×10⁻⁵ | 0.0025 | 0.9726
case_illinois200 | δ [°] | 4.44×10⁻¹ | 0.5202 | 0.9956 | 8.07×10⁻¹ | 0.6556 | 0.9897 | 1.93 | 0.8651 | 0.9763
case_illinois200 | Pg [p.u.] | 2.03×10⁻² | 0.0933 | 0.9993 | 1.69×10⁻² | 0.1024 | 0.9992 | 2.49×10⁻² | 0.1093 | 0.9988
case_illinois200 | Qg [p.u.] | 8.03×10⁻³ | 0.0620 | 0.9498 | 7.59×10⁻³ | 0.0612 | 0.9394 | 6.90×10⁻³ | 0.0599 | 0.9357
case300 | Vm [p.u.] | 1.03×10⁻⁴ | 0.0054 | 0.9563 | 1.11×10⁻⁴ | 0.0058 | 0.9607 | 1.06×10⁻⁴ | 0.0057 | 0.9559
case300 | δ [°] | 1.08×10¹ | 2.4870 | 0.9924 | 2.17×10¹ | 3.3040 | 0.9883 | 1.41×10¹ | 2.5870 | 0.9905
case300 | Pg [p.u.] | 1.14×10⁻¹ | 0.2527 | 0.9972 | 2.13×10⁻¹ | 0.3293 | 0.9957 | 1.06×10⁻¹ | 0.2521 | 0.9977
case300 | Qg [p.u.] | 1.60×10⁻¹ | 0.2686 | 0.9783 | 1.89×10⁻¹ | 0.2806 | 0.9746 | 1.68×10⁻¹ | 0.2768 | 0.9760
case1354pegase | Vm [p.u.] | 8.22×10⁻⁵ | 0.0067 | 0.8704 | 7.87×10⁻⁵ | 0.0066 | 0.8789 | 8.49×10⁻⁵ | 0.0067 | 0.8687
case1354pegase | δ [°] | 1.27×10¹ | 2.7040 | 0.9947 | 1.34×10¹ | 2.8470 | 0.9944 | 1.24×10¹ | 2.6750 | 0.9935
case1354pegase | Pg [p.u.] | 1.71 | 1.0580 | 0.9999 | 2.52 | 1.3020 | 0.9998 | 1.24 | 0.9063 | 0.9999
case1354pegase | Qg [p.u.] | 6.79 | 1.4060 | 0.9943 | 7.14 | 1.4140 | 0.9941 | 6.99 | 1.3970 | 0.9943
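The per-target metrics tabulated in these appendices (MSE, RMSE, MAE, R², NMAE%) and the knowledge-loss values in Appendix E can be reproduced with a few lines of NumPy. This is a minimal sketch under two stated assumptions: NMAE% is taken as MAE normalized by the range of the true values, and the knowledge loss K is taken as the relative percentage increase in NMAE% on a base system after fine-tuning; the paper's exact normalization and definition may differ.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Per-target error metrics as reported in the appendix tables:
    MSE, RMSE, MAE, coefficient of determination R^2, and NMAE%."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    r2 = 1.0 - float(np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    # NMAE%: MAE normalized by the range of the true values (assumed convention).
    nmae_pct = 100.0 * mae / float(y_true.max() - y_true.min())
    return {"MSE": mse, "RMSE": mse ** 0.5, "MAE": mae, "R2": r2, "NMAE%": nmae_pct}

def knowledge_loss(nmae_before, nmae_after):
    """Knowledge loss on a base system: relative increase (in %) of its
    NMAE% after fine-tuning on a new system (assumed definition)."""
    return 100.0 * (nmae_after - nmae_before) / nmae_before
```

Under this assumed definition, a base-system NMAE% that grows from 0.5% to 6.0% after naive fine-tuning corresponds to a knowledge loss of 1100%, the order of magnitude reported in Table 13 for the Naive strategy.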
APPENDIX E
DETAILED CONTINUAL LEARNING RESULTS
This appendix reports the per-system knowledge-loss values after fine-tuning (Table 13 and Table 14). Bar-chart visualizations of the same results are provided in the Supplementary Material (Figs. S15 and S16).

TABLE 13. Per-System Knowledge Loss (K^NMAE_τ) After Fine-Tuning on case1354pegase. Per-system values are reported for Naive and EWC+Replay across all base systems.

System | K^NMAE_Vm (Naive / EWC+Replay) | K^NMAE_δ (Naive / EWC+Replay) | K^NMAE_Pg (Naive / EWC+Replay) | K^NMAE_Qg (Naive / EWC+Replay)
case4gs | 19.53 / 0.05 | 258.30 / −0.05 | 1433.74 / 1.14 | 40.41 / 1.25
case5 | 9.91 / 0.69 | 219.00 / 0.04 | 721.58 / 1.21 | 12.90 / 0.80
case6ww | 18.37 / 0.24 | 245.35 / 0.20 | 2887.41 / 1.62 | 121.43 / 1.88
case9 | 10.38 / 0.21 | 41.72 / 0.10 | 1493.63 / 1.58 | 110.11 / 3.20
case14 | 7.50 / 0.29 | 59.09 / 0.14 | 2912.61 / 2.26 | 303.94 / 2.32
case30 | 12.57 / 0.22 | 406.13 / −0.03 | 7737.94 / 1.61 | 47.59 / 1.64
case_ieee30 | 6.43 / 0.20 | 99.53 / 0.15 | 3280.43 / 2.23 | 287.24 / 2.27
case39 | 8.73 / 0.31 | 33.42 / 0.12 | 322.10 / 0.15 | 12.68 / 0.67
case57 | 22.14 / 0.24 | 111.06 / 0.24 | 824.90 / 1.79 | 239.30 / 2.22
case118 | 9.29 / 0.29 | 42.34 / 0.12 | 208.89 / 0.37 | 23.71 / 0.79
case_illinois200 | 13.62 / 0.25 | 116.10 / 0.08 | 292.57 / 0.29 | 31.73 / 0.91
case300 | 5.58 / 0.14 | 26.72 / 0.32 | 586.10 / 0.76 | 14.46 / 0.57

TABLE 14. Per-System Knowledge Loss (K^MAE_τ) in Physical Units After Fine-Tuning on case1354pegase. Per-system values are reported for Naive and EWC+Replay across all base systems.

System | K^MAE_Vm [p.u.] (Naive / EWC+Replay) | K^MAE_δ [°] (Naive / EWC+Replay) | K^MAE_Pg [p.u.] (Naive / EWC+Replay) | K^MAE_Qg [p.u.] (Naive / EWC+Replay)
case4gs | 0.04 / 0.00 | 97.77 / −0.02 | 83.87 / 0.07 | 2.43 / 0.08
case5 | 0.01 / 0.00 | 92.95 / 0.02 | 72.30 / 0.12 | 3.65 / 0.23
case6ww | 0.05 / 0.00 | 77.70 / 0.06 | 67.03 / 0.04 | 4.39 / 0.07
case9 | 0.03 / 0.00 | 38.91 / 0.10 | 56.05 / 0.06 | 1.96 / 0.06
case14 | 0.02 / 0.00 | 36.21 / 0.08 | 82.02 / 0.06 | 10.61 / 0.08
case30 | 0.03 / 0.00 | 113.55 / −0.01 | 161.79 / 0.03 | 2.08 / 0.07
case_ieee30 | 0.03 / 0.00 | 72.26 / 0.11 | 113.83 / 0.08 | 11.05 / 0.09
case39 | 0.03 / 0.00 | 74.27 / 0.27 | 148.48 / 0.07 | 2.93 / 0.15
case57 | 0.12 / 0.00 | 74.13 / 0.16 | 103.31 / 0.22 | 15.10 / 0.14
case118 | 0.01 / 0.00 | 56.57 / 0.17 | 90.67 / 0.16 | 5.75 / 0.19
case_illinois200 | 0.03 / 0.00 | 76.40 / 0.05 | 68.45 / 0.07 | 2.19 / 0.06
case300 | 0.03 / 0.00 | 79.82 / 0.95 | 152.26 / 0.20 | 5.78 / 0.23

CONFLICT OF INTEREST
None of the authors have a conflict of interest to disclose.
CHIDOZIE EZEAKUNNE received the M.Sc. degree in Physics from the University of Central Florida in 2025, where he is currently pursuing the Ph.D. degree in Physics. He is a Graduate Intern with Los Alamos National Laboratory. His research interests include scientific machine learning, graph-based modeling, computational physics, and data-driven methods for complex physical and engineering systems.

JOSE E. TABAREZ received his Ph.D. in Electrical Engineering from New Mexico State University in 2019, where he was an EMUP and Sandia National Laboratories fellow. He is currently a research and development engineer at Los Alamos National Laboratory. His focus is on power system optimization and analysis, and on modeling the effects of geomagnetic disturbances on power systems.
REEJU POKHAREL is a staff scientist in the Materials Science and Technology Division at Los Alamos National Laboratory. She received her Ph.D. in materials science from Carnegie Mellon University in 2013. Her research focuses on data science for 3D imaging and on integrating physics constraints into generative machine learning models to provide real-time feedback during dynamic imaging experiments at light sources.

ANUP PANDEY received his M.S. and Ph.D. degrees in Physics from Ohio University, Athens, OH, in 2014 and 2017, respectively. He served as a postdoctoral research associate at Oak Ridge National Laboratory from 2017 to 2019. His research interests focus on the application of artificial intelligence and physics-informed machine learning to critical infrastructure systems.
