Fine-Tuning LLMs to Generate Economical and Reliable Actions for the Power Grid


📝 Original Info

  • Title: Fine-Tuning LLMs to Generate Economical and Reliable Actions for the Power Grid
  • ArXiv ID: 2602.15350
  • Date: 2026-02-17
  • Authors: Not provided (the source lists no author information).

📝 Abstract

Public Safety Power Shutoffs (PSPS) force rapid topology changes that can render standard operating points infeasible, requiring operators to quickly identify corrective transmission switching actions that reduce load shedding while maintaining acceptable voltage behavior. We present a verifiable, multi-stage adaptation pipeline that fine-tunes an instruction-tuned large language model (LLM) to generate open-only corrective switching plans from compact PSPS scenario summaries under an explicit switching budget. First, supervised fine-tuning distills a DC-OPF MILP oracle into a constrained action grammar that enables reliable parsing and feasibility checks. Second, direct preference optimization refines the policy using AC-evaluated preference pairs ranked by a voltage-penalty metric, injecting voltage-awareness beyond DC imitation. Finally, best-of-$N$ selection provides an inference-time addition by choosing the best feasible candidate under the target metric. On IEEE 118-bus PSPS scenarios, fine-tuning substantially improves DC objective values versus zero-shot generation, reduces AC power-flow failure from 50% to single digits, and improves voltage-penalty outcomes on the common-success set. Code and data-generation scripts are released to support reproducibility.

📄 Full Content

Large language models (LLMs) have rapidly transitioned from research prototypes to deployable decision-support tools across diverse domains [1]-[3]. Their ability to transform unstructured descriptions into structured outputs makes them attractive for operational environments where decisions are time-sensitive and consequences are high. Power system control rooms are a compelling setting: operators manage complex contingencies, coordinate actions across many assets, and balance reliability, economics, and compliance under tight time constraints [4]. Unlike traditional decision-support tools that require specialized inputs or rigid interfaces, LLMs enable operators to interact through natural language while producing machine-readable recommendations (e.g., structured action lists) that can be verified before execution [5], [6].

However, foundation LLMs lack domain-specific knowledge of power system physics, operational constraints, and grid safety requirements. Training grid-specific LLMs from scratch is impractical: modern LLMs succeed through pretraining on trillions of tokens spanning diverse domains [7], while grid operations data are orders of magnitude smaller and specialized. A practical alternative is to adapt a strong instruction-tuned model via targeted fine-tuning so that it can (i) read a compact, structured description of a grid scenario and (ii) output actions in a constrained grammar that can be checked for feasibility.
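To make "a constrained grammar that can be checked" concrete, the following is a minimal sketch of a strict parser for model output. The grammar itself (`OPEN: L12, L47` or `OPEN: NONE`) and the name `parse_plan` are assumptions made for illustration; this excerpt does not show the grammar the authors actually use.

```python
import re

# Hypothetical action grammar (an assumption, not taken from the paper):
#   "OPEN: L12, L47"  -> open lines 12 and 47
#   "OPEN: NONE"      -> take no switching action
_PLAN_RE = re.compile(r"^OPEN:\s*(NONE|L\d+(?:\s*,\s*L\d+)*)$")


def parse_plan(text):
    """Return the list of line indices to open, or None if the text is malformed."""
    match = _PLAN_RE.match(text.strip())
    if match is None:
        return None  # anything outside the grammar is rejected outright
    body = match.group(1)
    if body == "NONE":
        return []
    lines = [int(token.strip()[1:]) for token in body.split(",")]
    # A plan that names the same line twice is treated as malformed.
    if len(set(lines)) != len(lines):
        return None
    return lines
```

Because the parser rejects everything that does not match the pattern, a free-form or partially formatted response never reaches the downstream feasibility checks.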

In this work, we study a concrete and operationally motivated task: corrective, open-only transmission switching during Public Safety Power Shutoffs (PSPS), which are corrective de-energization actions used by utilities to reduce wildfire ignition risk during extreme weather conditions [8]. When PSPS forces lines out of service, operators must rapidly determine whether opening additional elements can mitigate overloads, reduce load shedding, and improve operating conditions while respecting switching budgets and operational rules. Computing optimal actions with mixed-integer optimization can be expensive under time pressure, especially when considering nonlinear AC constraints [9]-[11]. Our goal is to amortize this optimization effort into training, then produce high-quality switching recommendations at inference time using structured scenario summaries and a verifiable action grammar.

Figure 1 illustrates the pipeline we adopt. Starting from an instruction-tuned base model, supervised fine-tuning (SFT) trains the LLM to imitate MILP-derived open-only switching decisions under DC constraints. We then apply direct preference optimization (DPO) using ranked responses derived from AC voltage-quality evaluation, producing a voltage-aware policy that more reliably prioritizes actions with fewer voltage violations.

This design follows a standard alignment pattern for instruction-tuned LLMs: imitation learning first, followed by preference-based refinement [12], [13]. In our setting, the supervised stage anchors the policy to an optimization oracle, while the preference stage injects AC voltage-awareness that is difficult to encode directly in DC training. The resulting model functions as a candidate-plan generator whose outputs can be parsed, verified, and evaluated with existing grid-analysis tools. Our contributions are:

• We formulate PSPS-aware open-only switching with switching budgets and corridor structure using a DC-OPF MILP oracle (Section II).

• We design a structured scenario representation and action grammar that enables an instruction-tuned LLM to emit switching plans that are straightforward to parse and verify (Section III).

• We introduce a voltage-aware preference refinement stage based on DPO, using AC-derived voltage-quality preferences to align the model beyond DC imitation (Section III-A).

• We evaluate economic performance, AC feasibility, and voltage quality, including comparisons to a neural baseline and training-curve reporting for reproducibility (Section IV).

Finally, we discuss practical considerations such as feasibility checks, training/inference costs, and deployment constraints (Section IV). We view this as a step toward verifiable, operator-facing LLM assistants that interface with existing grid analysis pipelines rather than replacing them.
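The feasibility checks mentioned here combine naturally with the best-of-$N$ selection described in the abstract: sample several candidate plans, discard the infeasible ones, and keep the one with the best target metric. The following is a minimal sketch under the assumption that a lower metric is better (as with a voltage penalty); `is_feasible` and `metric` are hypothetical callables standing in for the grid-analysis tools.

```python
def best_of_n(candidates, is_feasible, metric):
    """Return the feasible candidate with the lowest metric, or None if none is feasible.

    candidates  : iterable of parsed switching plans (any hashable or list form)
    is_feasible : callable(plan) -> bool, e.g. budget check plus power-flow convergence
    metric      : callable(plan) -> float, lower is better (assumed convention)
    """
    feasible = [plan for plan in candidates if is_feasible(plan)]
    if not feasible:
        return None  # caller decides the fallback, e.g. take no switching action
    return min(feasible, key=metric)
```

Returning `None` instead of an arbitrary candidate keeps the "no feasible plan" case explicit, so the caller has to choose a fallback deliberately.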

Public Safety Power Shutoffs (PSPS) are preventive and corrective de-energization actions taken by utilities to reduce wildfire ignition risk during extreme weather conditions [14], [15]. When a PSPS event forces a subset of transmission lines out of service, system operators must determine whether additional corrective open-only switching actions can improve reliability and reduce load shedding. We formulate a DC optimal power flow (DC-OPF) model that explicitly incorporates

Constraint (1g) enforces that PSPS-forced outages remain open; (1h) limits operator-induced opens to $K_\ell$ available lines. This is structurally related to optimal transmission switching [9], [10] but restricted to open-only actions
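The two constraints described in prose above can be checked on a candidate plan without solving the MILP. The sketch below is an interpretation of that prose only, since the equations themselves are not shown in this excerpt; the function names, the 0/1 status convention, and the `budget` argument (playing the role of $K_\ell$) are assumptions.

```python
def line_status(all_lines, forced_out, plan):
    """Map each line to 1 (in service) or 0 (open) after PSPS outages and the plan."""
    opened = set(forced_out) | set(plan)
    return {line: 0 if line in opened else 1 for line in all_lines}


def check_plan(all_lines, forced_out, plan, budget):
    """Return (ok, reason) for an open-only plan under a switching budget."""
    plan_set = set(plan)
    if len(plan_set) != len(plan):
        return False, "duplicate line in plan"
    if not plan_set <= set(all_lines):
        return False, "unknown line"
    # PSPS-forced lines are already open, so they are not available to the operator.
    if plan_set & set(forced_out):
        return False, "line already forced out by PSPS"
    # Operator-induced opens are limited by the switching budget.
    if len(plan_set) > budget:
        return False, "switching budget exceeded"
    return True, "ok"
```

Because the plan can only add lines to the open set, every PSPS-forced line keeps status 0 in `line_status` by construction, which mirrors the requirement that forced outages remain open.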

Reference

This content is AI-processed based on open access ArXiv data.
