Electric Arc Furnace Scheduling under Electricity Price Volatility with Reinforcement Learning
This paper proposes a reinforcement learning-based framework for optimizing the operation of electric arc furnaces (EAFs) under volatile electricity prices. We formulate the deterministic version of the EAF scheduling problem as a mixed-integer linear program (MILP), and then develop a Q-learning algorithm for real-time control of multiple EAF units under price volatility and shared feeding-capacity constraints. We design a custom reward function for the Q-learning algorithm that smooths the start-up penalties of the EAFs. Using real EAF design data and electricity prices from New York State, we benchmark our algorithm against a baseline rule-based controller and a MILP solved with perfect price forecasts. The results show that our reinforcement learning algorithm achieves around 90% of the profit of the perfect-foresight MILP benchmark in various single-unit and multi-unit cases under a non-anticipatory control setting.
💡 Research Summary
This paper presents a novel reinforcement learning (RL) framework to address the complex scheduling problem of Electric Arc Furnaces (EAFs) in volatile electricity markets. EAFs, crucial for scrap-based steelmaking, are large and flexible electricity consumers whose profitability is highly sensitive to real-time electricity prices. The core challenge lies in making real-time operational decisions (start-up, melting, tapping) for single or multiple furnaces under shared power capacity constraints, without the benefit of perfect future price knowledge.
The authors first establish a deterministic Mixed-Integer Linear Programming (MILP) model that captures the essential physics and economics of EAF operation. The model abstracts the batch process into a high-power “melting” stage and a low-power “other stages” mode, incorporating start-up penalties, material inventory dynamics, and system-wide power caps. This MILP, solved with perfect price foresight, serves as an upper-bound benchmark for optimal profit.
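A deterministic formulation in the spirit described above might look as follows; the notation (price $\lambda_t$, melting indicator $x_t$, start-up indicator $u_t$, melted quantity $m_t$, inventory $s_t$) is illustrative and not the paper's exact model:

```latex
\begin{align}
\max_{x,\,u,\,m,\,P}\quad
  & \sum_{t=1}^{T}\bigl(\pi^{\text{steel}}\, m_t - \lambda_t\, P_t\bigr)
    - c^{\text{su}} \sum_{t=1}^{T} u_t \\
\text{s.t.}\quad
  & P_t = P^{\text{melt}} x_t + P^{\text{other}} (1 - x_t)
    && \text{(two-mode power draw)} \\
  & s_t = s_{t-1} + f_t - m_t, \qquad m_t \le \bar{m}\, x_t
    && \text{(inventory dynamics)} \\
  & u_t \ge x_t - x_{t-1}
    && \text{(start-up detection)} \\
  & \textstyle\sum_{i} P_{i,t} \le \bar{P}
    && \text{(shared power cap across units $i$)} \\
  & x_t,\, u_t \in \{0, 1\}
\end{align}
```

The per-unit index $i$ is suppressed except in the shared-capacity constraint; solving this program over the full horizon with known prices $\lambda_t$ yields the perfect-foresight profit upper bound.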
Acknowledging the computational intractability of repeatedly solving this MILP online and the unrealistic assumption of perfect forecasts, the paper proposes two practical approaches. The first is a Rolling-Horizon MILP, which solves the optimization over a short, moving time window and implements only the immediate decisions. While more practical, it still requires running an optimization solver at each step.
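The rolling-horizon pattern can be sketched on a toy single-furnace problem. All numbers below are illustrative, and brute-force enumeration stands in for the MILP solver the paper uses:

```python
from itertools import product

# Illustrative parameters (not from the paper): melt power (MWh/step),
# revenue per melting step, start-up cost, and look-ahead horizon H.
P_MELT, PROFIT_PER_STEP, STARTUP_COST, H = 100.0, 5000.0, 2000.0, 4

def window_profit(schedule, prices, prev_on):
    """Profit of an on/off melting schedule over one price window."""
    profit, on = 0.0, prev_on
    for x, price in zip(schedule, prices):
        if x and not on:
            profit -= STARTUP_COST              # start-up penalty
        if x:
            profit += PROFIT_PER_STEP - price * P_MELT
        on = x
    return profit

def rolling_horizon(prices):
    """Re-optimize over a moving H-step window; keep only the first action."""
    actions, on = [], 0
    for t in range(len(prices)):
        window = prices[t:t + H]
        # Brute force over on/off schedules replaces the MILP solver here.
        best = max(product([0, 1], repeat=len(window)),
                   key=lambda s: window_profit(s, window, on))
        on = best[0]                            # implement only step one
        actions.append(on)
    return actions

prices = [20, 60, 15, 10, 80, 12, 11, 70]       # $/MWh, synthetic
actions = rolling_horizon(prices)
print(actions)
```

The controller melts through cheap hours and idles through the spike at $80/MWh; with perfect knowledge of the whole horizon the full MILP can only do better, which is why it serves as the upper bound.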
The primary contribution is the development of a Q-learning algorithm for direct, solver-free real-time control. The RL agent observes the system state (time, furnace stock levels, operational status, and recent electricity prices) and outputs actions for each furnace. A key innovation is the design of a custom reward function that "smooths" the sharp cost penalty associated with furnace start-ups, guiding the agent to learn schedules that balance immediate costs with long-term gains.
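A minimal tabular Q-learning sketch of this idea follows, with the start-up penalty amortized over the next few melting steps rather than charged all at once. The state encoding, reward shaping, and all constants are simplified assumptions for illustration, not the paper's exact design:

```python
import random

random.seed(0)

# Illustrative constants (not the paper's values): melt power, revenue per
# melting step, start-up cost, and the K-step smoothing horizon.
P_MELT, PROFIT, STARTUP, K = 100.0, 5000.0, 2000.0, 4
ALPHA, GAMMA, EPS, EPISODES, T = 0.1, 0.95, 0.1, 500, 24

# Synthetic daily price curve ($/MWh) with an evening peak.
PRICES = [15 + 45 * (16 <= h <= 20) for h in range(T)]

Q = {}  # tabular Q-values: (hour, on_flag) -> [value_of_off, value_of_on]

def q(s):
    return Q.setdefault(s, [0.0, 0.0])

def reward(hour, on, a, debt):
    """Step profit with the start-up penalty smoothed over K melting steps.

    Instead of charging the full STARTUP cost at ignition, a remaining
    'debt' is paid down in STARTUP / K installments while melting --
    a simplified stand-in for the paper's reward-shaping idea.
    """
    r = 0.0
    if a and not on:
        debt = STARTUP                          # new start-up incurs the debt
    if a:
        r += PROFIT - PRICES[hour] * P_MELT
        pay = min(debt, STARTUP / K)
        r -= pay                                # pay one smoothed installment
        debt -= pay
    return r, debt

for _ in range(EPISODES):
    on, debt = 0, 0.0
    for h in range(T):
        s = (h, on)
        explore = random.random() < EPS         # epsilon-greedy exploration
        a = random.randrange(2) if explore else max((0, 1), key=lambda x: q(s)[x])
        r, debt = reward(h, on, a, debt)
        if h == T - 1:
            target = r                          # terminal step of the episode
        else:
            target = r + GAMMA * max(q((h + 1, a)))
        q(s)[a] += ALPHA * (target - q(s)[a])   # standard Q-learning update
        on = a

# Greedy policy for a furnace that is already on at each hour.
policy = [max((0, 1), key=lambda act: q((h, 1))[act]) for h in range(T)]
print(policy)
```

Without the smoothing, the agent sees a single large negative reward at ignition followed by positive melting rewards, which slows credit assignment; spreading the penalty over K steps makes each melting step's net value visible immediately.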
The methodology is rigorously evaluated using real-world data from EAF designs and one year of historical day-ahead and real-time electricity prices from the New York ISO (NYISO). Performance is benchmarked against a simple rule-based controller and the perfect-foresight MILP. In both single-unit and multi-unit (three furnaces) case studies under a non-anticipatory control setting, the RL algorithm consistently achieves approximately 90% of the profit obtained by the perfect-information MILP benchmark, significantly outperforming the rule-based controller.
The results demonstrate that the learned RL policy can effectively mimic near-optimal scheduling behavior despite price uncertainty and without requiring intensive online computations. This work highlights the significant potential of reinforcement learning as a scalable, adaptive, and computationally lightweight solution for real-time industrial process optimization, particularly in environments dominated by price volatility and complex operational constraints.