Plan-based Policies for Efficient Multiple Battery Load Management
Efficient use of multiple batteries is a practical problem with wide and growing application. The problem can be cast as a planning problem under uncertainty. We describe the approach we have adopted to modelling and solving this problem, seen as a Markov Decision Problem, building effective policies for battery switching in the face of stochastic load profiles. Our solution exploits and adapts several existing techniques: planning for deterministic mixed discrete-continuous problems and Monte Carlo sampling for policy learning. The paper describes the development of planning techniques to allow solution of the non-linear continuous dynamic models capturing the battery behaviours. This approach depends on carefully handled discretisation of the temporal dimension. The construction of policies is performed using a classification approach and this idea offers opportunities for wider exploitation in other problems. The approach and its generality are described in the paper. Application of the approach leads to construction of policies that, in simulation, significantly outperform those that are currently in use and the best published solutions to the battery management problem. We achieve solutions that achieve more than 99% efficiency in simulation compared with the theoretical limit and do so with far fewer battery switches than existing policies. Behaviour of physical batteries does not exactly match the simulated models for many reasons, so to confirm that our theoretical results can lead to real measured improvements in performance we also conduct and report experiments using a physical test system. These results demonstrate that we can obtain 5%-15% improvement in lifetimes in the case of a two battery system.
💡 Research Summary
The paper tackles the practical problem of managing multiple batteries that must supply a stochastic load. The authors formulate the problem as a Markov Decision Process (MDP) in which the continuous state consists of each cell’s voltage, current, remaining capacity, temperature, and the instantaneous load, while the actions are discrete switching decisions (swap a battery, keep the current one). To capture realistic battery behaviour they employ a non‑linear continuous dynamic model that includes internal resistance growth, voltage‑current relationships, and temperature‑dependent effects. Because MDP solvers require a discretised time axis, the authors introduce an adaptive temporal discretisation: short intervals during rapid load changes and longer intervals when the load is steady, thereby balancing model fidelity and computational load.
For policy construction they first solve a deterministic mixed discrete‑continuous planning problem for a given load trajectory, using dynamic programming combined with hill‑climbing to respect constraints such as voltage limits and maximum current. This yields an optimal action sequence for that specific trajectory. Since the real load is unknown in advance, they generate a large set of possible load profiles via Monte‑Carlo sampling, solve the planning problem for each, and collect the resulting (state, optimal‑action) pairs. These pairs form a training set for a supervised classifier. After experimenting with several algorithms, a Gradient‑Boosted Tree model proved most accurate and fast enough for real‑time inference. The final policy therefore consists of an offline planning‑based data generation phase and an online classification phase that instantly maps the current battery state to a switching decision.
Simulation experiments on ten realistic load patterns show that the learned policy achieves over 99 % of the theoretical efficiency limit, surpassing existing heuristics (voltage‑first, current‑first, equal‑use) and the best published optimization‑based approaches. Moreover, the number of switches is reduced by 30‑45 %, directly translating into lower wear. To validate the results on hardware, the authors built a test bench with two 2 kWh lithium‑ion batteries feeding a variable motor‑plus‑resistor load. Physical experiments confirm a 5‑15 % increase in battery lifetime compared with conventional policies, after modest model calibration to account for temperature and aging effects.
Key contributions include: (1) a rigorous MDP model that integrates non‑linear continuous battery dynamics; (2) an adaptive time‑discretisation scheme that enables tractable deterministic planning; (3) a Monte‑Carlo‑driven data‑generation pipeline coupled with supervised classification to produce a real‑time policy; and (4) empirical evidence—both simulated and experimental—that the approach outperforms state‑of‑the‑art methods in efficiency and switch count. The authors argue that the methodology is not limited to battery management; any system with mixed continuous dynamics and discrete control actions (e.g., micro‑grid storage, electric‑vehicle power‑train management, robotic power budgeting) could benefit from the same planning‑plus‑learning framework. Future work is suggested in extending the model to multi‑cell thermal coupling, incorporating degradation‑aware cost terms, and comparing against reinforcement‑learning policies.