Provably Safe and Robust Learning-Based Model Predictive Control


Controller design faces a trade-off between robustness and performance, and the reliability of linear controllers has caused many practitioners to focus on the former. However, there is renewed interest in improving system performance to deal with growing energy constraints. This paper describes a learning-based model predictive control (LBMPC) scheme that provides deterministic guarantees on robustness, while statistical identification tools are used to identify richer models of the system in order to improve performance; the benefits of this framework are that it handles state and input constraints, optimizes system performance with respect to a cost function, and can be designed to use a wide variety of parametric or nonparametric statistical tools. The main insight of LBMPC is that safety and performance can be decoupled under reasonable conditions in an optimization framework by maintaining two models of the system. The first is an approximate model with bounds on its uncertainty, and the second model is updated by statistical methods. LBMPC improves performance by choosing inputs that minimize a cost subject to the learned dynamics, and it ensures safety and robustness by checking whether these same inputs keep the approximate model stable when it is subject to uncertainty. Furthermore, we show that if the system is sufficiently excited, then the LBMPC control action probabilistically converges to that of an MPC computed using the true dynamics.


💡 Research Summary

The paper introduces a novel control framework called Learning‑Based Model Predictive Control (LBMPC) that simultaneously guarantees deterministic robustness and leverages data‑driven learning to improve performance. Traditional Model Predictive Control (MPC) relies on a fixed model; when the model is inaccurate, the controller must be conservative to preserve safety, sacrificing performance. Adaptive and learning‑based controllers, on the other hand, can refine the model online but typically lack hard guarantees on constraint satisfaction and stability. LBMPC resolves this tension by maintaining two parallel models of the plant.

The first model is a nominal linear system with an additive bounded disturbance \(d_k\) that captures the worst‑case modeling error. This model is used exclusively for safety verification. By employing tube‑MPC techniques, a feedback gain \(K\) is designed so that, for any admissible disturbance, the true state remains inside a tube around the nominal trajectory. The tube cross‑section at step \(i\) is \(R_i = \bigoplus_{j=0}^{i-1}(A+BK)^j W\), where \(W\) is the polytope containing the disturbance and \(\bigoplus\) denotes the Minkowski sum. Constraints on the state and input are tightened to \(X \ominus R_i\) and \(U \ominus K R_i\), respectively, guaranteeing that the actual system never violates the original constraints. A terminal invariant set \(\Omega\) (the maximal output‑admissible disturbance‑invariant set) is constructed to ensure recursive feasibility and asymptotic stability.
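For intuition, the tube sets \(R_i\) can be outer‑approximated cheaply when the disturbance set \(W\) is a box: the bounding box of \((A+BK)^j W\) has half‑widths \(|(A+BK)^j|\,w_{\max}\), and Minkowski sums of boxes add half‑widths. The matrices, gain, and bounds below are illustrative assumptions, not values from the paper; this is a sketch of the tightening computation, not the exact polytopic construction used in tube MPC.

```python
import numpy as np

# Illustrative system and stabilizing gain (assumed, not from the paper)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
K = np.array([[-1.0, -1.5]])          # assumed stabilizing feedback gain
A_K = A + B @ K
w_max = np.array([0.01, 0.02])        # box disturbance: |w| <= w_max elementwise

def tube_half_widths(A_K, w_max, N):
    """Outer-approximate R_i = (+)_{j=0}^{i-1} (A+BK)^j W by interval arithmetic."""
    widths = [np.zeros_like(w_max)]   # R_0 = {0}
    M = np.eye(A_K.shape[0])          # holds (A+BK)^j
    for _ in range(N):
        widths.append(widths[-1] + np.abs(M) @ w_max)
        M = A_K @ M
    return widths

R = tube_half_widths(A_K, w_max, 5)
# A box state constraint |x| <= x_max is tightened at step i to |x| <= x_max - R[i].
x_max = np.array([1.0, 1.0])
tightened = [x_max - r for r in R]
```

Because the interval bound is an outer approximation, the resulting tightening is conservative but safe; exact polytopic Minkowski sums would give less conservative sets at higher computational cost.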

The second model is a learned representation of the plant, denoted by an “oracle” \(O_k(\tilde x_k, \tilde u_k)\). This oracle can be any statistical estimator—parametric regression, Gaussian processes, neural networks, etc.—that provides the value (and optionally the gradient) of the unknown dynamics at queried points. The learned dynamics are expressed as \(\tilde x_{k+1} = A\tilde x_k + B\tilde u_k + O_k(\tilde x_k, \tilde u_k)\). Importantly, the oracle is allowed to be time‑varying, reflecting continual updates as more data become available.
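As a minimal parametric instance of such an oracle, one can fit the model mismatch \(g(x,u) = x_{k+1} - (Ax_k + Bu_k)\) by linear least squares on the regressor \([x;u]\). The "true" dynamics, matrices, and data-collection loop below are hypothetical stand-ins used only to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])

def true_step(x, u):
    # "Unknown" plant: nominal model plus a mismatch term g(x, u) = [0, -0.05] * x[0]
    g = np.array([0.0, -0.05]) * x[0]
    return A @ x + B @ u + g

# Collect excitation data by driving the plant with random inputs
X, U, Xn = [], [], []
x = np.zeros(2)
for _ in range(200):
    u = rng.uniform(-1, 1, size=1)
    xn = true_step(x, u)
    X.append(x); U.append(u); Xn.append(xn)
    x = xn
X, U, Xn = map(np.array, (X, U, Xn))

residuals = Xn - (X @ A.T + U @ B.T)   # samples of the mismatch g(x, u)
features = np.hstack([X, U])           # regressors [x; u]
Theta, *_ = np.linalg.lstsq(features, residuals, rcond=None)

def oracle(x, u):
    """Learned estimate of g(x, u); Theta would be refit as new data arrive."""
    return np.concatenate([x, u]) @ Theta
```

Refitting `Theta` each time step yields a time‑varying oracle \(O_k\); richer estimators (Gaussian processes, neural networks) slot in behind the same interface.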

LBMPC’s optimization problem minimizes a cost \(\psi_k\) (e.g., quadratic stage and terminal costs) that depends only on the learned model’s predicted states and inputs. The same control sequence is applied to both the nominal and learned models. Because safety constraints are enforced on the nominal model, the controller remains robust even if the learned model is temporarily inaccurate. Conversely, performance is driven by the learned model: as the oracle converges toward the true nonlinear term \(g(x,u)\), the optimal control approaches that of an MPC that knows the exact dynamics.
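The decoupling can be made concrete with a deliberately crude one‑step sketch: candidate inputs are scored by the cost on the learned model, but any input whose nominal prediction leaves the tightened constraint set is rejected. The matrices, tube half‑width, and stand‑in oracle below are assumptions for illustration; a real implementation would solve a multi‑step QP rather than grid over inputs.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
x_max = np.array([1.0, 1.0])    # original box state constraint
r1 = np.array([0.01, 0.02])     # tube half-width R_1 (assumed precomputed)
u_max = 1.0

def oracle(x, u):
    # stand-in learned mismatch term O_k(x, u)
    return np.array([0.0, 0.05]) * x[0]

def lbmpc_step(x, Q=np.eye(2), R_u=0.1):
    best_u, best_cost = None, np.inf
    for u in np.linspace(-u_max, u_max, 201):    # crude 1-D input search
        u_vec = np.array([u])
        x_nom = A @ x + B @ u_vec                # nominal (safety) model
        if np.any(np.abs(x_nom) > x_max - r1):   # tightened constraint X (-) R_1
            continue                             # reject: not certified safe
        x_learned = x_nom + oracle(x, u_vec)     # learned (performance) model
        cost = x_learned @ Q @ x_learned + R_u * u**2
        if cost < best_cost:
            best_cost, best_u = cost, u
    return best_u

u_star = lbmpc_step(np.array([0.5, 0.2]))
```

Note that the feasibility test never consults the oracle, so a badly wrong learned model can degrade performance but cannot cause a constraint violation.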

The authors prove two central results. First, under the standard tube‑MPC assumptions (stabilizable \((A,B)\), a suitable feedback gain \(K\), and a properly constructed terminal set \(\Omega\)), any feasible solution at time \(k\) generates a feasible solution at time \(k+1\). Hence recursive feasibility, constraint satisfaction, and closed‑loop stability are guaranteed deterministically, independent of the learning process. Second, assuming persistent excitation of the plant, the oracle’s estimation error converges in probability to zero. Consequently, the LBMPC control law converges in probability to the optimal control law of the true‑dynamics MPC. This “probabilistic convergence” bridges the gap between statistical learning (which yields stochastic guarantees) and robust control (which demands deterministic guarantees).
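The excitation argument can be illustrated in miniature: with a persistently exciting input and noisy observations, the least‑squares estimate of an unknown parameter concentrates around its true value as data accumulate. The scalar system and noise level below are assumptions for this sketch, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 0.7                # unknown scalar gain: y = theta * u + noise

def lsq_error(n_samples):
    u = rng.uniform(-1, 1, size=n_samples)            # exciting input
    y = theta_true * u + 0.1 * rng.standard_normal(n_samples)
    theta_hat = (u @ y) / (u @ u)                     # closed-form least squares
    return abs(theta_hat - theta_true)

# Estimation error shrinks roughly like 1/sqrt(n) under excitation
errors = [lsq_error(n) for n in (50, 5000)]
```

Without excitation (e.g., \(u \equiv 0\)), the denominator carries no information and the estimate cannot converge, which is why the convergence guarantee for LBMPC is conditioned on the plant being sufficiently excited.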

Experimental validation is performed on a quadrotor helicopter testbed and an energy‑efficient building HVAC system, together with a simulated jet‑engine compression system. In all cases, LBMPC respects state and input limits while achieving markedly lower tracking error and reduced energy consumption compared with a conventional linear MPC that does not incorporate learning. The compressor example especially showcases the ability of the learned model to capture strong nonlinearities, expanding the admissible operating envelope without sacrificing safety.

In summary, the paper makes four key contributions: (1) a dual‑model architecture that cleanly separates safety (nominal model) from performance (learned model); (2) a rigorous application of tube‑MPC to obtain deterministic robustness despite model uncertainty; (3) a flexible integration of any statistical learning method as an oracle, enabling continual model improvement; and (4) a theoretical proof that, under sufficient excitation, the learning‑based controller converges to the optimal controller for the true system. This work offers a practical pathway for deploying high‑performance, safety‑critical controllers in energy‑constrained and safety‑sensitive domains such as autonomous vehicles, aerospace systems, and smart grids.

