Robots executing iterative tasks in complex, uncertain environments require control strategies that balance robustness, safety, and high performance. This paper introduces a safe information-theoretic learning model predictive control (SIT-LMPC) algorithm for iterative tasks. Specifically, we design an iterative control framework based on an information-theoretic model predictive control algorithm to address a constrained infinite-horizon optimal control problem for discrete-time nonlinear stochastic systems. An adaptive penalty method is developed to ensure safety while preserving optimality. Trajectories from previous iterations are utilized to learn a value function using normalizing flows, which enables richer uncertainty modeling compared to Gaussian priors. SIT-LMPC is designed for highly parallel execution on graphics processing units, allowing efficient real-time optimization. Benchmark simulations and hardware experiments demonstrate that SIT-LMPC iteratively improves system performance while robustly satisfying system constraints.
Iterative tasks are ubiquitous in robotics, spanning applications such as quadrotor racing [1], quadruped motion planning [2], and surgical robotics [3]. The core challenge in these settings lies in improving performance over successive executions, leveraging data from prior attempts to refine future behavior [4]. To achieve this, a variety of approaches have been explored, including deep reinforcement learning (RL) [5], genetic algorithms [6], and optimal control [2,7]. These iterative tasks usually involve additional complexities, such as navigating dynamic environments with obstacles [8] or ensuring safe human-robot interaction [9]. Thus, balancing performance with safety through constraint satisfaction during training is a key requirement.
Iterative learning control (ILC) improves system performance by using error information from previous task executions to refine future control signals [10]. Learning model predictive control (LMPC) [11] is a reference-free variant of ILC that iteratively constructs a controlled invariant terminal constraint set (safe set) and a terminal cost function within a model predictive control (MPC) framework, converging toward solutions of constrained infinite-horizon optimal control problems over successive iterations. LMPC ensures safety by enforcing state constraints throughout the MPC horizon and requiring the terminal state to lie within the safe set. For deterministic linear systems with quadratic costs, LMPC converges asymptotically to the optimal controller [12]. For stochastic systems, [13] specializes LMPC to linear systems with state noise by constructing robust safe sets from previous trajectories and learning a terminal cost function that represents the value function associated with the data-collecting control policies. These safe sets and the terminal cost function can be approximated from a finite number of prior trajectories while ensuring that the worst-case iteration cost is non-increasing [14]. Recently, adjustable boundary condition (ABC)-LMPC [15,16] extended the LMPC framework to stochastic nonlinear dynamical systems by using a sampling-based cross-entropy method (CEM) MPC that repeatedly samples trajectories until all samples satisfy the terminal set constraint and then selects the least-cost sampled trajectory [17]. Although ABC-LMPC theoretically extends LMPC to stochastic nonlinear dynamical systems, state constraints are encoded as a high constant penalty in the cost function, which yields overly conservative control. Furthermore, CEM sampling has been reported to produce infeasible solutions for stochastic systems and to be susceptible to mode collapse in high-dimensional nonlinear systems [18].
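For concreteness, the following is a minimal sketch of how LMPC-style sampled safe sets and terminal costs can be constructed from stored closed-loop trajectories. The class name, the nearest-neighbor query, and the tolerance `tol` are illustrative assumptions, not details taken from [11].

```python
import numpy as np

class SampledSafeSet:
    """Sampled safe set with terminal cost-to-go, in the spirit of LMPC [11]:
    every state visited in a previous successful iteration is stored together
    with the realized cost-to-go from that state to task completion."""

    def __init__(self):
        self.states = []   # states visited across all prior iterations
        self.ctg = []      # realized cost-to-go of each stored state

    def add_iteration(self, states, stage_costs):
        # cost-to-go J(x_t) = sum of stage costs from time t to the end
        ctg = np.cumsum(np.asarray(stage_costs)[::-1])[::-1]
        self.states.extend(states)
        self.ctg.extend(ctg)

    def terminal_cost(self, x, tol=1e-2):
        # cheapest stored cost-to-go among stored states close to x;
        # +inf means x lies outside the sampled safe set
        d = np.linalg.norm(np.asarray(self.states) - x, axis=1)
        near = d < tol
        return float(np.asarray(self.ctg)[near].min()) if near.any() else np.inf
```

With each completed iteration the set grows, so the terminal constraint relaxes and the terminal cost tightens, which is what drives the iteration-over-iteration performance improvement.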
Information-theoretic MPC, also known as model predictive path integral (MPPI) control [19], is a sampling-based MPC algorithm for stochastic systems. MPPI synthesizes the optimal control by minimizing the Kullback-Leibler (KL) divergence between the optimal control distribution and the sampled control distribution [19]. Stochasticity is handled by sampling trajectories and optimizing their expected cost without requiring gradient information. The sampling process can be parallelized on a graphics processing unit (GPU), enabling real-time control. Previous work has demonstrated that MPPI outperforms CEM-MPC in terms of safety and cost [20]. However, MPPI is an unconstrained optimal control method, and state constraints are typically incorporated by clamping in the dynamics or by placing a high cost on violations [20]. Alternatively, state constraints can be satisfied by projecting unsafe sampled trajectories onto a feasible set for differentially flat systems [21], or by incorporating a control barrier function (CBF) into the cost function combined with a gradient-based local repair step [22]. These approaches can ensure safety, but they rely on special assumptions about the model dynamics or on the existence of a valid CBF, which is hard to realize in real-world, data-driven scenarios. To the best of our knowledge, no existing constrained MPPI formulation can handle general state constraints.
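As background, the sketch below shows one iteration of the standard (unconstrained) MPPI update from [19] in serial NumPy. In practice the K rollouts are parallelized on a GPU; `dynamics`, `cost`, and the hyperparameter values here are placeholders.

```python
import numpy as np

def mppi_step(u_nom, x0, dynamics, cost, K=1024, lam=1.0, sigma=0.5):
    """One information-theoretic MPC update: sample perturbed control
    sequences, roll out the dynamics, and reweight the nominal controls
    by the exponentiated negative trajectory cost (a softmin)."""
    T, m = u_nom.shape
    eps = sigma * np.random.randn(K, T, m)   # control perturbations
    S = np.zeros(K)                          # accumulated rollout costs
    for k in range(K):                       # parallelized on a GPU in practice
        x = x0
        for t in range(T):
            u = u_nom[t] + eps[k, t]
            x = dynamics(x, u)
            S[k] += cost(x, u)
    S -= S.min()                             # shift for numerical stability
    w = np.exp(-S / lam)                     # KL-optimal importance weights
    w /= w.sum()
    return u_nom + np.einsum('k,ktm->tm', w, eps)
```

The temperature `lam` controls how greedily the update concentrates on the lowest-cost samples.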
The main contribution of this paper is the development of a safe iterative learning control framework for general stochastic nonlinear systems, obtained by extending the LMPC formulation and solving the resulting optimization problem with a constrained information-theoretic MPC algorithm. Our approach is general and does not rely on restrictive assumptions about the system dynamics or state constraints. To efficiently and effectively balance optimality and safety, we develop an online sampling-based adaptive penalty method. We learn the value function with normalizing flows by leveraging trajectories from previous iterations, enabling richer uncertainty modeling than Gaussian priors. We provide a fully parallelized deployment of our method, enabling real-time control at over 100 Hz on a scaled off-road vehicle with an NVIDIA Jetson Orin AGX.
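The adaptive penalty method is online and sampling based; its exact update rule is not reproduced here. As a rough illustration only, the following sketch implements one plausible multiplicative scheme that tightens the constraint penalty when too many sampled rollouts violate constraints and relaxes it otherwise; the target fraction, gain, and bounds are all assumptions.

```python
import numpy as np

def update_penalty(rho, violation_frac, target=0.05, gain=2.0,
                   rho_min=1.0, rho_max=1e6):
    """Hypothetical adaptive penalty update (not the paper's exact rule):
    grow rho when the fraction of constraint-violating rollouts exceeds a
    target, shrink it otherwise, trading optimality against safety online."""
    rho = rho * gain if violation_frac > target else rho / gain
    return float(np.clip(rho, rho_min, rho_max))

# Inside the sampling loop, each rollout cost would then take the form
#   S_k = task_cost_k + rho * constraint_violation_k
# before the exponentiated-weight MPPI update, so rho directly controls
# how strongly unsafe samples are down-weighted.
```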
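To illustrate the value-function learning step, here is a toy conditional density model in PyTorch fit to (state, cost-to-go) pairs collected from prior iterations. Note that the single conditional affine layer below still yields a Gaussian; practical normalizing flows stack several invertible layers to capture the non-Gaussian uncertainty the paper refers to. All names and architectural choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    """Toy conditional flow for p(J | x): one affine transform of a standard
    Gaussian whose shift and log-scale are predicted from the state. Real
    flows compose many such invertible layers to leave the Gaussian family."""

    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))  # (shift, log_scale)

    def log_prob(self, J, x):
        # change of variables: log p(J|x) = log N(z; 0, 1) - log_scale,
        # with z = (J - shift) * exp(-log_scale)
        shift, log_scale = self.net(x).chunk(2, dim=-1)
        z = (J - shift) * torch.exp(-log_scale)
        base = torch.distributions.Normal(0.0, 1.0)
        return base.log_prob(z).sum(-1) - log_scale.sum(-1)

def fit(flow, states, ctgs, epochs=200, lr=1e-3):
    """Maximum-likelihood fit on stored (state, cost-to-go) pairs;
    states: (N, state_dim) tensor, ctgs: (N, 1) tensor."""
    opt = torch.optim.Adam(flow.parameters(), lr=lr)
    for _ in range(epochs):
        loss = -flow.log_prob(ctgs, states).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return flow
```

The learned density can then supply a risk-aware terminal cost, e.g., an expectation or an upper quantile of the predicted cost-to-go.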