Contraction Metric Based Safe Reinforcement Learning Force Control for a Hydraulic Actuator with Real-World Training
Force control in hydraulic actuators is notoriously difficult due to strong nonlinearities, uncertainties, and the high risks associated with unsafe exploration during learning. This paper investigates safe reinforcement learning (RL) for hydraulic force control with real-world training using contraction metric certificates. A data-driven model of a hydraulic actuator, identified from experimental data, is employed for simulation-based pretraining of a Soft Actor-Critic (SAC) policy that adapts the PI gains of a feedback-linearization (FL) controller. To reduce instability during online training, we propose a quadratic-programming (QP) contraction filter that leverages a learned contraction metric to enforce approximate exponential convergence of trajectories, applying minimal corrections to the policy output. The approach is validated on a hydraulic test bench, where the RL controller is trained directly on hardware and benchmarked against a simulation-trained agent and a fixed-gain baseline. Experimental results show that real-hardware training improves force-tracking performance compared to both alternatives, while the contraction filter mitigates chattering and instabilities. These findings suggest that contraction-based certificates can enable safe RL in high-force hydraulic systems, though robustness at extreme operating conditions remains a challenge.
💡 Research Summary
This paper addresses the challenging problem of force control in hydraulic actuators, which are characterized by strong nonlinearities, parameter uncertainties, and high risk of unsafe exploration during learning. The authors propose a comprehensive framework that combines data‑driven modeling, reinforcement learning (RL) for gain adaptation, and a contraction‑metric‑based safety filter to enable safe real‑world training on a hydraulic test bench.
First, a high‑fidelity dynamics model is learned from experimental data using a multilayer perceptron (MLP) with two hidden layers of 32 neurons each and ReLU activation. The network is trained with a multi‑step prediction horizon (H = 70) to capture long‑term dynamics, achieving normalized root‑mean‑square errors on the order of 10⁻⁴, which is more than two orders of magnitude better than the analytic model derived from first principles. The state vector includes hydraulic force, its derivative, load‑cell force, piston position, pressures in chambers A and B, and valve current.
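The multi-step training idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weights are random, and the 7-dimensional state / 1-dimensional valve-current input split is an assumption based on the state list in the summary. What matters is the structure: a small two-hidden-layer ReLU MLP rolled out open-loop for H = 70 steps, so that the training loss penalizes long-horizon prediction error rather than single-step error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions (illustrative): 7-dim state as listed in the summary,
# 1-dim valve-current command as the control input, H = 70 prediction steps.
NX, NU, H = 7, 1, 70

def init_mlp(n_in, n_out, hidden=32):
    """Random weights for a 2x32 ReLU MLP (stand-in for the trained model)."""
    shapes = [(n_in, hidden), (hidden, hidden), (hidden, n_out)]
    return [(rng.normal(0.0, 0.1, s), np.zeros(s[1])) for s in shapes]

def mlp_forward(params, z):
    for W, b in params[:-1]:
        z = np.maximum(z @ W + b, 0.0)   # ReLU hidden layers
    W, b = params[-1]
    return z @ W + b                     # linear output: predicted next state

params = init_mlp(NX + NU, NX)

def rollout(params, x0, u_seq):
    """H-step open-loop prediction, the quantity the multi-step loss compares
    against the measured trajectory."""
    x, traj = x0, []
    for u in u_seq:
        x = mlp_forward(params, np.concatenate([x, u]))
        traj.append(x)
    return np.stack(traj)

x0 = rng.normal(size=NX)
u_seq = rng.normal(size=(H, NU))
pred = rollout(params, x0, u_seq)
print(pred.shape)  # (70, 7)
```

In practice the rollout would be differentiated through (e.g. with an autodiff framework) so that gradients of the H-step error propagate through all intermediate predictions, which is what suppresses the error accumulation a one-step loss cannot see.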
Second, a baseline feedback‑linearization (FL) controller is implemented based on the nominal hydraulic model. The FL law computes a valve current command from the desired force reference, a proportional gain Kp, and an integral gain Ki. Because the real system deviates from the nominal model through unknown scaling factors (C₁, C₂) and disturbances (d), exact cancellation of nonlinearities is impossible, leading to performance sensitivity to the choice of Kp and Ki.
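A minimal sketch of this controller structure is shown below. The paper's actual hydraulic model and inversion are not reproduced here: the plant is a toy first-order force dynamic, and `g_inv` is a hypothetical placeholder for the model-based map from the PI "virtual" input to a valve current. The point is the role of Kp and Ki, whose choice the RL agent later adapts.

```python
import numpy as np

def fl_pi_controller(f_ref, f_meas, integ, dt, Kp, Ki, g_inv):
    """One step of an FL + PI law (illustrative structure only).
    `g_inv` stands in for the nominal-model inversion mapping the
    virtual input to a valve current command."""
    e = f_ref - f_meas
    integ = integ + e * dt           # integral of the force error
    v = Kp * e + Ki * integ          # PI virtual input
    u = g_inv(v)                     # nominal inversion -> valve current
    return u, integ

# Toy stand-in plant: f_dot = a*f + b*u  (not the hydraulic model).
a, b, dt = -1.0, 2.0, 1e-3
g_inv = lambda v: v / b              # crude placeholder inversion

f, integ = 0.0, 0.0
for _ in range(2000):                # 2 s of simulated closed loop
    u, integ = fl_pi_controller(1.0, f, integ, dt, Kp=50.0, Ki=200.0,
                                g_inv=g_inv)
    f += (a * f + b * u) * dt
print(round(f, 3))  # force settles near the reference 1.0
```

Because the real plant deviates from the nominal model (the unknown C₁, C₂ and d in the paper), the inversion is never exact, which is precisely why tracking quality depends so strongly on the chosen Kp and Ki.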
Third, the authors employ a Soft Actor‑Critic (SAC) algorithm to adapt the PI gains online. The policy πθ receives the current state and the desired force reference and outputs Kp and Ki. The policy is first pretrained in simulation using the learned MLP model, then fine‑tuned directly on hardware. This two‑stage approach mitigates the sim‑to‑real gap while still allowing the policy to benefit from real‑world data.
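The interface between the SAC policy and the controller can be sketched as a simple action-to-gain mapping. The gain ranges below are assumptions for illustration (the paper's actual bounds are not stated in this summary); the only structural claim is that a tanh-squashed action in [-1, 1]² is rescaled to a box of admissible (Kp, Ki) values, as is standard for bounded SAC actions.

```python
import numpy as np

# Assumed gain ranges (hypothetical, for illustration only).
KP_RANGE = (1.0, 100.0)
KI_RANGE = (0.1, 50.0)

def action_to_gains(a):
    """Map a tanh-squashed SAC action a in [-1, 1]^2 to (Kp, Ki)."""
    a = np.clip(a, -1.0, 1.0)
    lo = np.array([KP_RANGE[0], KI_RANGE[0]])
    hi = np.array([KP_RANGE[1], KI_RANGE[1]])
    return lo + 0.5 * (a + 1.0) * (hi - lo)

gains = action_to_gains(np.array([0.0, 1.0]))
print(gains)  # mid-range Kp, maximum Ki
```

During pretraining the policy interacts with the learned MLP model in place of the hardware; the same mapping is then reused unchanged when fine-tuning on the test bench, so only the environment behind it changes.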
The core safety contribution is a contraction‑metric‑based filter. A state‑dependent positive‑definite matrix M(x) is learned such that the differential Lyapunov function V = δxᵀMδx satisfies dV/dt ≤ −2λV for a chosen contraction rate λ > 0. By projecting the differential state onto the force‑error dimension only (δx =
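The minimal-correction idea behind the QP filter can be illustrated as follows. This is a schematic sketch, not the paper's filter: the contraction condition dV/dt + 2λV ≤ 0 is assumed to reduce to a single affine constraint a·u ≤ b on the control, with coefficients a and b supplied externally (in the paper they would come from the learned metric M(x) and the dynamics model). With one affine constraint, the QP min ‖u − u_RL‖² has the closed-form projection used below.

```python
import numpy as np

def contraction_qp_filter(u_rl, a, b):
    """Minimal correction: solve  min ||u - u_rl||^2  s.t.  a @ u <= b.
    The constraint is assumed to encode dV/dt + 2*lambda*V <= 0 for
    V = dx^T M(x) dx, already linearized in the control; a and b are
    given here rather than derived. One affine constraint admits a
    closed-form projection onto its boundary."""
    u_rl = np.atleast_1d(u_rl).astype(float)
    a = np.atleast_1d(a).astype(float)
    slack = a @ u_rl - b
    if slack <= 0.0:
        return u_rl                       # already contracting: no correction
    return u_rl - a * slack / (a @ a)     # project onto the constraint boundary

print(contraction_qp_filter(np.array([2.0]), np.array([1.0]), 1.5))  # [1.5]
print(contraction_qp_filter(np.array([0.5]), np.array([1.0]), 1.5))  # [0.5]
```

The filter is inactive whenever the RL action already satisfies the contraction condition, which is what keeps the intervention minimal and preserves the policy's learned behavior away from the safety boundary.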