Learning Constraints from Stochastic Partially-Observed Closed-Loop Demonstrations
We present a method for learning unknown parametric constraints from locally-optimal input-output trajectory data. We assume the data is generated by rollouts of stochastic nonlinear dynamics, under a single state or output feedback law and initial condition but distinct noise realizations, to robustly satisfy underlying constraints despite worst-case noise outcomes. We encode the Karush-Kuhn-Tucker (KKT) conditions of this robust optimal feedback control problem within a feasibility problem to recover constraints consistent with the local optimality of the demonstrations. We prove that our constraint learning method (i) accurately recovers the demonstrator’s policy, and (ii) conservatively estimates the set of policies that ensure constraint satisfaction despite worst-case noise realizations. Moreover, we perform sensitivity analysis, proving that when demonstrations are corrupted by transmission error, the inaccuracy in the learned feedback law scales linearly with the error magnitude. Empirically, our method accurately recovers unknown constraints from simulated noisy, closed-loop demonstrations generated using both linear and nonlinear dynamics (e.g., unicycle and quadrotor) and a range of feedback mechanisms.
💡 Research Summary
The paper tackles the problem of inferring unknown parametric safety constraints from demonstration data that are generated by a stochastic, partially‑observed closed‑loop system. Unlike prior work that assumes deterministic dynamics, full state observation, or open‑loop trajectories, this study assumes a single output‑feedback controller operating under linear‑time‑varying (or locally linearized) dynamics with additive process noise and measurement noise. Demonstrations are locally optimal in the sense that they solve a robust optimal control problem: the controller must satisfy both known constraints and an unknown constraint set for all possible realizations of the bounded disturbances.
The authors first exploit the Karush‑Kuhn‑Tucker (KKT) conditions of the robust optimal control problem. Because the demonstrations are locally optimal, they must satisfy the KKT conditions of the underlying problem, so encoding these conditions as a feasibility problem recovers all constraint parameters consistent with the observed optimality. However, the presence of output feedback makes direct use of the KKT conditions cumbersome. To address this, the paper adopts the System‑Level Synthesis (SLS) framework, which parameterizes the entire closed‑loop behavior through a system response matrix Φ = {Φ_xw, Φ_xe, Φ_uw, Φ_ue}. This representation captures how process and measurement disturbances propagate to states and inputs under the feedback law, while preserving a linear relationship between Φ and the feedback gain K. The SLS constraints (13) guarantee that any lower‑block‑triangular Φ corresponds to a causal output‑feedback controller.
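As a toy illustration of the system-response idea (my simplification, not the paper's SLS program (13)), the sketch below assumes a finite horizon, LTI dynamics, and a static output-feedback gain K, and builds the stacked maps Φ_xw and Φ_xe by injecting unit impulses into the noise channels:

```python
import numpy as np

# Toy sketch of the system-response maps (a simplification, not the paper's
# SLS program (13)): finite horizon T, LTI dynamics x_{t+1} = A x_t + B u_t + w_t,
# measurements y_t = C x_t + e_t, and a static output-feedback law u_t = K y_t.
# Columns of Phi_xw / Phi_xe are obtained by injecting unit impulses into w and e.
def closed_loop_response(A, B, C, K, T):
    n, p = A.shape[0], C.shape[0]

    def rollout(w, e):
        x, xs = np.zeros(n), []
        for t in range(T):
            u = K @ (C @ x + e[t])       # output feedback on the noisy measurement
            x = A @ x + B @ u + w[t]     # state update with process noise
            xs.append(x.copy())          # record x_{t+1}
        return np.concatenate(xs)        # stacked (x_1, ..., x_T)

    Phi_xw = np.zeros((T * n, T * n))    # process noise      -> state
    Phi_xe = np.zeros((T * n, T * p))    # measurement noise  -> state
    for s in range(T):
        for j in range(n):
            w = np.zeros((T, n)); w[s, j] = 1.0
            Phi_xw[:, s * n + j] = rollout(w, np.zeros((T, p)))
        for j in range(p):
            e = np.zeros((T, p)); e[s, j] = 1.0
            Phi_xe[:, s * p + j] = rollout(np.zeros((T, n)), e)
    return Phi_xw, Phi_xe
```

Because x_{t+1} can depend only on disturbances up to time t, both matrices come out block lower-triangular, which is exactly the causality structure the SLS constraints enforce on Φ.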
With Φ in hand, the worst‑case violation of each constraint can be expressed as a maximization over the disturbance sets, yielding robustified constraint functions ˜g_k and ˜g_⊥k. The robust optimal control problem then reduces to a tractable convex program (14) over the nominal trajectory (z, v) and Φ. The demonstrator solves this program, generating a nominal trajectory and a feedback gain K* (via (12)). Multiple noisy rollouts of the closed‑loop system are then collected, possibly corrupted by transmission errors δ_u and δ_y, forming the dataset ˜D.
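For intuition on how the robustified constraint functions arise, consider a single linear constraint a^T x ≤ b on the realized trajectory x = z + Φ_xw w with box-bounded noise ‖w‖_∞ ≤ ε (a hedged, simplified sketch of the general construction, with measurement noise omitted): the inner maximization has a closed form given by the dual (ℓ1) norm, so the nominal constraint is tightened by a fixed margin.

```python
import numpy as np

# Sketch (simplified from the paper's robustification): the worst case of the
# noise term in a^T (z + Phi_xw w) <= b over the box ||w||_inf <= eps.
# By duality of the inf-norm ball, the maximum equals eps * ||Phi_xw^T a||_1.
def robust_margin(a, Phi_xw, eps):
    return eps * np.linalg.norm(Phi_xw.T @ a, ord=1)

# The maximizing (worst-case) noise realization sits at a vertex of the box:
def worst_case_noise(a, Phi_xw, eps):
    return eps * np.sign(Phi_xw.T @ a)
```

The robustified constraint then reads a^T z + robust_margin(a, Phi_xw, eps) ≤ b, which is linear in the decision variables (z, Φ) and hence compatible with the convex program (14).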
The learning pipeline consists of three stages:
- Feedback Gain Recovery – By stacking differences of successive input and output trajectories, a linear least‑squares problem (23) yields an estimate ˜K of the true gain K*. The authors prove that if the stacked output matrix ˜Y has full row rank (Assumption 1), then ˜K = K* exactly when there is no transmission error.
- Nominal Trajectory Recovery – Using the estimated gain, the nominal state and input sequences (z*, v*) are recovered by solving a linear system (24) that averages over all demonstrations, thereby mitigating the effect of transmission errors.
- Constraint Parameter Identification – With η = (z, v, Φ) estimated, the KKT conditions of the original robust program are written explicitly (25). The set F(η) of all θ admitting KKT multipliers that satisfy these conditions is computed; any θ ∈ F(η) is consistent with the observed optimality, and the true parameter θ* is guaranteed to belong to this set.
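The gain-recovery stage can be sketched as follows (a simplified stand-in for problem (23): it assumes a static gain, so that differencing two rollouts of the same nominal plan cancels everything except the feedback response, leaving ΔU = K ΔY):

```python
import numpy as np

# Sketch of feedback-gain recovery (simplified from (23)): with a shared
# nominal trajectory, differencing rollouts i and i-1 cancels the nominal
# terms and leaves dU = K dY; stacking all differences gives a least-squares
# problem for K. Exact recovery needs the stacked dY to have full row rank,
# mirroring Assumption 1.
def recover_gain(U, Y):
    # U: (N, m, T) input rollouts, Y: (N, p, T) output rollouts
    dU = np.concatenate([U[i] - U[i - 1] for i in range(1, len(U))], axis=1)
    dY = np.concatenate([Y[i] - Y[i - 1] for i in range(1, len(Y))], axis=1)
    K_T, *_ = np.linalg.lstsq(dY.T, dU.T, rcond=None)  # solve dY^T K^T = dU^T
    return K_T.T
```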
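For the nominal-trajectory stage, a crude stand-in for the linear system (24) shows why averaging over demonstrations helps: if each rollout equals the nominal trajectory plus zero-mean noise (with transmission error folded into the noise here for simplicity), the sample mean converges to the nominal.

```python
import numpy as np

# Crude stand-in for (24): with zero-mean disturbances, averaging over the N
# demonstrations estimates the nominal input/output sequences, with error
# shrinking like 1/sqrt(N). The paper's actual recovery additionally uses the
# estimated feedback gain.
def recover_nominal(U, Y):
    return U.mean(axis=0), Y.mean(axis=0)
```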
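Finally, the constraint-identification stage can be illustrated on a one-dimensional toy problem (my example, not the actual program (25)): the demonstrator minimizes f(x) = x² subject to an unknown bound x ≥ θ. Given an observed optimum x*, the KKT conditions (stationarity, primal feasibility, nonnegative multiplier, complementary slackness) carve out the set of θ consistent with the demonstration.

```python
# Toy KKT feasibility check (illustrative only, not program (25)):
# minimize f(x) = x^2 subject to g(x; theta) = theta - x <= 0.
# Stationarity 2 x* - lam = 0 forces lam = 2 x*; a candidate theta is
# consistent if it is primal feasible and complementary slackness holds.
def kkt_consistent(x_star, thetas, tol=1e-8):
    lam = 2.0 * x_star                            # multiplier from stationarity
    out = []
    for th in thetas:
        primal = th - x_star <= tol               # g(x*; theta) <= 0
        comp = abs(lam * (th - x_star)) <= tol    # lam * g(x*; theta) = 0
        if lam >= -tol and primal and comp:
            out.append(th)
    return out
```

When x* = 2 the constraint must be active, so only θ = 2 survives; when x* = 0 the multiplier vanishes and every θ ≤ 0 is consistent, mirroring the fact that F(η) is in general a set of parameters rather than a single point.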
Theoretical contributions include:
- Invertibility Lemma (Lemma 1) showing that the matrix Γ =