Active Constraint Learning in High Dimensions from Demonstrations
📝 Original Info
- Title: Active Constraint Learning in High Dimensions from Demonstrations
- ArXiv ID: 2512.22757
- Date: 2025-12-28
- Authors: Zheng Qiu, Chih-Yuan Chiu, Glen Chou
📝 Abstract
We present an iterative active constraint learning (ACL) algorithm, within the learning from demonstrations (LfD) paradigm, which intelligently solicits informative demonstration trajectories for inferring an unknown constraint in the demonstrator's environment. Our approach iteratively trains a Gaussian process (GP) on the available demonstration dataset to represent the unknown constraints, uses the resulting GP posterior to query start/goal states, and generates informative demonstrations which are added to the dataset. Across simulation and hardware experiments using high-dimensional nonlinear dynamics and unknown nonlinear constraints, our method outperforms a baseline, random-sampling based method at accurately performing constraint inference from an iteratively generated set of sparse but informative demonstrations.
📄 Full Content
To overcome this limitation, we present an iterative active constraint learning algorithm built upon the Gaussian process (GP)-based constraint learning framework of Chou et al. (2022), which uses the Karush-Kuhn-Tucker (KKT) optimality conditions of the provided demonstrations to train nonparametric GP-based constraint representations. At each iteration, our algorithm trains a GP on a given dataset of locally-optimal demonstrations to represent the unknown constraint as a GP posterior. Although active learning methods have been widely explored in the context of cost inference, to our knowledge, our work provides the first framework for active constraint inference over continuous state spaces. Our contributions are:
- Leveraging the GP-based constraint learning framework in Chou et al. (2022), we sample multiple GP posterior estimates of an unknown constraint given a demonstration dataset D. The generation of multiple GP posterior samples provides a more complete description of the constraint information encoded in D, as well as of the remaining constraint uncertainty that cannot be resolved using D.
- We present a GP-based active constraint learning algorithm, GP-ACL (Alg. 1), which iteratively queries start/goal constraint states to induce the generation of informative demonstrations that reduce constraint uncertainty.
- We evaluate our GP-ACL algorithm by inferring high-dimensional, nonlinear constraints from the generated demonstrations, both via 4D unicycle and 12D quadcopter dynamics in simulation, and via a 7-DOF robot arm hardware platform.
Related Works Existing LfD-based methods have enabled constraint learning via both IOC (Chou et al. (2020a, 2022); Armesto et al. (2017); Papadimitriou et al. (2022); Menner et al. (2021)) and inverse reinforcement learning frameworks (McPherson et al. (2021); Stocking et al. (2022); Papadimitriou and Brown (2024); Singh et al. (2018)). Specifically, Chou et al. (2020a, 2022); Papadimitriou and Li (2023) use KKT optimality conditions corresponding to the provided demonstrations to formulate inverse optimization problems, from which constraint information can then be extracted. In particular, our work builds upon and is most similar to Chou et al. (2022), which likewise embeds the KKT optimality conditions for the provided demonstrations into a Gaussian process (GP) framework to represent unknown constraints. However, whereas Chou et al. (2022) only computes the mean and covariance functions of a trained GP posterior for constraint inference, we additionally draw GP posterior samples as surrogate functions for the unknown constraint map, to facilitate the efficient generation of informative demonstrations. Moreover, while existing constraint learning methods can enable downstream planning that is robust to epistemic uncertainty (Chou et al., 2021, 2022), they do not actively seek to reduce uncertainty through soliciting informative demonstrations. In contrast, our GP-ACL algorithm actively generates demonstrations guided by the downstream constraint learning objective, a capability not considered in prior work.
Prior work has also considered the problem of active intent inference, with the aim of extracting the unknown reward or cost of an expert demonstrator from a provided set of trajectory demonstrations. In particular, methods have been developed to achieve active information gathering (Sadigh et al., 2016, 2018; Li et al., 2025b), intent demonstration (Li et al., 2024), and uncertainty reduction (Mesbah, 2018; Hu and Fisac, 2023) for human-robot interaction tasks. Meanwhile, Akrour et al. (2012); Fang et al. (2017); Lee et al. (2021) devise active reward learning methods for reinforcement learning (RL). Our work likewise considers the purposeful generation of demonstrations that are maximally informative with respect to a downstream inference task. However, unlike the works listed above, our GP-ACL algorithm leverages demonstration data to recover unknown constraints, rather than unknown intents, rewards, or costs.
Finally, our methods are related to recently developed RL-based (Papadimitriou et al., 2022) and GP-based (Li et al., 2025a) active constraint learning methods. Unlike these works, however, our method is capable of inferring unknown high-dimensional constraints over continuous, infinite state and constraint spaces, and does not require both constraint-satisfying and constraint-violating behavior to guide learning, which is important for safety-critical robotics applications.
By a demonstration, we refer to a state-control trajectory $\xi := (x_1, \ldots, x_T, u_1, \ldots, u_T) \in \mathbb{R}^{(n+n_i)T}$, where $x_t$ and $u_t$ respectively denote the system state and control vector identified with the demonstration $\xi$ at each time $t \in [T]$. Following Chou et al. (2022), we assume that each demonstration is a locally-optimal solution to the constrained optimization problem described below (Prob. 1). First, let $c : \mathbb{R}^{(n+n_i)T} \to \mathbb{R}$, $g_k : \mathbb{R}^{(n+n_i)T} \to \mathbb{R}^{N^{\mathrm{ineq}}_k}$, and $h_k : \mathbb{R}^{(n+n_i)T} \to \mathbb{R}^{N^{\mathrm{eq}}_k}$ respectively encode a (possibly non-convex) cost function, as well as a set of known inequality and equality constraints. In our work, we focus on smoothness-based costs of the form $c(\xi) := \sum_{t=1}^{T-1} \|x_{t+1} - x_t\|_2^2$; similar costs are often used in the constraint learning literature to encode trajectory length minimization (Chou et al. (2022); Papadimitriou and Li (2023)). Moreover, the known equality constraint $h_k(\cdot) = 0$ encodes a set of deterministic, nonlinear dynamics $x_{t+1} = f_t(x_t, u_t)$, $\forall t \in [T]$, as well as constraints on the initial and final system states, which we describe in more detail in Sec. 2.4.
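To make this concrete, the following minimal numpy sketch (our illustration, not code from the paper) evaluates the smoothness cost for a trajectory stored as a (T, n) array of states:

```python
import numpy as np

def smoothness_cost(states: np.ndarray) -> float:
    """Evaluate c(xi) = sum_{t=1}^{T-1} ||x_{t+1} - x_t||_2^2.

    states: (T, n) array whose rows are the system states x_1, ..., x_T.
    """
    diffs = np.diff(states, axis=0)      # (T-1, n) array of x_{t+1} - x_t
    return float(np.sum(diffs ** 2))

# Example: for fixed endpoints, evenly spaced states minimize the cost.
T, n = 30, 4
straight = np.linspace(np.zeros(n), np.ones(n), T)
print(smoothness_cost(straight))
```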
Next, to formulate constraints unknown to the learner, let $\phi_{\mathrm{sep}} : \mathbb{R}^n \to \mathbb{R}^{n_c}$ denote a map from each system state $x$ to a constraint state $\phi_{\mathrm{sep}}(x) \in \mathbb{R}^{n_c}$, at which the constraint satisfaction of the system state $x$ is then evaluated¹, and let $\phi : \mathbb{R}^{(n+n_i)T} \to \mathbb{R}^{n_c T}$ be given by $\phi(\xi) := (\phi_{\mathrm{sep}}(x_1(\xi)), \cdots, \phi_{\mathrm{sep}}(x_T(\xi)))$. For each $t \in [T]$, we formulate the unknown inequality constraints $g^\star_{\neg k}(\phi_{\mathrm{sep}}(x_t(\xi))) \le 0$ for each demonstration $\xi \in \mathbb{R}^{(n+n_i)T}$. Finally, we write the demonstrator's trajectory generation problem as follows.
Problem 1 (Demonstrator’s forward problem)
For each demonstration $\xi$, which forms a locally-optimal solution of Prob. 1, there must exist Lagrange multipliers $\lambda_k \in \mathbb{R}^{N^{\mathrm{ineq}}_k}$, $\lambda_{\neg k} \in \mathbb{R}^T$, and $\nu \in \mathbb{R}^{N^{\mathrm{eq}}_k}$ such that the following KKT optimality conditions, denoted below by $(\xi, \lambda_k, \lambda_{\neg k}, \nu) \in \mathrm{KKT}$, hold:
where $\odot$ denotes element-wise multiplication, and for each differentiable map $f : \mathbb{R}^n \to \mathbb{R}^m$, we define the gradient $\nabla f(x) \in \mathbb{R}^{m \times n}$ as the Jacobian of $f$ evaluated at $x$.
Above, (2a), (2b), (2c), and (2d) encode primal and dual feasibility, complementary slackness, and stationarity conditions, respectively. We denote the stationarity residual, i.e., the left-hand side of (2d), by $s(\xi, \lambda_k, \lambda_{\neg k}, \nu) \in \mathbb{R}^{(n+n_i)T}$, and denote its components corresponding to partial gradients with respect to the system state $x_t$ (resp., control $u_t$) accordingly. Constraint information can be extracted from each $\xi_d$ by examining its tight system states (if any), i.e., system states $x_t(\xi)$ at which $g^\star_{\neg k}(\phi_{\mathrm{sep}}(x_t(\xi))) = 0$, as well as the constraint gradient values $\nabla_{x_t} g^\star_{\neg k}(\phi_{\mathrm{sep}}(x_t(\xi)))$ at such tight system states. Methods for both extracting tight system states and computing the corresponding gradient values were first presented in Sec. IV-A of Chou et al. (2022); for completeness, we review details of these methods in App. A.1.
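As a sketch of how the stationarity residual can be assembled numerically, assuming the Jacobians of the cost and constraints at a candidate $\xi$ are available (all names below are illustrative, not the paper's code):

```python
import numpy as np

def stationarity_residual(grad_cost, jac_g_known, jac_g_unknown, jac_h,
                          lam_k, lam_unk, nu):
    """Left-hand side of the stationarity condition (2d) at a candidate xi.

    grad_cost     : (d,)    gradient of the cost c at xi
    jac_g_known   : (m1, d) Jacobian of the known inequality constraints g_k
    jac_g_unknown : (m2, d) Jacobian of the unknown constraint terms
    jac_h         : (m3, d) Jacobian of the known equality constraints h_k
    lam_k, lam_unk: nonnegative multipliers; nu: free multipliers
    """
    return (grad_cost
            + jac_g_known.T @ lam_k
            + jac_g_unknown.T @ lam_unk
            + jac_h.T @ nu)

# A demonstration is (locally) KKT-consistent when this residual is ~0 and
# complementary slackness lam * g(xi) = 0 holds elementwise.
```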
To learn a Gaussian process (GP)-based representation for the unknown constraint function $g^\star_{\neg k}(\cdot)$, we collect the constraint states and estimated constraint gradients, across the tight timesteps $t \in t_{\mathrm{tight}}(\xi_d)$ of each demonstration $d \in [D]$, into the datasets $\mathcal{D}_\kappa$ and $\mathcal{D}_\nabla$, as defined below. Here, for each $d \in [D]$ and $t \in [T]$, $w_{d,t} \in \mathbb{R}^{1 \times n}$ denotes an estimate of the unknown constraint gradient $\nabla_{x_t} g^\star_{\neg k}(\phi_{\mathrm{sep}}(x_t(\xi_d)))$, taken with respect to the system state $x_t$; we defer a thorough discussion of the computation of $w_{d,t}$ to Prob. 8 in App. A.1:
We then define $X := \mathcal{D}_\kappa$ and $Y := (\mathcal{D}_g, \mathcal{D}_\nabla)$ to respectively be the input and output training datasets for our GP model, set $\mathcal{D} := (X, Y)$, and optimize the GP hyperparameters using the marginal log-likelihood function (Rasmussen and Williams (2006)). Our GP training process parallels the approach in Sec. IV-B of Chou et al. (2022). The resulting posterior $(g|\mathcal{D})(\cdot)$ of the function $g^\star_{\neg k}$ given the dataset $\mathcal{D}$ is defined by the posterior mean $\mathbb{E}[g(\cdot)|\mathcal{D}]$ and covariance $\mathrm{cov}[g(\cdot)|\mathcal{D}]$ (see (15) for details). A plausible constraint function $\hat{g}(\cdot)$ can be sampled from this posterior using a random Fourier feature-based approach. More details are provided in App. A.2.
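The following is a minimal sketch of standard GP regression with an RBF kernel, sufficient to compute the posterior mean and covariance referred to above. It conditions only on function-value data and, for brevity, omits the gradient observations in $\mathcal{D}_\nabla$ (which require a derivative-kernel extension) as well as the hyperparameter optimization:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = s^2 exp(-||a - b||^2 / (2 l^2))."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior(X_train, y_train, X_query, noise=1e-4, **kern):
    """Posterior mean and covariance of g at X_query given (X_train, y_train)."""
    K = rbf_kernel(X_train, X_train, **kern) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_query, **kern)
    Kss = rbf_kernel(X_query, X_query, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, Kss - v.T @ v   # posterior mean, posterior covariance
```

In the constraint-learning setting, the training inputs are the tight constraint states (where $g^\star_{\neg k} = 0$), so the posterior interpolates zero on the constraint boundary while its covariance grows away from the observed tight states.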
Suppose we are given an existing demonstration dataset $\mathcal{S}_D$ and a corresponding GP posterior $(g|\mathcal{D})(\cdot)$ describing our belief of the unknown, ground-truth constraint $g^\star_{\neg k}(\cdot)$. We seek start ($\kappa_s$) and goal ($\kappa_g$) constraint states such that locally-optimal demonstrations $\xi$, generated while adhering to the start and goal constraints $\phi_{\mathrm{sep}}(x_1(\xi)) = \kappa_s$ and $\phi_{\mathrm{sep}}(x_T(\xi)) = \kappa_g$, maximally reduce the remaining uncertainty over the GP posterior $(g|\mathcal{D})(\cdot)$. Concretely, we formulate our problem as follows:
Problem 2 (Active Constraint Learning, Idealized)
$$\max_{\kappa_s, \kappa_g, \xi, \lambda_k, \lambda_{\neg k}, \nu} \; \sum_{t \in t_{\mathrm{tight}}(\xi)} \mathrm{cov}\big[g(\phi_{\mathrm{sep}}(x_t(\xi)))\,\big|\,\mathcal{D}\big] \tag{5a}$$
$$\text{s.t.} \quad \phi_{\mathrm{sep}}(x_1(\xi)) = \kappa_s, \quad \phi_{\mathrm{sep}}(x_T(\xi)) = \kappa_g, \quad (\xi, \lambda_k, \lambda_{\neg k}, \nu) \in \mathrm{KKT}(\mathcal{S}_D). \tag{5b}$$
Above, given a candidate demonstration $\xi$, the objective (5a) evaluates the covariance of the GP posterior $(g|\mathcal{D})(\cdot)$ at the robustly identified tight states $\{x_t(\xi) : t \in t_{\mathrm{tight}}(\xi)\}$ of $\xi$. Since the covariance functions of GPs serve as uncertainty measures, (5a) captures the degree to which a candidate $\xi$ traverses regions of high constraint uncertainty with respect to the GP posterior $(g|\mathcal{D})(\cdot)$.
Meanwhile, (5b) enforces the start/goal constraints and the local optimality of $\xi$ (see Prob. 1).
A constraint learner who attempts to directly solve Prob. 2 faces several challenges. First, the KKT conditions $\mathrm{KKT}(\mathcal{S}_D)$ appearing in (5b) depend on knowledge of the unknown constraint $g^\star_{\neg k}(\cdot)$ (see (2)), which the constraint learner a priori lacks. Moreover, for each candidate demonstration $\xi$, the tight timesteps $t_{\mathrm{tight}}(\xi)$ appearing in (5a) can only be computed by solving Prob. 7 as an inner optimization loop, which renders Prob. 2 computationally intractable. To overcome these challenges, in Sec. 3, we decompose Prob. 2 into sub-problems that allow the recovery of start/goal constraint states which are approximately maximally informative for the constraint learning task.
Below, Sec. 3.1 describes optimization routines for obtaining start/goal constraint states that induce approximately maximally informative demonstrations, bypassing the challenges of directly tackling Prob. 2 identified in Sec. 2.4. The prescribed optimization steps are then synthesized in Sec. 3.2 into our iterative active constraint learning algorithm (Alg. 1), which repeatedly queries start and goal states from which the demonstrator is likely to generate state-control trajectories that maximally reduce the remaining uncertainty over the unknown constraints.
Suppose, as in Prob. 2, that we are given a demonstration set $\mathcal{S}_D$, a corresponding GP posterior $(g|\mathcal{D})(\cdot)$, and $P$ GP posterior samples $\{\hat{g}_p(\cdot) : p \in [P]\}$. We aim to select suitable start/goal constraint states for constraint learning from each GP posterior sample $\hat{g}_p(\cdot)$. First, we compute the maximally informative constraint state $\kappa_{\mathrm{MI},p}$, defined as the constraint state which maximizes the covariance of the GP posterior $(g|\mathcal{D})(\cdot)$ while remaining safe with respect to $\hat{g}_p(\cdot)$ (Prob. 3). We then approximate Prob. 2 as the problem of searching for start/goal constraint states, using each $\kappa_{\mathrm{MI},p}$ and corresponding gradient $\nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$, which induce demonstrations that are tight against the unknown constraint at $\kappa_{\mathrm{MI},p}$, and thus provide information about the constraint shape near $\kappa_{\mathrm{MI},p}$. Concretely, we present two methods for computing start/goal constraint states, as codified in Probs. 4-5 and Prob. 6 below, which are tailored respectively to the settings in which the avoid set $A := \{\kappa \in \mathbb{R}^{n_c} : g^\star_{\neg k}(\kappa) > 0\}$ (i) is locally convex near $\kappa_{\mathrm{MI},p}$, or (ii) is not. Since the constraint learner lacks a priori knowledge of the geometry of $A$, our GP-ACL approach prescribes the application of both methods to generate two start/goal constraint state pairs.
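The paper solves Prob. 3 via IPOPT; as a rough, sampling-based stand-in, one can draw candidate constraint states, keep those deemed safe by the posterior sample, and select the most uncertain among them:

```python
import numpy as np

def max_informative_state(post_var, g_hat_p, bounds, n_cand=20000, rng=None):
    """Sampling-based stand-in for Prob. 3 (illustrative only).

    post_var : callable mapping (m, n_c) states -> (m,) posterior variances
    g_hat_p  : callable mapping (m, n_c) states -> (m,) sampled constraint values
    bounds   : (n_c, 2) box bounds on the constraint space
    """
    rng = rng or np.random.default_rng(0)
    lo, hi = bounds[:, 0], bounds[:, 1]
    cand = rng.uniform(lo, hi, size=(n_cand, len(lo)))
    safe = cand[g_hat_p(cand) <= 0.0]        # safe w.r.t. the posterior sample
    return safe[np.argmax(post_var(safe))]   # most uncertain safe state
```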
Below, we introduce separate schemes for generating start/goal constraint states under the two distinct scenarios in which the avoid set $A$ either is locally convex near $\kappa_{\mathrm{MI},p}$ (Fig. 2a) or is not (Fig. 2b).
Optimizing Start/Goal Constraint States in a Hyperplane $H_p(\eta)$ Orthogonal to $\nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$
We begin by formulating a search method for start/goal constraint states that is adapted to the setting in which the avoid set $A$ is locally convex near $\kappa_{\mathrm{MI},p}$, so that the boundary of $A$ curves away from $\kappa_{\mathrm{MI},p}$ in the direction of $\nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$ (see Fig. 2a). Since we aim to induce demonstrations that are taut against the boundary of $A$ (and thus reveal information about $A$) near $\kappa_{\mathrm{MI},p}$, we wish to select start/goal constraint states from which the induced demonstration closely approximates the geometry of $A$'s boundary near $\kappa_{\mathrm{MI},p}$. To this end, we first fix a constraint state $\kappa_p(\eta)$ that is slightly offset from $\kappa_{\mathrm{MI},p}$ in the direction of $\nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$, with the degree of offset measured by the step size $\eta$. The purpose of defining a constraint state $\kappa_p(\eta)$ offset from $\kappa_{\mathrm{MI},p}$ is to increase the likelihood that the resulting demonstration is tight against the avoid set $A$ at $\kappa_{\mathrm{MI},p}$, i.e., that the true constraint $g^\star_{\neg k}$ is active at $\kappa_{\mathrm{MI},p}$, for a constraint that is locally convex around $\kappa_{\mathrm{MI},p}$ (see Fig. 2a). We then search for start/goal constraint states $(\kappa_{s,p,\perp}, \kappa_{g,p,\perp})$ outside $A$ but within a hyperplane $H_p(\eta)$ which is orthogonal to $\nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$ and passes through $\kappa_p(\eta)$ (see Fig. 2a). As illustrated in Fig. 2a, the demonstration trajectory generated from $(\kappa_{s,p,\perp}, \kappa_{g,p,\perp})$ is likely to curve away from $\kappa_{\mathrm{MI},p}$ in the direction of $\nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$, similar to the boundary of $A$ near $\kappa_{\mathrm{MI},p}$, and yield a tight point at $\kappa_{\mathrm{MI},p}$ at which the true constraint $g^\star_{\neg k}$ is active.
Concretely, we first fix a suitably small step size $\eta$, and define:
$$\kappa_p(\eta) := \kappa_{\mathrm{MI},p} + \eta \, \nabla \hat{g}_p(\kappa_{\mathrm{MI},p}), \tag{7}$$
$$H_p(\eta) := \{\kappa \in \mathbb{R}^{n_c} : \nabla \hat{g}_p(\kappa_{\mathrm{MI},p})^\top (\kappa - \kappa_p(\eta)) = 0\}. \tag{8}$$
To fix a search direction within $H_p(\eta)$ starting from $\kappa_p(\eta)$, we aim to find a constraint state $\bar{\kappa}_p(\eta)$ in $H_p(\eta)$ that is of maximum possible distance from $\kappa_p(\eta)$ while remaining unsafe with respect to the GP posterior sample $\hat{g}_p(\cdot)$ (Prob. 4). Intuitively, $\bar{\kappa}_p(\eta)$ is a constraint state on the boundary of $A$ that lies on $H_p(\eta)$, which we aim to locate in order to obtain a search direction (in the form of $\bar{\kappa}_p(\eta) - \kappa_p(\eta)$) along which safe start/goal constraint states can be found.
Problem 4 (Identifying a Search Direction $\bar{\kappa}_p(\eta) - \kappa_p(\eta)$)
We then search for start/goal constraint states along the line $L(\eta) := \{\kappa_p(\eta) + \tau(\bar{\kappa}_p(\eta) - \kappa_p(\eta)) : \tau \in \mathbb{R}\}$ on $H_p(\eta)$ (Prob. 5). In effect, $L(\eta)$ identifies, among all vectorial directions in $H_p(\eta)$ starting from $\kappa_p(\eta)$, the direction along which constraint states remain unsafe for the longest distance away from $\kappa_p(\eta)$ (see Fig. 2c). We note that, from start/goal constraint states selected on $L(\eta)$ outside of the avoid set $A$, the demonstrator is likely to generate trajectories that are taut against $A$, in order to minimize path length while ensuring feasibility.
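The geometric objects above are straightforward to construct numerically. The sketch below is our illustration, assuming the gradient is normalized before stepping (the paper does not specify this detail):

```python
import numpy as np

def offset_state(kappa_mi, grad, eta):
    """kappa_p(eta): step from kappa_MI,p along the (normalized) gradient."""
    return kappa_mi + eta * grad / np.linalg.norm(grad)

def line_states(kappa_p, kappa_bar, taus):
    """Points on L(eta) = {kappa_p + tau (kappa_bar - kappa_p)} for scales tau."""
    d = kappa_bar - kappa_p
    return kappa_p[None, :] + np.asarray(taus)[:, None] * d[None, :]

def in_hyperplane(kappa, kappa_p, grad, tol=1e-8):
    """Membership test for H_p(eta): orthogonality to the sampled gradient."""
    return abs(grad @ (kappa - kappa_p)) < tol * np.linalg.norm(grad)
```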
Concretely, for each $p \in [P]$, we search for start/goal constraint states on $L(\eta)$ that are as close to $\kappa_p(\eta)$ as possible while remaining safe. To encode safety, we consider constraint states $\kappa$ that are (i) strictly safe with respect to $\hat{g}_p(\cdot)$ by a margin $\delta > 0$, and (ii) safe with respect to the GP posterior $(g|\mathcal{D})(\cdot)$ with probability at least $\beta > 0$, where both $\delta$ and $\beta$ are algorithm design parameters. Mathematically, the two safety conditions described above are given by:
$$\hat{g}_p(\kappa) \le -\delta, \qquad \Phi\!\left(\frac{-\mathbb{E}[g(\kappa)\,|\,\mathcal{D}]}{\sqrt{\mathrm{cov}[g(\kappa)\,|\,\mathcal{D}]}}\right) \ge \beta, \tag{10}$$
where $\Phi(\cdot)$ is the cumulative distribution function (CDF) of the unit Gaussian distribution. We formulate Prob. 5 below to compute optimal start and goal constraint states, denoted respectively by $\kappa_{s,p,\perp}$ and $\kappa_{g,p,\perp}$, along different segments of $L(\eta)$ corresponding to the selection of negative or positive values of the scale parameter $\tau$, which measures distance from the unsafe constraint state $\kappa_p(\eta)$. We choose $\tau^2$ as our objective in (11a) to compel the start/goal constraint states to be close to $\kappa_p(\eta)$ while remaining probabilistically safe in the sense of (10), increasing the likelihood of generating demonstrations that are close to, and tight against, the avoid set $A$ (see Fig. 2a).
Problem 5
Remark 1 While numerically solving Prob. 5, we often encode the constraints (10) as penalties in the cost (11a) with large weights, to bypass the issue that initializations of κ may fail to satisfy (10).
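Assuming the reconstruction of (10) above, a probabilistic safety check at a candidate constraint state might look as follows (a sketch; `post_mean` and `post_var` are assumed callables returning the posterior moments at a single state):

```python
import numpy as np
from scipy.stats import norm

def probabilistically_safe(kappa, g_hat_p, post_mean, post_var,
                           delta=1e-3, beta=0.55):
    """Check the two safety conditions in (10) at a constraint state kappa:
    (i)  strict margin w.r.t. the posterior sample: g_hat_p(kappa) <= -delta;
    (ii) posterior probability of safety at least beta:
         Phi(-E[g(kappa)|D] / sqrt(cov[g(kappa)|D])) >= beta.
    """
    mu, var = post_mean(kappa), post_var(kappa)
    prob_safe = norm.cdf(-mu / np.sqrt(var))
    return (g_hat_p(kappa) <= -delta) and (prob_safe >= beta)
```

In practice, the paper notes (Remark 1) that these conditions are often moved into the cost as heavily weighted penalties, so infeasible initializations do not stall the solver.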
Optimizing Start/Goal Constraint States Along $\nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$
If the true constraint set is in fact not locally convex near $\kappa_{\mathrm{MI},p}$, the boundary of $A$ may curve away from $\kappa_{\mathrm{MI},p}$ in the direction of $-\nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$ (see Fig. 2b). In this case, start/goal constraint states located by searching along $\pm \nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$ from $\kappa_{\mathrm{MI},p}$, which we denote by $(\kappa_{s,p,\parallel}, \kappa_{g,p,\parallel})$, are likely to generate demonstrations that are tight against $A$, since such demonstrations may also exhibit curvature away from $\kappa_{\mathrm{MI},p}$ in the direction of $-\nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$, similar to the boundary of $A$. An illustration is provided in Fig. 2b, and the computation of $(\kappa_{s,p,\parallel}, \kappa_{g,p,\parallel})$ is described in detail in Prob. 6. We note that Probs. 3-6 are efficiently solvable via the IPOPT solver (Wächter and Biegler, 2006) in CasADi (Andersson et al., 2019). Given that $A$ may or may not be locally convex near $\kappa_{\mathrm{MI},p}$, searching for start/goal constraint states in directions both orthogonal and parallel to the gradient $\nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$ (Probs. 4-6) increases the likelihood of generating demonstrations that are tight against the unknown constraint. Concretely, Prob. 6 considers a variant of Prob. 5 which searches for start/goal states in the direction of $\nabla \hat{g}_p(\kappa_{\mathrm{MI},p})$ from $\kappa_p(\eta)$, rather than along the direction $\bar{\kappa}_p(\eta) - \kappa_p(\eta)$, with the same aim of inducing demonstrations that are taut against the constraint boundary (see Fig. 2b).
We present our algorithm for active constraint learning (GP-ACL) in Alg. 1. Concretely, given an available demonstration dataset $\mathcal{S}^{(i)}_D$ at iteration $i$, we solve Prob. 7 to extract tight points and perform GP learning via Prob. 8 to construct the stochastic GP posterior $(g^{(i)}|\mathcal{D})(\cdot)$ as an estimate of the unknown constraint function $g^\star_{\neg k}(\cdot)$ (Lines 2-3). From $(g^{(i)}|\mathcal{D})(\cdot)$, we compute the posterior mean $\mathbb{E}[g^{(i)}(\cdot)|\mathcal{D}]$ and randomly draw posterior samples $\{\hat{g}^{(i)}_p(\cdot) : p \in [P]\}$. Next, by solving Probs. 4-6, we query start and goal constraint states that are likely to compel the demonstrator to produce constraint-revealing trajectories (Lines 4-7). Finally, after new demonstrations are generated (see Prob. 1), we insert the newly generated demonstrations and their corresponding robustly identified constraint states and gradient values (see Sec. 2.3) into the demonstration dataset, and proceed to the next iteration of our algorithm (Lines 8-10).
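At a high level, one iteration of this loop can be sketched as below. Every helper named here (`extract_tight_points`, `train_gp`, `solve_prob3`, etc.) is a hypothetical stand-in for the correspondingly numbered problem in the paper, injected as a callable so the skeleton itself is valid Python:

```python
def gp_acl(demos, demonstrate, extract_tight_points, train_gp,
           solve_prob3, solve_probs4_5, solve_prob6,
           n_iters=3, n_samples=5):
    """Sketch of the GP-ACL loop (Alg. 1); sub-solvers are injected callables."""
    gp = None
    for _ in range(n_iters):
        tight = extract_tight_points(demos)        # Prob. 7: tight states/grads
        gp = train_gp(tight)                       # Prob. 8: GP posterior
        queries = []
        for g_hat in gp.sample_paths(n_samples):   # RFF posterior samples (assumed method)
            k_mi = solve_prob3(gp, g_hat)          # most informative safe state
            queries.append(solve_probs4_5(gp, g_hat, k_mi))  # orthogonal search
            queries.append(solve_prob6(gp, g_hat, k_mi))     # parallel search
        for k_s, k_g in queries:
            demos.append(demonstrate(k_s, k_g))    # Prob. 1: new demonstration
    return gp
```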
To evaluate our GP-ACL algorithm, we perform constraint learning tasks on simulations using double integrator, 4D unicycle, 12D quadcopter, and 7-DOF robot arm dynamics, and on hardware platforms using a 7-DOF robot arm. Below, Sec. 4.1 introduces parameter settings shared across our experiments, while Sec. 4.2 presents a select subset of our experiment results. For additional experiments, see App. B.
Our experiments involve the dynamics, constraints, and costs listed below. Given a state vector $x_t$, we use $p_t \in \mathbb{R}^3$ and $p_{x,t}, p_{y,t}, p_{z,t} \in \mathbb{R}$ to respectively denote the overall 3D position vector and the x-, y-, and z-position coordinates encoded by $x_t$.
Algorithm 1: Gaussian Process-based Active Constraint Learning (GP-ACL) Algorithm.
1: for $i = 1, \ldots, N_{\mathrm{iters}}$ do
2:   Extract tight constraint states and gradients from $\mathcal{S}^{(i)}_D$ (Prob. 7)
3:   Train the GP posterior $(g^{(i)}|\mathcal{D})(\cdot)$ (Prob. 8)
4:   Compute $\mathbb{E}[g^{(i)}(\cdot)|\mathcal{D}]$ and draw posterior samples $\{\hat{g}^{(i)}_p(\cdot) : p \in [P]\}$
5:   For each $p \in [P]$: compute $\kappa_{\mathrm{MI},p}$ (Prob. 3)
6:   $\kappa_p(\eta), H_p(\eta) \leftarrow$ Compute offset constraint state and hyperplane via (7) and (8); compute $(\kappa_{s,p,\perp}, \kappa_{g,p,\perp})$ (Probs. 4-5)
7:   Compute $(\kappa_{s,p,\parallel}, \kappa_{g,p,\parallel})$ (Prob. 6)
8:   for each queried start/goal pair $(\kappa_{s,j}, \kappa_{g,j})$ do
9:     Solve Prob. 1 while enforcing $\phi_{\mathrm{sep}}(x_1(\xi_j)) = \kappa_{s,j}$ and $\phi_{\mathrm{sep}}(x_T(\xi_j)) = \kappa_{g,j}$
10:    Add the new demonstrations and their tight constraint states/gradients to $\mathcal{S}^{(i+1)}_D$
11: end for

Dynamics models We infer constraints from demonstrations generated using 2D and 3D double-integrator, 4D unicycle, 12D quadcopter (Sabatino (2015)), and 7-DOF robot arm (Murray et al. (1994)) dynamics. In our experiments, we discretize the above continuous-time dynamics at intervals of $\Delta t = 1$ and set a time horizon of $T = 30$, unless stated otherwise.
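For concreteness, a forward-Euler discretization of a common 4D unicycle model is sketched below; the paper's exact unicycle parameterization may differ:

```python
import numpy as np

def unicycle_step(x, u, dt=1.0):
    """Forward-Euler step of a common 4D unicycle model (illustrative).
    State x = (p_x, p_y, theta, v); control u = (omega, a)."""
    px, py, th, v = x
    om, a = u
    return np.array([px + dt * v * np.cos(th),
                     py + dt * v * np.sin(th),
                     th + dt * om,
                     v + dt * a])

def rollout(x0, controls, dt=1.0):
    """Roll controls forward to produce the state sequence x_1, ..., x_T."""
    xs = [np.asarray(x0, dtype=float)]
    for u in controls:
        xs.append(unicycle_step(xs[-1], u, dt))
    return np.stack(xs)
```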
Constraints We consider eight types of unknown nonlinear constraints $\{g^\star_{\neg k,i} : i \in [8]\}$, each defining a corresponding obstacle set $A_i$ that demonstrations must avoid. We define the constraint space for each obstacle set to be either the configuration space of the robot arm, or the 2D/3D Cartesian coordinate space for all other dynamics. For the mathematical definition of each constraint type, see Appendix B.1. In addition to the nonlinear constraints mentioned above, each demonstration trajectory must satisfy start/goal constraints, as described in Secs. 2-3.
Costs All demonstrations in the following experiments are generated via the smoothness cost $c(\xi) := \sum_{t=1}^{T-1} \|p_{t+1} - p_t\|_2^2$, which compels each demonstration to minimize the total distance traversed between the prescribed start/goal constraints while avoiding the prescribed obstacle sets.
Algorithm Implementation When running our GP-ACL algorithm (Alg. 1), unless otherwise specified, we set α = 0.3 and use n ℓ = 1000 random Fourier basis functions to train the GP posterior for constraint representation ((16) and Alg. 1, Lines 2-3), with N iters = 3 and δ = 0.001. We select β = 0.55 to compute the start/goal constraint states (κ s,p,⊥ , κ g,p,⊥ ) in Alg. 1, Line 6, and β = 0.3 and β = 0.55 to compute κ s,p,∥ and κ g,p,∥ , respectively, on Line 7.
To evaluate the constraint recovery accuracy of our GP-ACL (resp., a random-sampling baseline) method, we randomly generate n s samples in the constraint space and report the fraction γ ours (resp., γ BL ) of sampled constraint states at which constraint satisfaction or violation was accurately predicted. For both methods, we also visualize constraint states corresponding to false positive (FP) and false negative (FN) errors, defined respectively as incidents in which safe constraint states are mistakenly marked as unsafe, and vice versa.
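A minimal sketch of this evaluation protocol, using the convention that a constraint state is unsafe (inside the avoid set) when $g(\kappa) > 0$ (the callables below are our illustrative stand-ins):

```python
import numpy as np

def constraint_recovery_metrics(samples, g_true, g_learned):
    """Fraction gamma of sampled constraint states whose safety label is
    predicted correctly, plus false-positive / false-negative masks.

    samples  : (n_s, n_c) sampled constraint states
    g_true   : vectorized ground-truth constraint, unsafe where > 0
    g_learned: vectorized learned constraint estimate, unsafe where > 0
    """
    truly_unsafe = g_true(samples) > 0
    pred_unsafe = g_learned(samples) > 0
    gamma = np.mean(truly_unsafe == pred_unsafe)
    false_pos = ~truly_unsafe & pred_unsafe   # safe states marked unsafe
    false_neg = truly_unsafe & ~pred_unsafe   # unsafe states marked safe
    return gamma, false_pos, false_neg
```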
Unicycle Simulations We evaluate our GP-ACL algorithm by recovering the three complex nonlinear constraints $g^\star_{\neg k,1}$, $g^\star_{\neg k,2}$, and $g^\star_{\neg k,3}$, as defined in App. B.1 and visualized in Fig. 3. Here, we set $\alpha = 0.3$ when sampling GP posteriors. Across $n_s = 2500$ sampled constraint states, our GP-ACL algorithm predicts the safety of each sample with accuracy $\gamma_{\mathrm{ours}} = 0.9996$, $1.0$, and $0.9956$ for the constraints $g^\star_{\neg k,1}$, $g^\star_{\neg k,2}$, and $g^\star_{\neg k,3}$, respectively, while the random-sampling baseline method yielded accuracies of only $\gamma_{\mathrm{BL}} = 0.9788$, $0.9912$, and $0.9492$, respectively. In particular, although both the baseline method and our approach accurately classified most of the space within each obstacle set, our method incurred significantly lower misclassification rates near obstacle boundaries, resulting in higher constraint representation quality (Fig. 3). Overall, our numerical results illustrate that our GP-ACL algorithm outperforms the random-sampling baseline in recovering a priori unknown constraint sets with complex boundaries.
Quadcopter Simulations Our GP-ACL algorithm also outperforms the baseline method in accurately recovering constraints from demonstrations of length $T = 20$ generated from high-dimensional quadcopter dynamics and the hourglass-shaped constraint $g^\star_{\neg k,5}$ (Fig. 4). Here, we set $\alpha = 0.3$ when sampling GP posteriors. Over $n_s = 27{,}000$ sampled constraint states, our GP-ACL algorithm achieved a higher accuracy rate ($\gamma_{\mathrm{ours}} = 0.9919$) compared to the baseline method ($\gamma_{\mathrm{BL}} = 0.9423$). Our experiment results verify that, compared to the baseline approach, our GP-ACL algorithm achieves superior constraint recovery accuracy when learning from demonstrations generated using high-dimensional nonlinear dynamics.
Figure 4: Our GP-ACL algorithm (left half) outperforms the random sampling baseline (right half) in accurately recovering constraints $g^\star_{\neg k,3}$ (top) and $g^\star_{\neg k,7}$ (middle/bottom), from unicycle dynamics (top) and simulated 7-DOF arm (middle/bottom) demonstrations, with fewer false positive (green) and false negative (red) errors. Middle and bottom row figures display 3D slices ("Recovery accuracy in Dim (4, 5, 6)") of the 7D constraint space from our numerical simulations on the 7-DOF arm.
7-DOF Arm Simulations and Hardware Experiments Across constraint recovery tasks involving demonstrations of length $T = 20$ generated in simulation (resp., on hardware) using 7-DOF arm dynamics and the ellipse-shaped constraint $g^\star_{\neg k,7}$ (resp., the physical obstacle visualized in Fig. 1), our GP-ACL algorithm likewise achieves higher constraint inference accuracy compared to the baseline method. (We do not report $\gamma_{\mathrm{ours}}$ and $\gamma_{\mathrm{BL}}$ here, due to challenges inherent in sampling from a 7D constraint space.) For 7-DOF robot arm simulations and hardware experiments, we use $\delta = 0.1$ in our GP-ACL algorithm. Moreover, for hardware experiments, we set $\alpha = 0.1$ when sampling GP posteriors. Our experiment results illustrate that, when learning from either simulated or hardware demonstrations generated using high-dimensional robot arm dynamics, our GP-ACL algorithm achieves superior constraint recovery accuracy compared to the baseline approach.
Constraint Accuracy of GP-ACL vs. Baseline In Fig. 5, we plot constraint learning accuracy as a function of iteration count for both our GP-ACL method and the random-sampling baseline approach, when learning from demonstrations generated using unicycle, quadcopter, and 3D double-integrator dynamics. Overall, our GP-ACL algorithm consistently achieves higher per-iteration constraint learning accuracy compared to the baseline sampling approach.
Figure 5: Constraint recovery accuracy of (a) our GP-ACL algorithm and (b) the random-sampling baseline approach. Across the following constraint learning tasks, our GP-ACL method consistently recovered the a priori unknown constraints more accurately than the random-sampling baseline method: learning the constraints $g^\star_{\neg k,1}$, $g^\star_{\neg k,2}$, and $g^\star_{\neg k,3}$ from demonstrations generated using unicycle dynamics (blue, red, and yellow, respectively); learning the constraint $g^\star_{\neg k,5}$ from quadcopter dynamics (purple); and learning the constraint $g^\star_{\neg k,6}$ from double-integrator dynamics (green).
This paper presents our Active Constraint Learning (ACL) method, which efficiently infers unknown constraints through iterative, uncertainty-guided demonstration generation. Across simulation and hardware experiments encompassing nonlinear, high-dimensional robot dynamics and non-convex, high-dimensional constraints, our ACL method successfully queries a small number of informative demonstrations to efficiently and accurately recover unknown constraints. In contrast to existing constraint inference techniques, which learn from demonstrations generated without consideration of constraint uncertainty, our ACL method achieves higher constraint inference accuracy while learning from a substantially smaller, but more informative, demonstration dataset.
Gaussian processes (GPs) are commonly used as priors in regression tasks, in which one aims to infer an unknown map $g : \mathbb{R}^n \to \mathbb{R}$ from an input-output dataset $\mathcal{D} := \{(x_i, y_i) \in \mathbb{R}^n \times \mathbb{R}\}_{i=1}^{N_d}$ generated via a noisy output model $y_i \sim \mathcal{N}(g(x_i), \sigma^2)$. In such scenarios, the posterior $(g|\mathcal{D})(\cdot)$ of the function $g$ is characterized by the following posterior mean and covariance maps:
Whereas Chou et al. (2022) only uses the posterior mean for constraint inference, we additionally sample from the posterior $(g|\mathcal{D})(\cdot)$ to facilitate higher constraint learning efficiency. Concretely, we sample posterior functions $\{\hat{g}_p(\cdot) : p \in [P]\}$ using the random Fourier feature-based approach presented in Wilson et al. (2020). To do so, we sample $n_\ell$ basis functions $\phi_\ell : \mathbb{R}^{n_c} \to \mathbb{R}$, as described in Wilson et al. (2020), and select a scale coefficient $\alpha > 0$, which modulates the level of random deviation between $\mathbb{E}[g(\cdot)|\mathcal{D}]$ and each posterior function $\hat{g}_p(\cdot)$. We then draw $w_{p,\ell} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$ and define, for each $p \in [P]$:
to form the desired GP posterior samples.
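The sketch below follows the general pathwise-sampling recipe of Wilson et al. (2020) with random Fourier features for an RBF kernel; it is our simplified rendering, not the paper's exact implementation (cf. (16)). The `alpha` parameter scales the random deviation from the posterior mean, as described above:

```python
import numpy as np

def rff_features(X, omegas, phases, variance=1.0):
    """Random Fourier features for an RBF kernel; omegas ~ N(0, I / l^2)."""
    n_l = omegas.shape[0]
    return np.sqrt(2.0 * variance / n_l) * np.cos(X @ omegas.T + phases)

def sample_posterior_path(X_train, y_train, omegas, phases, noise=1e-4,
                          alpha=1.0, rng=None):
    """One pathwise posterior sample: a random prior function from the RFF
    weight space, plus a data-driven correction so the path (approximately)
    interpolates the training data."""
    rng = rng or np.random.default_rng()
    w = rng.standard_normal(omegas.shape[0])        # w_l ~ N(0, 1)
    Phi = rff_features(X_train, omegas, phases)
    K = Phi @ Phi.T + noise * np.eye(len(X_train))  # approximate Gram matrix
    resid = np.linalg.solve(K, y_train - alpha * Phi @ w)

    def g_hat(Xq):
        Phi_q = rff_features(Xq, omegas, phases)
        return alpha * Phi_q @ w + Phi_q @ Phi.T @ resid
    return g_hat
```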
Appendix B. Supplementary Material for Sec. 4
We concretely formulate the constraints $g^\star_{\neg k,1}, \cdots, g^\star_{\neg k,7}$ and associated avoid sets $A_1, \cdots, A_7$ below. (Note that $A_8$ is a physical obstacle in a hardware experiment.) Recall that, by definition, $A_i := \{\kappa \in \mathbb{R}^{n_c} : g^\star_{\neg k,i}(\kappa) > 0\}$ for each $i \in [8]$ (see Sec. 3.1). As formulated in Sec. 4, given a state $x$, we refer to the position, x-coordinate position, y-coordinate position, and z-coordinate position by $p$, $p_x$, $p_y$, and $p_z$, respectively.
For $g^\star_{\neg k,1}$ and $A_1$, we set:
$$g^\star_{\neg k,1}(\phi_{\mathrm{sep}}(x)) = -0.02\,|1.732\,p_x + p_y|^{1.4} - 0.042\,|p_x - 1.732\,p_y|^{1.4} + 1. \tag{17}$$
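For illustration, (17) can be evaluated directly; under the convention $A_i = \{\kappa : g^\star_{\neg k,i}(\kappa) > 0\}$, the origin lies inside the avoid set $A_1$:

```python
def g_star_1(p):
    """Evaluate the nonlinear constraint (17) at p = (p_x, p_y).
    Unsafe (inside A_1) where the value is positive."""
    px, py = p
    return (-0.02 * abs(1.732 * px + py) ** 1.4
            - 0.042 * abs(px - 1.732 * py) ** 1.4 + 1.0)

print(g_star_1((0.0, 0.0)) > 0)   # True: the origin is inside A_1
```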
For g ⋆ ⌝k,2 and A 2 , we set:
where:
where:
For $g^\star_{\neg k,7}$ and $A_7$, we set:
where $B := \mathrm{diag}\{150, 90, 150, 90, 150, 90, 150\} \in \mathbb{R}^{7 \times 7}$. Here, $\mathrm{diag}\{\cdot\}$ denotes a diagonal matrix with the given entries as its diagonal values.
We evaluate our GP-ACL algorithm on the problem of recovering the four nonlinear constraints $g^\star_{\neg k,1}$, $g^\star_{\neg k,2}$, $g^\star_{\neg k,3}$, and $g^\star_{\neg k,4}$, as defined in App. B.1 and visualized in Figs. 6, 7, 8, and 9. Here, we set $\alpha = 0.3$ when sampling GP posteriors for learning $g^\star_{\neg k,3}$, and use the default setting of $\alpha = 0.15$ while learning $g^\star_{\neg k,1}$, $g^\star_{\neg k,2}$, and $g^\star_{\neg k,4}$. All other parameters are set at the default values provided in Sec. 4.1. Across $n_s = 2500$ sampled constraint states, our GP-ACL algorithm predicts the safety of each sample with accuracy $\gamma_{\mathrm{ours}} = 0.9952$, $0.996$, $0.9956$, and $0.9762$ for the constraints $g^\star_{\neg k,1}$, $g^\star_{\neg k,2}$, $g^\star_{\neg k,3}$, and $g^\star_{\neg k,4}$, respectively, while the random-sampling baseline method yielded accuracies of only $\gamma_{\mathrm{BL}} = 0.99$, $0.9868$, $0.9652$, and $0.9692$, respectively.
¹ Our formulation readily generalizes to settings in which some unknown constraints depend on control inputs.