Scalable Data-Driven Reachability Analysis and Control via Koopman Operators with Conformal Coverage Guarantees

📝 Abstract

We propose a scalable reachability-based framework for probabilistic, data-driven safety verification of unknown nonlinear dynamics. We use Koopman theory with a neural network (NN) lifting function to learn an approximate linear representation of the dynamics and design linear controllers in this space to enable closed-loop tracking of a reference trajectory distribution. Closed-loop reachable sets are efficiently computed in the lifted space and mapped back to the original state space via NN verification tools. To capture model mismatch between the Koopman dynamics and the true system, we apply conformal prediction to produce statistically-valid error bounds that inflate the reachable sets to ensure the true trajectories are contained with a user-specified probability. These bounds generalize across references, enabling reuse without recomputation. Results on highdimensional MuJoCo tasks (11D Hopper, 28D Swimmer) and 12D quadcopters show improved reachable set coverage rate, computational efficiency, and conservativeness over existing methods.

💡 Deep Analysis

📄 Full Content

Computing reachable sets, i.e., the set of states reachable over a time horizon, is key to robot safety. However, reachability analysis for robots is difficult as they often (i) lack analytical models, (ii) have high-dimensional nonlinear dynamics, and (iii) require reasoning over long horizons. Even if analytical dynamics are available, challenges (ii) and (iii) cause nonlinear reachability tools (Bansal et al., 2017;Althoff, 2015) to become prohibitive or to suffer from excessive overapproximation over long horizons (Rober and How, 2024). Data-driven reachability (Park et al., 2024;Bak et al., 2025) can be used without analytical models, but can be data-inefficient or lack guarantees that the estimated sets contain all possible trajectories. In contrast, linear reachability scales to long horizons and high dimensions (Bak et al., 2019). The Koopman framework learns linear dynamics in a lifted state space, e.g., via neural networks (NNs), to approximate the nonlinear system (Shi et al., 2024). While computing reachable sets for the Koopman linearization can be more scalable than using the original nonlinear dynamics, existing methods only provide guarantees for the approximate lifted dynamics. Thus, safety may not hold for the true system (Bak et al., 2021(Bak et al., , 2025)). Moreover, existing Koopman reachability methods analyze a single dynamical system. However, robots often require multiple controllers to complete different tasks, which forces existing methods to recompute reachable sets from scratch for each new controller, which is inefficient. Reusing previously computed information would enable more efficient reachability analysis across controllers.

To close these gaps in scalability, speed, and reusability, we propose ScaRe-Kro (Scalable Reachability via Koopman Operators), which uses Koopman theory with an NN lifting function to perform reachability analysis and control. Unlike prior Koopman reachability methods for autonomous systems, we also design a linear controller in this lifted space to track a distribution of reference trajectories, each completing a different task. ScaRe-Kro efficiently computes closed-loop reachable sets under the induced linear tracking error dynamics in the lifted Koopman space and maps them to the original state space via NN verification tools. To account for Koopman model error, we apply conformal prediction (CP) to derive statistically-valid error bounds. By inflating the mapped reachable sets with these CP bounds, they are guaranteed to contain the true system trajectories with a user-specified probability (e.g., 97.5%). Moreover, as the CP bounds are calibrated across the trajectory distribution, they can be reused across different references and thus closed-loop systems, unlike prior Koopman approaches that certify only a single autonomous system. Overall, using linear reachability and CP, our method enables scalable data-driven probabilistic verification and control, improving both efficiency and conservativeness over prior work. Our contributions are:

A scalable Koopman-based reachability framework for unknown nonlinear dynamics, using NN lifting functions and NN verification to estimate reachable sets in the original state space. 2. An approach for inflating the resulting Koopman reachable sets using CP, to obtain probabilistic guarantees of overapproximation for the true dynamics. 3. Control design in the lifted space for closed-loop tracking of a reference trajectory distribution, with CP-based inflation bounds calibrated to be reusable across the distribution. 4. Extensive evaluation, showing reliable verification for high-dimensional robotic systems (up to 28D), outperforming baselines in safety rate, computation speed, and conservativeness.

Reachability via Linear Relaxation While exact reachable set computation for nonlinear systems is typically intractable, reachable set overapproximations (RSOAs) can often be efficiently computed to certify safety (Althoff et al., 2021). When the closed-loop system involves NN components (e.g., learned dynamics or controller), RSOAs can be computed via linear relaxation-based NN verification (NNV) tools (e.g., CROWN (Zhang et al., 2018)) (Zhang et al., 2018;Everett et al., 2021;Jafarpour et al., 2024). Here, conservativeness grows with the number of nonlinearities (Nath et al., 2025) and can be excessive for nonlinear robot dynamics. Symbolic and one-shot methods mitigate this but raise runtime and memory costs (Chen et al., 2023;Rober and How, 2024;Everett et al., 2021). We instead model the robot with Koopman-linearized dynamics, reducing the number of nonlinearities in the CG and improving scalability (i.e., computation time, conservativeness).

Koopman Reachability Koopman (Koopman and Neumann, 1932) and Carleman (Amini et al., 2021;Bayer and Leine, 2023) linearizations accelerate reachability but introduce approximation error. Moreover, while error bounds exist, these derivations are limited to quadratic or polynomial dynamics (Liu et al., 2021;Forets and Pouly, 2017;Amini et al., 2021), which do not apply to many robot dynamics models. Prior Koopman reachability methods also lack reachable set coverage calibration (Thapliyal and Hwang, 2022) or guarantee containment only for a finite set of observed trajectories (Kochdumper and Bak, 2022), failing to ensure safety for previously-unseen trajectories of the true system. Prior methods also do not support NN liftings and consider autonomous systems, which corresponds to fixing a single controller. Changing controllers, as often done in robotics, requires repeating the calibration (Forets and Schilling, 2021;Bak et al., 2021Bak et al., , 2025)), which is inefficient. In contrast, we support NN liftings, provide probabilistic coverage guarantees for the true dynamics, and integrate control design, enabling data-efficient reachability across multiple controllers.

Data-Driven Reachability Data-driven reachability uses black-box trajectory data. Some methods adapt Hamilton-Jacobi analysis (Chilakamarri et al., 2024), but this is costly for high-dimensional systems. Others learn reachability functions (Sun and Mitra, 2022) but require dense data to accurately generalize, or yield ellipsoidal RSOAs (Park et al., 2024) assuming privileged system data (e.g., Lipschitz constants). Sampling-based methods (Lew et al., 2022) are limited to short horizons. Scenario optimization provides probabilistic guarantees (Dietrich et al., 2024) but is slower than Koopman-based methods (Bak et al., 2025). Moreover, these methods assume a fixed controller, yet robots often switch controllers to achieve different goals, altering the closed-loop dynamics and forcing reachable sets to be recomputed. Trajectory-tracking reachability provides reusable tracking-error sets across reference trajectories (Herbert et al., 2017;Singh et al., 2018;Knuth et al., 2021). Data-driven variants learn contraction metrics (Chou et al., 2021(Chou et al., , 2022;;Knuth et al., 2023) for tracking-based reachability of unknown systems, but finding such metrics is generally difficult. We build on this work, yielding Koopman reachable sets that are fast to compute, scalable to long horizons, and reusable for various closed-loop dynamics induced by tracking different trajectories.

We consider unknown, black-box nonlinear systems, either in open-loop (1a) or closed-loop (1b):

, and feedback control law π : X × T → U . The initial state x 0 lies in set X 0 ⊆ X . Denote Int(a, a) = {a | a ≤ a ≤ a}, as an interval, where a, a ∈ R A and inequalities hold element-wise, i.e.,

We use ⊕ and ⊖ to denote the Minkowski sum and difference, respectively.

As our work relies on Koopman operators, we discuss the basic concepts here. Consider autonomous discrete-time nonlinear dynamics x t+1 = f (x t ). A Koopman operator K is an infinite-dimensional linear operator that evolves the observables ϕ ∞ of the state Kϕ

, where • denotes function composition. As infinitedimensional Koopman operators are intractable to obtain in practice, we define an approximation to K using a finite-dimensional lifting function ϕ : X → Z ⊆ R l and matrix K A ∈ R l×l , which define lifted dynamics z t+1 = K A z t , where z t = ϕ(x t ) is the lifted state and z t ∈ Z ⊆ R l . Here, l can be arbitrarily selected by the choice of the lifting function ϕ. For systems with control input (1a), the lifted dynamics can be similarly written as

In practice, ϕ, K A , and K B can be learned from data, with the lifting function g parameterized using polynomials or NNs (Shi et al., 2024). In this work, we utilize an NN-based autoencoder architecture, where the encoder ϕ : X → Z defines the lifting function and the decoder ψ : Z → X defines its learned inverse, i.e., ψ(ϕ(x)) ≈ x; if ψ is a perfect inverse, ψ(ϕ(x)) = x and X = X .

Reachability via NN Verification (NNV) We use NNV to compute reachable sets for closedloop systems involving NN-based lifting and inverse functions. NNV tools represent an NN as a computational graph (CG) -a directed acyclic graph encoding the sequence of operations applied to an input (Rober and How, 2024). For some CG G with input set S ⊆ R n i and output G(S) ⊆ R no , we can compute a guaranteed overapproximation of its image G(S) using CROWN-based (Zhang et al., 2018) tools like auto LiRPA (Xu et al., 2020). These tools provide guaranteed affine lower and upper bounds, G and G, on the output G(S) for any interval S, formalized in this proposition:

Proposition 1 (CG Robustness (Xu et al., 2020)) For some CG G and interval S .

Given controller π, dynamics (1a), and initial set X 0 , the T -timestep exact reachable set is denoted as X 0:T

While exact reachability analysis is generally intractable for nonlinear systems (Althoff et al., 2021), it is often feasible to compute a reachable set overapproximation (RSOA) X 0:T , which satisfies X t ⊆ X t for all t ∈ {0, . . . , T }. To obtain X 0:T , Prop. 1 can be applied to the CG

, with auto LiRPA computing the RSOA. Conformal Prediction (CP) Our RSOA computations also rely on CP. In this work, we adopt split CP, which partitions i.i.d. input-output pairs (v (i) , y

into a size-L training set D T and a size-K calibration set D C . For a given prediction function µ : V → Y, we compute a nonconformity score R (i) . = s(y (i) , µ(v (i) )) for each of the K calibration pairs in D C , where s : Y × Y → R ≥0 is a chosen nonconformity score function. As an example, we can express the ℓ 2 prediction error of a learned dynamics model f in this framework by defining

which satisfies the marginal coverage guarantee P(y (0) ∈ Y(v (0) )) ≥ 1δ for an unseen data-point (y (0) , v (0) ) that is exchangeable with D (Lindemann et al., 2024).

We are given black-box access to f (1a), i.e., we do not know its analytical form and its output can only be estimated via data. We are a planner P : X → X T +1 × U T that generates open-loop reference trajectories (x ref 0:T , u ref 0:T -1 ) that satisfy (1a), starting from an initial condition

that is uniformly distributed over X 0 ⊖B ϵ (0). We assume nothing further about P ; it can be a learned or traditional planner. At runtime, the true initial state x 0 may be perturbed from x ref 0 , i.e., x 0 ∈ B ϵ (x ref 0 ), where x 0 is uniformly distributed over B ϵ (x ref 0 ). To be robust to this error, we design a controller π : X ×T → U that tracks references from P and compute high-probability RSOAs X 0:T for the resulting closed-loop dynamics (1b), such that P T t=0 (x t ∈ X t ) ≥ 1-δ. This process should scale to high-dimensional nonlinear systems over long horizons (e.g., n ≥ 25, T ≥ 200). We solve: ), the closedloop trajectory remains in X 0:T with probability at least 1δ, i.e., P T t=0 (x t ∈ X t ) ≥ 1δ.

Our method (Fig. 2, Alg. 1) learns a Koopman operator (Sec. 5.1), uses it for control design (Sec. 5.2), performs fast linear reachability analysis on the closed-loop lifted Koopman dynamics (Sec. 5.3), and inflates these reachable sets via CP to obtain probabilistic coverage guarantees (Sec. 5.4).

To solve Prob. 1, we train a Koopman operator model (ϕ, ψ, K A , K B ) on a dataset D . = {(x

of open-loop trajectories from a planner P (details in Sec. 6). We use a composite loss

2 ). The first term, L (i)

))∥ 2 , enforces autoencoder accuracy via state reconstruction error and a latent consistency loss. The second term,

, is a multi-step (horizon H) dynamics loss penalizing prediction errors from the linear model (K A , K B ) in both latent and state space.The latent consistency term is crucial for long-horizon prediction accuracy. We initialize K A as identity and K B with Xavier uniform, train via learning rate annealing and weight decay, and set (ϕ, ψ) to have ReLU or GELU activations.

To reliably imitate state/control reference trajectories generated by the planner P in the lifted state space Z, a tracking controller π is required to stabilize against initial condition perturbations x 0x ref 0 and to reject disturbances due to model error in the learned lifted dynamics z t+1 = K A z t + K B u t . To obtain π, we use the linearity of the lifted dynamics to design a linear quadratic regulator (LQR) that tracks trajectories in Z. Given a reference state-space trajectory x ref 0:T produced by P , we map it to the lifted state space as z ref

The corresponding feedforward control trajectory u ref 0:T -1 which minimizes the lifted-state imitation error is then computed as:

where K † B is the pseudo-inverse of K B . The error states δz t = z tz ref t and controls δu t = u tu ref t follow linear error dynamics δz t+1 = K A δz t + K B δu t , starting from an initial error δz 0 = ϕ(x 0 )z ref 0 , where x 0 ∈ B ϵ (x ref 0 ). We solve an LQR problem, finding the feedback controls δu 0:T -1 that minimize the finite-horizon quadratic cost δz ⊤ T Q T δz T + T -1 t=0 (δz ⊤ t Qδz t +δu ⊤ t Rδu t ), subject to the linear dynamics δz t+1 = K A δz t + K B δu t , where Q ≻ 0, R ≻ 0, and Q T ≻ 0. In particular, we solve a backward Riccati recursion to obtain the optimal feedback gains G 0:T -1 . This yields the tracking control law in the lifted state space (3), which can be applied on the nonlinear system (1a) as (4) by measuring and encoding the state x t :

(4)

The high computational cost of nonlinear reachability motivates our use of Koopman in the lifted state space using the feedback control law (4). Specifically, the TKFL Γ : X 0 × U T × Z T +1 → X T +1 maps the initial state x 0 ∈ B ϵ (x ref 0 ), the feedforward control trajectory u ref 0:T -1 (2), and the lifted reference z ref 0:T to the decoded trajectory x0:T . = {ψ(z t )} T t=0 representing the predicted closed-loop trajectory (5) in the original state space, where the lifted states z t are updated via the closed-loop lifted dynamics (6):

To summarize, the CG is constructed as follows (see Fig. 2 for visualization). First, we map the initial state via ϕ into the latent space, computing z 0 = ϕ(x 0 ). Second, we unroll the T -fold recursive application of the single-step latent dynamics, z t+1 = K A z t + K B u t , where u t is the closedloop tracking control computed using (3), to obtain the latent trajectory z 0:T = {z 0 , z 2 , • • • , z T }. Third, z 0:T is mapped back to the original state space via the decoder ψ to obtain x0:T . The CG Γ is passed to the auto LiRPA library (Xu et al., 2020), which enables set-based bound propagation through Γ using CROWN (Zhang et al., 2018). auto LiRPA computes vector-valued affine bounding functions, Γ and Γ, which enables us to compute an RSOA X 0:T for the decoded closedloop lifted dynamics (5) by considering specific components of Γ, Γ t and Γ t , that bound the states

x Z E j e 2 S f H B K H n J M S u S F l U i W M P J J n 8 k r e j C f j x X g 3 P i a t c 0 Y 2 s 0 P + w P j 8 A e L B l g w = < / l a t e x i t >

u L p H 5 a 9 i 7 K 5 3 d n p c r 1 L I 4 C O S R H 5 I R 4 5 J J U y C 2 p k h p h Z E C e y S t 5 s 3 L r x X q 3 P q a t S 9 Z s 5 o D 8 g f X 5 A 3 S 2 l D U = < / l a t e x i t >

x j e T v z m A 1 O a R 7 I G o 5 j 5 I e l L 3 u O U g J E 6 9 g F g j 0 v s p U 4 J e 6 I b g S 7 h 2 q n r j T t 2 0 S k 7 U + B F 4 m a k i D J U O / a X 1 4 1 o E j I J V B C t 2 6 4 T g 5 8 S B Z w

L F / c n x c r N 1 k c e X S I j t A J c t E V q q A 7 V E V 1 R N E j e k a v 6 M 1 6 s l 6 s d + t j 1 p q z s p l 9 9 A f W 5 w / S R 5 U 4 < / l a t e x i t > t 2 {0, . . . , T 1}

< l a t e x i t s h a 1 _ b a s e 6 4 = " U y 9 5 a L N A

L F / c n x c r N 1 k c e X S I j t A J c t E V q q A 7 V E V 1 R N E j e k a v 6 M 1 6 s l 6 s d + t j 1 p q z s p l 9 9 A f W 5 w / S R 5 U 4 < / l a t e x i t > t 2 {0, . . . , T 1}

T times, T times, < l a t e x i t s h a 1 _ b a s e 6 4 = " 4 / z m t 1 w q a 4 9 o Q U l g 7 L j g j G L y reached under (5) at timestep t, starting from any initial condition x 0 ∈ X 0 . This resulting hyperrectangular set X 0:T , which we refer to as the Koopman reachable set (KRS), is found by solving the optimization ( 7), which is computed efficiently by auto LiRPA:

The KRS X 0:T overapproximates x0:T so that xt ∈ X t , for all t ∈ T , ensuring the decoded closedloop lifted dynamics (5) remain in X 0:T . Formally, we state the following result (proof in App. 8.1):

Lemma 1 (KRS overapproximation) Let Γ denote the closed-loop CG of TKFL, and let X 0 be the initial set. Suppose the KRS X 0:T is defined as in (7). Then, for any initial state x 0 ∈ X 0 , the decoded closed-loop trajectory (5), i.e., x0:T = Γ(x 0 , z ref 0:T , u ref 0:T -1 ) satisfies xt ∈ Xt for all t ∈ T .

Crucially, the KRS overapproximates the reachable set for the decoded Koopman linearized dynamics (5), not the true nonlinear dynamics (1b), due to Koopman model error. This motivates Sec. 5.4.

We now explain how to inflate the KRS to guarantee it contains the true system (1b) trajectories with a user-specified probability 1δ. We use CP to compute a high-probability upper bound on the state-space prediction error of the decoded closed-loop lifted dynamics (5). Specifically, we adapt the single nonconformity score approach (SNSA) (Lindemann et al., 2024) to compute stateand dimension-dependent error bounds using a calibration dataset 8) and decoding to the state space via ψ and 2) executing the Koopman controller on the true nonlinear dynamics (9):

The prediction error for each state dimension j ∈ {1, . . . , n} at time t is denoted e

t,j . This gives a calibration set of error trajectories

where each e

t ∈ R n . Each error trajectory e (i) is normalized into a single non-conformity score R (i) = max t,j (λ t,j |e (i) t,j |), which is the maximum weighted error over all time steps t ∈ {0, . . . , T } and state dimensions j ∈ {1, . . . , n}. Here, the normalization constant λ t,j . = 1/(e max t,j + σ) ∈ R >0 is computed using a separate normalization dataset D N . This dataset, D N , consists of M λ additional error trajectories, {e (i) } K cal +M λ i=K cal +1 , collected using the same process as the calibration data. Here, σ is a small positive constant to prevent division by zero. The term e max t,j is the dimension-wise and time-wise maximum absolute error observed across D N , i.e., e max t,j = max i |e (i) t,j | where i ∈ {K cal + 1, . . . , K cal + M }. Now, given a user-defined miscoverage rate δ ∈ (0, 1), we compute the (1δ) quantile of the non-conformity scores C = Quantile 1-δ ({R (i) } K cal i=1 ∪ {∞}) by finding the p-th smallest value of the scores {R (i) } K cal i=1 , where p ≡ ⌈(K cal + 1)(1δ)⌉. The final, time-and dimension-dependent error bound ēt,j ∈ R ≥0 is obtained by reversing the normalization ēt,j = C/λ t,j . These bounds ensure that for an unseen test data-point (x (0) 0:T , x(0) 0:T ), the true prediction error |e (0) t,j | will be contained within the bound ēt,j for all t and j, with probability at least 1δ, i.e.,

We use these bounds to inflate the KRS computed in (7) to compute RSOAs for the original closedloop dynamics (1b) that are valid with probability at least 1δ, using (11) (proof in App. 8.1):

Theorem 1 (CKRS Coverage Guarantee) Let X 0:T be the KRS defined in (7) and let ēt = [ē t,1 , . . . , ēt,n ] ⊤ ∈ R n . Define the conformalized Koopman reachable set (CKRS) as

where diag(ē t ) ∈ R n×n is a diagonal matrix with ēt on its diagonal. Then, for a new reference trajectory x ref,(0) 0:T drawn from the same distribution, the CKRS contains the true closed-loop system trajectory generated by (1b) with probability at least 1δ, i.e.,

Offline Calibration As discussed so far, the method in Sec. 5.4 is reference-specific, as the calibration trajectories D C and error bounds ēt hold specifically for a single reference x ref 0:T . We propose an offline approach that pre-computes a single, global set of error bounds ēt,j valid for any reference x ref 0:T sampled from the planner P . This is done by building a larger, more diverse calibration set. We first sample N ref reference trajectories {x ref,(i)

For each reference, we generate a calibration trajectory by sampling x

), yielding an error trajectory e

x(i) t are as defined in ( 8) and ( 9), respectively. The SNSA procedure (computing R (i) , C, and ēt,j via λ t,j ) is then performed exactly once on this entire aggregated set.

At runtime, given a new reference x ref,(0) 0:T sampled from P , two efficient computations are performed: 1) we compute the nominal KRS X 0:T specific to x ref,(0) 0:T via ( 7)), and 2) we inflate this set using the pre-computed offline bounds ēt via the Minkowski sum X system safety. In contrast, our method attains 100% empirical coverage across all tested calibration set sizes, avoiding this variance-data trade-off and achieving target coverage with far fewer samples. Pure NNV Reachability. On a unicycle model, we compare with our prior work (Nath et al., 2025), which used NNV tools for one-shot RSOA computation, requiring 51.95s and achieving an average log-volume of -3.98. In contrast, our method computes the RSOA in 0.32s with a log-volume of -11.54 (Table 1). This highlights that our method is much faster and also reduces conservativeness.

Unicycle We evaluate on CKRS computations on a unicycle model ( 12), where ϕ and ψ have 3 hidden layers with 128 neurons each, and z ∈ R 10 . As shown in Fig. 7, the certified reachable sets verify both safety and goal attainment. Full metrics are in Table 1. We view this benchmark as a sanity check on a simple system, paving the way for more substantive scalability results.

Planar Quadcopter We evaluate a 6-state planar quadcopter (13) navigating around a circular obstacle for T = 100. Here, ϕ and ψ have 3 hidden layers with 128 neurons each, and z ∈ R 24 . We visualize offline-calibrated CKRSs for multiple reference trajectories in Fig. 8; metrics are in Table 1. An NNV-only baseline (Nath et al., 2025) required 59.85 s and achieved an average logvolume of -18.709. Overall, these results support our claim of reduced RSOA conservativeness.

3D Quadcopter We evaluate an analytical 12-state quadcopter (14) (Sabatino, 2015) navigating around a 1 m radius obstacle for T = 200. Here, ϕ and ψ have 3 hidden layers with 128 neurons each, and z ∈ R 24 . The computed CKRS (Fig. 4) verifies safety and goal reaching; metrics are in Table 1. To show that the offline CP inflation generalizes across controllers, we compute the CKRS for two references (Fig. 4, one orange, one blue) using the same offline-calibrated bounds; we visualize additional CKRSs in Fig. 9. An NNV-only baseline (Nath et al., 2025) is much slower and more conservative, requiring 2153.45 s with an average log-volume of -30.32. Overall, these results show our method’s efficiency, tightness, and offline CP generalization for high-dimensional systems. MuJoCo: Hopper We evaluate on MuJoCo’s Hopper-v5 (Todorov et al., 2012), a nonlinear, underactuated system with black-box dynamics. A PPO agent (Stable-Baselines3, default hyperparameters) trained for 2×10 6 steps is used as the planner P that generates trajectories for the training dataset. Here, ϕ and ψ have 3 hidden layers with 128 neurons each, and z ∈ R 24 . Numerical results are in Table 1 and RSOA slices for a few state dimensions are shown in Fig. 1 (see Fig. 11 for all 11 dimensions). This experiment highlights that our approach scales to black-box, high-dimensional contact-rich dynamics while maintaining low runtimes and probabilistic coverage guarantees.

MuJoCo: Customized Swimmer We further assess scalability on a customized MuJoCo Swimmer-v5 (Todorov et al., 2012) with extended links, yielding a 28-dimensional state. The planner P is a PPO agent (Stable-Baselines3 MLP) trained for 1 × 10 7 steps with modified hyperparameters (LR 1 × 10 -3 , γ = 0.9999, batch size 512; others default). Here, ϕ and ψ have 3 hidden layers with 128 neurons each, and z ∈ R 38 . Numerical results are in Table 1 and RSOA slices for a few state dimensions are shown in Fig. 1 (see Fig. 12 for all 28 dimensions). This experiment indicates that our approach maintains certified probabilistic safety while tractably computed RSOAs for high-dimensional systems over long horizons for black-box dynamical systems in MuJoCo.

We present ScaRe-Kro, a method for scalable reachability analysis of complex, nonlinear black-box dynamics that provides formal coverage guarantees. Our approach learns a lifted linear representation of the nonlinear dynamics through data, which facilitates tracking control design and the efficient propagation of closed-loop reachable sets. We then “conformalize” this reachable set by inflating it such that rollouts of the target nonlinear system are guaranteed to stay within the inflated set with a specified probability (i.e., at least 1δ). Across several benchmarks, including a unicycle, planar/3D quadrotors, MuJoCo hopper, and a 13-link MuJoCo swimmer, our method computes reachable sets efficiently and over long time horizons. Future work aims to: 1) remove the dependence on reference trajectories and 2) replace LQR with a robust control policy that satisfies constraints despite model error to increase the time horizon over which the system can be verified.

We provide proofs for our theoretical results (App. 8.1), an algorithm block detailing our method (App. 8.2), and additional experiments and experimental details (App. 8.3).

Proof The closed-loop graph Γ is a computational graph G as defined in Proposition 1. Using the auto LiRPA library Xu et al. (2020), we compute sound, vector-valued affine bounding functions Γ t and Γ t (corresponding to G and G in Proposition 1) for the t-th state output. By Proposition 1, these bounds are guaranteed to be sound. That is, for any specific input x 0 ∈ X 0 , the predicted state xt = Γ t (x 0 , z ref 0:T , u ref 0:T -1 ) satisfies

The KRS X t is defined in (7) as the interval Int(x L t , xU t ), where xL t = min x ′ 0 ∈X 0 Γ t (x ′ 0 , . . . ) and xU t = max x ′ 0 ∈X 0 Γ t (x ′ 0 , . . . ). By the definitions of the minimum and maximum, we have

Hence, xt ∈ Int(x L t , xU t ) = X t . Since this holds for all t ∈ T , the trajectory x0:T is contained within X 0:T .

Theorem 1 (CKRS Coverage Guarantee) Let X 0:T be the KRS defined in (7) and let ēt = [ē t,1 , . . . , ēt,n ] ⊤ ∈ R n . Define the conformalized Koopman reachable set (CKRS) as

where diag(ē t ) ∈ R n×n is a diagonal matrix with ēt on its diagonal. Then, for a new reference trajectory x ref,(0) 0:T drawn from the same distribution, the CKRS contains true closed-loop system trajectory generated by (1b) with probability at least 1δ:

Proof The proof combines a deterministic containment guarantee with a probabilistic coverage guarantee.

Step 1: KRS Containment. By Lemma 1, the decoded closed-loop lifted trajectory

x(0) t,j is deterministically contained within the hyper-rectangular KRS:

x(0) t,j ∈ X t,j = Int(x L t,j , xU t,j ), ∀t ∈ T , ∀j ∈ {1, . . . , n}.

Step 2: Probabilistic Coverage. The CP guarantee requires exchangeability of the nonconformity scores {R (i) } K cal i=0 . In our framework, all calibration and test trajectories share the same reference (z ref 0:T , u ref 0:T -1 ) and controller G t , and their initial states x (i) 0 are sampled i.i.d. from B ϵ (x ref 0 ). Since the decoded closed-loop lifted trajectory (8), true closed-loop trajectory (9), and the nonconformity score are deterministic functions of x (i) 0 , the resulting scores R (i) are also i.i.d., and hence exchangeable. By the CP guarantee, the event

occurs with probability at least 1δ.

Step 3: Combined Guarantee. The true state can be decomposed as x (0) t,j = x(0)

t,j ∈ X 1-δ t,j . Since this holds for all t and j, the trajectory x (0) 0:T is contained in the CKRS X 1-δ 0:T with probability at least 1δ.

Algorithm 1 details the procedure for computing the CKRS. The process begins by lifting the planner-provided reference trajectory x ref 0:T into the latent space to compute the feedforward control u ref via least-squares, and synthesizing a time-varying LQR controller to track this reference. To account for the discrepancy between the learned Koopman dynamics and the true system, the algorithm generates calibration data from system rollouts and utilizes SNSA-based conformal prediction to derive statistically-valid error bounds ēt . In parallel, the nominal KRS, X 0:T , is computed by constructing a computational graph of the closed-loop tracking dynamics in the lifted Koopman state space and computing reachable sets for the linear Koopman dynamics via auto LiRPA. Finally, the algorithm returns a conformalized Koopman reachable set (CKRS), X 1-δ 0:T , obtained by inflating the nominal KRS with the derived conformal error bounds via a Minkowski sum. Crucially, the CKRS guarantees coverage of the trajectories generated by the true system with probability 1δ.

The following section contains the the analytical dynamics we used in this paper (App. 8.3.1) and additional experiment results. In particular, we benchmark our approach with a baseline Koopman dynamics learning approach that leverages polynomial liftings (App. 8.3.2), a baseline learned reachable set prediction method (App. 8.3.3), and a baseline conformal prediction-based reachable set computation method that does not leverage . We also present additional experiment figures, including full reachable set visualizations on a per-state breakdown for the unicycle system (App. 8.3.5), planar quadcopter system (App. 8.3.6), MuJoCo hopper system (App. 8.3.8), and MuJoCo swimmer system (App. 8.3.9). For unicycle experiments (x ∈ R 3 , u ∈ R 2 ), we used the dynamics model, time-discretized via forward Euler (dt = 0.1s)

For planar quadcopter experiments (x ∈ R 6 , u ∈ R 2 ), we used the parameters m = 0.5kg, g = -9.81m/s 2 , I y = 0.01kg • m 2 for the dynamics model, and time-discretized with dt = 0.05 s:

For 3D quadcopter experiments, we used the parameters m = 1kg, g = -9.81 m/s 2 , I x,y,z = [0.5, 0.1, 0.3] kg•m 2 for the dynamics, with a time-discretization of dt = 0.025 s: Figure 5 offers a qualitative comparison of safety violations between the baseline DeepReach method (Bansal and Tomlin, 2021) and ScaRe-Kro. While the main text statistics indicate a lower safety rate for the baseline, this visualization explicitly captures the “reach-avoid” failure mode: the orange trajectories generated by the DeepReach value function clearly drift into the red obstacle region. This highlights the risks of relying on approximate neural PDE solutions for safety-critical constraints. In contrast, the reachable set (shown in blue) computed by our method creates a tight, verified corridor that contains the ground truth rollouts with high, calibrated probability, visually confirming that the controller successfully steers the dynamics clear of the hazard while adhering to the probabilistic bounds.

Figure 6 provides a state-by-state breakdown of the conservativeness gap between the reachable sets computed via the baseline CP-NNV (Hashemi et al., 2024) approach (yellow) and our proposed method (blue). While the tabular results in Section 6 quantify the volume difference, Figure 6 illustrate where that difference originates visually. Overall, one of the key differences between our approach and Hashemi et al. ( 2024) is in using a learned multi-step predictor instead of learned Koopman dynamics to compute reachable sets. We note that the CP-NNV method suffers from more conservative reachable sets, particularly in the angular velocity (p, q, r) and angle (ϕ, θ, ψ) dimensions. We attribute this to the difficulty of computing reachable sets accurately with a learned multi-step predictor over long horizons. Conversely, ScaRe-Kro maintains consistently tight bounds across all 12 dimensions, demonstrating that the Koopman linearization effectively mitigates the “wrapping effect” that causes over-approximation in standard neural network verification. Figure 7 depicts the spatial evolution of the CKRS for the unicycle. Beyond the numerical coverage rates of Sec. 6, this plot serves as a geometric sanity check, showing the reachable set’s ability to contain the reference trajectory (black dashed line) and closed-loop trajectories (green) while navigating tight constraints. The blue tube is the CKRS that represents the region where the system is guaranteed to remain with with probability at least 1δ, illustrating that even with the reachable set inflation induced by the conformalization step, the resulting set remains sufficiently compact to verify safety despite the closeness of the closed-loop rollouts to the red obstacle.

8.3.6. ADDITIONAL RESULTS ON PLANAR QUADCOPTER SYSTEM.

We present the full dimensional trajectory evolution for the 6D planar quadcopter in Figure 8. These plots isolate the performance of the Koopman tracking controller on an underactuated system. Notably, the reachable sets (light blue) and their conformalized counterparts (dark blue) accurately capture the behavior in all state dimensions, bounding the closed-loop trajectories. This confirms that the learned linear structure in the lifted Koopman space preserves the critical coupling effects of the original nonlinear dynamics, while the conformal bounds effectively absorb the residual error in the predicted KRS. We also note that the conformalization of the KRS in this example has a relatively small impact on the overall volume of the reachable set estimates, though we note that this is not always the case, e.g., the hopper system shown in Fig. 11. Moreover, this experiment showcases the offline CP inflation bounds (described in Sec. 5.4) generalize across multiple reference trajectories, we compute the CKRS for four reference trajectories total (Fig. 8a-d) and show that the resulting closed-loop trajectories remain within the computed set. This indicates that a single offline CP calibration can be reused to efficiently inflate reachable sets computed online.

8.3.7. ADDITIONAL RESULTS ON 3D QUADROTOR SYSTEM.

To verify that the offline CP inflation bounds (described in Sec. 5.4) generalize across multiple reference trajectories, we compute the CKRS for three alternative reference trajectories and show that the resulting closed-loop trajectories remain within the computed set (Fig. 9). This indicates that a single offline CP calibration can be reused to efficiently inflate reachable sets computed online. We also visualize two of these CKRSs together with the obstacle that the quadrotor is navigating around (Fig. 10), showing that since the RSOAs do not intersect with the obstacle, the closed-loop system is guaranteed to avoid collision when tracking either reference.

8.3.8. ADDITIONAL RESULTS ON MUJOCO HOPPER SYSTEM.

Figure 11 illustrates the full state-space evolution of the 11-dimensional MuJoCo Hopper (Todorov et al., 2012) system during a forward hopping maneuver. This system poses a unique challenge due to its hybrid dynamics, characterized by non-smooth transitions during ground impacts. The periodic stability of the gait is clearly visible in the “height z” and “torso angle” trajectories, where the ScaRe-Kro framework successfully captures the complex dynamics associated with the flight and stance phases. Notably, while the position states exhibit tight confinement around the reference trajectory (dashed black line), the velocity dimensions (e.g., “Velocity Foot”) display a necessary inflation of the conformal bounds. This expansion correctly accounts for the instantaneous jumps in state caused by contact forces and modeling uncertainties at impact, yet the ground truth rollouts remain strictly contained within the predicted tube, validating the robustness of the Koopman tracking controller under discontinuous contact dynamics.

8.3.9. ADDITIONAL RESULTS ON MUJOCO SWIMMER SYSTEM.

Figure 12 presents the component-wise reachability analysis for the 28-dimensional MuJoCo Swimmer Todorov et al. (2012), representing the most high-dimensional validation of our framework. The plots display the highly coupled, undulatory gait required for locomotion, evidenced by the phaseshifted oscillations across the chain of link joints and velocities. Despite the significant increase in state dimensionality, the CKRS maintain consistent coverage of the closed-loop trajectories across all 28 dimensions. The uniform tightness of the bounds across the swimmer link segments confirms that the learned globally-linear dynamics structure effectively encodes the multi-body dynamics, enabling safe, long-horizon verification for complex, articulated robots while maintaining tractable computation.