From Global to Local: Hierarchical Probabilistic Verification for Reachability Learning

Ebonye Smith 1, Sampada Deglurkar 1, Jingqi Li 2, Gechen Qu 1, and Claire J. Tomlin 1

Abstract — Hamilton–Jacobi (HJ) reachability provides formal safety guarantees for nonlinear systems. However, it becomes computationally intractable in high-dimensional settings, motivating learning-based approximations that may introduce unsafe errors or overly optimistic safe sets. In this work, we propose a hierarchical probabilistic verification framework for reachability learning that bridges offline global certification and online local refinement. We first construct a coarse safe set using scenario optimization, providing an efficient global probabilistic certificate. We then introduce an online local refinement module that expands the certified safe set near its boundary by solving a sequence of convex programs, recovering regions excluded by the global verification. This refinement reduces conservatism while focusing computation on critical regions of the state space. We provide probabilistic safety guarantees for both the global and locally refined sets. Integrated with a switching mechanism between a learned reachability policy and a model-based controller, the proposed framework improves success rates in goal-reaching tasks with safety constraints, as demonstrated in simulation experiments of two drones racing to a goal with complex safety constraints.

I. INTRODUCTION

Provably safe controller design is essential for deploying autonomous systems in safety-critical settings such as multi-agent coordination, human–robot interaction, and high-speed navigation in cluttered environments. Among those applications, a central challenge is to characterize the set of states from which a system can be driven to a desired goal while satisfying safety constraints [1].
Hamilton–Jacobi (HJ) reachability [2]–[4] computes such sets, but classical HJ methods suffer from the curse of dimensionality [5] in high-dimensional systems. To overcome this issue, prior works have proposed learning-based approaches that approximate reachable sets and synthesize corresponding control policies [6]–[9]. However, while they enable scalability, they may introduce errors that lead to overly optimistic approximations that misclassify unsafe regions as safe.

*This work was supported by NSF Safe Learning Enabled Systems, the DARPA Assured Autonomy, ANSR, and TIAMAT programs, and the NASA ULI on Safe Aviation Autonomy. E.S. was supported by NSF GRFP, and J.L. by an Oden Institute Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of any aforementioned organizations.

1 Ebonye Smith (corresponding author), Sampada Deglurkar, Gechen Qu, and Claire J. Tomlin are with the Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA 94720, USA, tel: 510-643-6610, ebonyesmith@berkeley.edu, sampada_deglurkar@berkeley.edu, qugch@berkeley.edu, tomlin@berkeley.edu

2 Jingqi Li is with the Oden Institute of Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712, USA, tel: 765-337-0678, jingqi.li@austin.utexas.edu

[Fig. 1 panels: (a) Hybrid Switching Framework (local refinement, safe set recovery, global verification, target set checking); (b) Certified Global Set and Local Refinement]

Fig. 1. (a) Building on a learned reachability value function and policy from prior work (e.g., [9]), our hierarchical verification framework provides probabilistic safety guarantees based on whether the state lies in the globally or locally verified set; otherwise, safe set recovery drives it to a verified region.
Our key novelty is leveraging scenario optimization for local safe set expansion. (b) Drone racing example in which the ego drone (blue) reaches a refined local safe set that guarantees safe overtaking of an opponent (black).

To ensure safety under learning-based approximations, verification becomes necessary. Existing methods can be categorized along three axes. First, deterministic approaches provide worst-case guarantees but can be overly conservative in practice [9]–[14], whereas probabilistic methods are typically less conservative but may require many samples to achieve tight guarantees [15]–[23]. Second, verification acting globally over the entire state space is computationally expensive and sensitive to outliers [22], while local verification focuses on regions of interest, enabling more sample-efficient and accurate certification. Third, due to its high computational cost, global verification is typically performed offline [19], whereas online refinement enables adaptive correction of overly conservative certificates during execution. Existing approaches typically operate within a single regime, limiting their ability to balance accuracy, efficiency, and scalability in high-dimensional systems.

In this work, we propose a multi-tier, hierarchical framework that couples probabilistic global certification with adaptive local refinement, enabling scalable verification with improved accuracy. We first construct an offline coarse global safe set via scenario optimization [24], yielding an efficient but conservative probabilistic certificate. To reduce conservatism and focus computation on relevant regions, we introduce an online local refinement module that performs targeted sampling near the safe set boundary. By solving a sequence of convex programs, the method expands the certified region and recovers safe states discarded by the global approximation while maintaining probabilistic guarantees.
The framework further incorporates a switching mechanism between a learned reachability policy and a model-based controller: when the state lies outside the reach-avoid set, where the learned policy may be unreliable, the model-based controller is warm-started using the learned policy to produce a safer control action. Specifically, our contributions are:

• Hierarchical Safe, Target-Reaching Framework: A multi-tier framework combining global probabilistic verification with online local refinement, integrating a learned reachability policy with a model-based controller for safe goal-reaching.
• Scenario-Based Local Expansion: An efficient method for safe set expansion via scenario optimization, recovering regions that would otherwise be discarded by conservative global verification.
• Probabilistic Safety Guarantees: Theoretical guarantees for both globally and locally verified sets, ensuring safety under local refinement.
• Empirical Validation: Simulation results in drone racing demonstrating improved success rates in safe overtaking compared to baseline methods.

II. PRELIMINARIES

In this section, we review reachability analysis and scenario optimization, which underpin our new probabilistic verification framework in Section III.

A. Reachability Learning

Consider a discrete-time nonlinear dynamical system x_{t+1} = f(x_t, u_t), where x_t ∈ R^n and u_t ∈ U ⊆ R^m represent the state and control at time t. Let r : R^n → R be a Lipschitz continuous reward function with Lipschitz constant L_r, where r(x) > 0 indicates that a state x is in the target set T. Similarly, let c : R^n → R be a Lipschitz continuous constraint function with Lipschitz constant L_c, where c(x) ≤ 0 represents states for which constraint c is violated.
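As a concrete toy instance of this setup, the following sketch defines a scalar discrete-time system with reward and constraint functions and rolls it out under a policy. The dynamics, policy, and thresholds here are illustrative assumptions, not the paper's drone model.

```python
# A minimal illustrative instance of the Section II-A setup (not the paper's
# drone model): scalar dynamics x_{t+1} = f(x_t, u_t), a reward r with
# r(x) > 0 inside the target set, and a constraint c with c(x) <= 0 unsafe.

def f(x, u):
    return x + u                     # single-integrator dynamics

def r(x):
    return 0.1 - abs(x - 1.0)        # target set: |x - 1| < 0.1

def c(x):
    return x + 0.5                   # unsafe region: x <= -0.5

def rollout(x0, policy, horizon):
    """Roll out the closed-loop system and record the visited states."""
    traj = [x0]
    for _ in range(horizon):
        traj.append(f(traj[-1], policy(traj[-1])))
    return traj

# a hypothetical policy that drives the state toward x = 1
traj = rollout(0.0, lambda x: 0.2 if x < 1.0 else 0.0, 10)
```

Under this policy the trajectory enters the target set (r > 0) while the constraint c stays positive throughout, which is exactly the property the reach-avoid set formalizes below.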
Our goal is to identify the set of initial states from which the system can be safely controlled to T by a control policy π : R^n → U within a finite time horizon T ∈ Z^+, referred to as the reach-avoid (RA) set [25]:

  R := { x_0 : ∃π such that ∃t < T, ( r(x_t) > 0 ∧ ∀τ ∈ [0, t], c(x_τ) > 0 ) }.   (1)

Denote by ξ^π_{x_0} a state trajectory from a state x_0 under control policy π. We define the time-discounted RA measure g_γ(ξ^π_{x_0}, t) as

  g_γ(ξ^π_{x_0}, t) := min{ γ^t r(x_t), min_{τ=0,...,t} γ^τ c(x_τ) },   (2)

where γ ∈ (0, 1), and g_γ(ξ^π_{x_0}, t) > 0 if and only if there exists a trajectory from x_0 reaching the target set safely within t time steps. Given a horizon T, the finite-horizon RA value function V_γ(x, T) evaluates the maximum RA measure:

  V_γ(x_0, T) := max_π sup_{t=0,...,T} g_γ(ξ^π_{x_0}, t).   (3)

Thus, for all γ ∈ (0, 1), the super-zero level set of V_γ(x, T), defined as V_γ := { x : V_γ(x, T) > 0 }, is equal to the RA set. We leverage learning-based reachability methods (e.g., [8], [9]), which use deep reinforcement learning (RL) to learn a deterministic control policy π and value function V_γ. In the rest of the paper, we denote the learned policy by π_lp.

B. Scenario Optimization Theory

Scenario optimization is a data-driven framework used to provide probabilistic guarantees for optimization problems under uncertainty ∆:

  min_{z ∈ Z} c^⊤ z  s.t.  P_∆{ h(z, ∆) > 0 } ≤ ε,   (4)

where z ∈ Z ⊆ R^d is the decision variable, h is convex, and ε ∈ (0, 1) is the maximum allowable risk. To solve (4) without assuming a specific distribution for ∆, we employ the scenario approach, replacing the chance constraint with N deterministic constraints sampled from ∆:

  z*_N = argmin_{z ∈ Z} c^⊤ z  s.t.  ⋀_{i=1}^N h(z, δ^(i)) ≤ 0,   (5)

where each δ^(i) is an i.i.d. realization of ∆.
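The RA measure (2) and its trajectory-level maximum as in (3) can be evaluated directly for a precomputed rollout. In the sketch below, the reward, constraint, and trajectory are illustrative stand-ins:

```python
# Sketch of the time-discounted RA measure (2) and its sup over t as in (3),
# evaluated on a fixed state trajectory. r, c, and the trajectory are toys.

def ra_measure(traj, r, c, gamma, t):
    """g_gamma(xi, t) = min( gamma^t * r(x_t), min_{tau<=t} gamma^tau * c(x_tau) )."""
    worst_c = min(gamma**tau * c(traj[tau]) for tau in range(t + 1))
    return min(gamma**t * r(traj[t]), worst_c)

def ra_value_along(traj, r, c, gamma):
    """Positive iff the trajectory reaches the target (r > 0) at some step
    while staying safe (c > 0) at every step up to that point."""
    return max(ra_measure(traj, r, c, gamma, t) for t in range(len(traj)))

r = lambda x: x - 1.0        # target: x > 1
c = lambda x: x + 0.5        # safe:   x > -0.5
v = ra_value_along([0.0, 0.6, 1.2], r, c, 0.9)
```

Here the trajectory only enters the target at the last step, so the sup in (3) is attained at t = 2, where both the discounted reward and the worst discounted constraint are positive.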
For a user-specified risk level ε and confidence parameter β, the required number of deterministic constraints N to achieve a probabilistic guarantee is given in Theorem 1.

Theorem 1 (Sample Complexity [16], [26]). Let ε be the risk level and β ∈ (0, 1) be the confidence parameter. Denote by P_S the probability measure associated with the number of scenarios

  N ≥ (2/ε)( ln(1/β) + d ),

taken from ∆, and by P_∆ the probability measure associated with ∆. Then z*_N satisfies the following inequality:

  P_S( P_∆( h(z*_N, δ) > 0 ) ≤ ε ) ≥ 1 − β.   (6)

Inequality (6) states that, with high confidence, the probability of an unseen realization δ ∈ ∆ causing a violation of constraint h is bounded by ε. Typically, β is set to a very small value, such as 0.001, to ensure higher confidence. This comes at a low cost in sample complexity, given the logarithmic inverse relationship between β and N. However, a smaller value of ε incurs a larger sample complexity, and thus ε is chosen with consideration of the risk versus sample complexity trade-off.

III. HYBRID REACHABILITY VERIFICATION METHOD

In this section, we describe our hierarchical verification framework. We start with a scenario optimization-based technique to estimate how much two trajectories of the dynamics can deviate from each other in order to enable the global verification (Section III-A). Then Section III-B describes our global probabilistic certificate, and Section III-C describes our local probabilistic certificate. The final hierarchical certificate structure is given in Section III-D. 1

A. Probabilistic Dynamics Bounding

For global verification of the RA set, we must quantify the global sensitivity of the dynamics under the learned policy π_lp. The deterministic verification technique in [9] leverages the Lipschitz constant of the dynamics. Here, we alleviate conservatism by instead estimating a trajectory deviation quantity ∆x*_t using samples.
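The number of sample pairs follows Theorem 1's bound N ≥ (2/ε)(ln(1/β) + d); for a one-dimensional scenario program the optimal solution is simply the largest observed deviation. A sketch, with a scalar contracting linear map standing in for the closed-loop drone dynamics (an assumption for illustration; the rounding convention for the non-integer bound may differ slightly from the paper's reported count):

```python
import math
import random

def scenario_sample_size(eps, beta, d):
    """Theorem 1: N >= (2/eps) * (ln(1/beta) + d), rounded up."""
    return math.ceil((2.0 / eps) * (math.log(1.0 / beta) + d))

def deviation_bounds(step, x_nom_sampler, eps_x, horizon, n_pairs, rng):
    """Scenario estimate of Delta x*_t: the largest deviation |x_t - xbar_t|
    observed over n_pairs nominal/perturbed trajectory pairs."""
    bounds = [0.0] * (horizon + 1)
    for _ in range(n_pairs):
        xbar = x_nom_sampler(rng)
        x = xbar + rng.uniform(-eps_x, eps_x)   # perturbed start in eps_x-ball
        for t in range(horizon + 1):
            bounds[t] = max(bounds[t], abs(x - xbar))
            xbar, x = step(xbar), step(x)
    return bounds

n = scenario_sample_size(0.1, 0.001, 1)   # the paper's experimental settings
rng = random.Random(0)
# closed-loop stand-in x_{t+1} = 0.9 x: under shared nominal controls a linear
# system's deviation contracts by the same factor each step
b = deviation_bounds(lambda x: 0.9 * x, lambda r: r.uniform(-1.0, 1.0),
                     0.1, 5, n, rng)
```

For this contracting stand-in the estimated bound shrinks geometrically with t, matching the closed-form deviation of the linear system.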
We collect N i.i.d. scenario pairs { (x̄^(i)_0, x^(i)_0) }_{i=1}^N, where each nominal state x̄^(i)_0 is sampled uniformly from X̄, and each perturbed state x^(i)_0 is sampled from an ε_x-ball B_{ε_x}(x̄^(i)_0). Thus, each pair takes values in the set X_pair. For each pair, we generate a nominal control sequence U^(i) = { u^(i)_0, ..., u^(i)_T } by applying π_lp to x̄^(i)_0, yielding the nominal trajectory ξ̄^(i). For computational simplicity, we approximate the control for x^(i)_0 using the same sequence U^(i), yielding the perturbed trajectory ξ^(i). The global sensitivity bound ∆x*_t is obtained by solving:

  min_{∆x_t ∈ R} ∆x_t  s.t.  P_{X_pair}( ||x_t − x̄_t||_2 > ∆x_t ) ≤ ε.   (7)

We leverage scenario optimization theory to obtain the following scenario program:

  ∆x*_t = min_{∆x_t ∈ R} ∆x_t  s.t.  ⋀_{i=1}^N ||x^(i)_t − x̄^(i)_t||_2 ≤ ∆x_t.   (8)

By Theorem 1, the trajectory deviation remains bounded by ∆x*_t with probability 1 − ε and confidence 1 − β:

  P_{S_1}( P_{X_pair}( ||x_t − x̄_t||_2 > ∆x*_t ) ≤ ε ) ≥ 1 − β,   (9)

where P_{S_1} denotes the probability measure associated with the number of samples taken.

B. Coarse Global Verification

We leverage the global sensitivity bound ∆x*_t derived in Section III-A to obtain a probabilistically certified lower bound of the value function, denoted V̌_γ, under the learned policy π_lp, as stated in Theorem 2.

Theorem 2 (Global Probabilistic Safety). Let x_0 ∈ B_{ε_x}(x̄_0). Suppose that for each t ∈ {0, 1, ..., T}, ∆x*_t is as calculated above. Let ř_t := r(x̄_t) − L_r ∆x*_t and č_t := c(x̄_t) − L_c ∆x*_t. Define a certificate function V̌_γ(x̄_0, T),

  V̌_γ(x̄_0, T) := max_{t=0,...,T} min{ γ^t ř_t, min_{τ=0,...,t} γ^τ č_τ }.   (10)

Then, the following probabilistic guarantee holds:

  P_{S_1}( P_{X_pair}( { V_γ(x_0) < 0 } ∩ { V̌_γ(x̄_0, T) ≥ 0 } ) ≤ ε ) ≥ 1 − β.

Proof. Let t ∈ {1, ...
, T}. Define the set of states originating from the ball B_{ε_x}(x̄_0) under the nominal controls as X_t := { x_t : x_{τ+1} = f(x_τ, π_lp(x̄_τ)), τ ≤ t − 1, x_0 ∈ B_{ε_x}(x̄_0) }. Observe that the set X̄_{t,x̄_0} := { x_t : ||x_t − x̄_t||_2 ≤ ∆x*_t } is a convex outer approximation of X_t. By Lipschitz continuity of r(x), we have that, with confidence 1 − β, for all x_t ∈ X̄_{t,x̄_0},

  P_{X_pair}( |r(x_t) − r(x̄_t)| ≤ L_r ∆x*_t ) ≥ 1 − ε,

which yields a lower bound of r(x_t), for all x_t ∈ X̄_{t,x̄_0}:

  P_{X_pair}( ř_t ≤ r(x_t) ) ≥ 1 − ε.

Similarly, define a lower bound of c(x_t), for all x_t ∈ X̄_{t,x̄_0}:

  P_{X_pair}( č_t ≤ c(x_t) ) ≥ 1 − ε.

As shown in (10), we can use ř_t and č_t to construct a probabilistic lower bound function V̌_γ(x̄_0, T) for V_γ(x_0), for all x_0 ∈ B_{ε_x}(x̄_0). In other words, V̌_γ(x̄_0, T) serves as a lower bound of V_γ(x) with probability 1 − ε over the support X_pair. Denote by X and Y the events { V_γ(x_0) < V̌_γ(x̄_0, T) } and { V̌_γ(x̄_0, T) ≥ 0 }, respectively. Thus, we have, with confidence at least 1 − β, P_{X_pair}(X) ≤ ε. By the law of total probability, we have, with confidence at least 1 − β,

  P_{X_pair}(X ∩ Y) + P_{X_pair}(X ∩ Y^c) ≤ ε,

i.e., P_{X_pair}(X ∩ Y) ≤ ε. Thus, with confidence at least 1 − β,

  P_{X_pair}( { V_γ(x_0) < 0 } ∩ { V̌_γ(x̄_0, T) ≥ 0 } ) ≤ ε.  ∎

Theorem 2 states that, with high confidence, the probability that the lower-bounded value function classifies an unsafe state as safe is upper bounded by ε. We define the verified global safe set V_global as the super-zero level set of V̌_γ:

  V_global := { x ∈ X̄ | V̌_γ(x, T) ≥ 0 }.   (11)

For each state in V_global, the failure probability for safely reaching T is bounded by ε with confidence 1 − β.

1 Code for this project can be found at: https://github.com/ebonyelsmith/Hierarchical_Probabilistic_Verification
C. Local Growth Iterative Refinement

The global verification tier is inherently conservative due to the choice of ε_x. Therefore, it may be the case that the current state is not within the coarse global set but is nonetheless safe. To "reclaim" safe regions that may be misclassified as unsafe, we present a quickly computed, high-resolution local refinement on the boundary of V_global, denoted ∂V_global, closest to the current state, a technique inspired by [19].

Let x̌_t ∈ ∂V_global and define a local candidate neighborhood B_r(x̌_t) on the boundary of V_global. We probabilistically verify the local refinement via iterative scenario optimization to obtain the maximum verifiable radius r*:

  r* = max_{r ∈ R} r  s.t.  ⋀_{j=1}^N sup_{t ∈ [0,T]} g_γ(ξ^{π_lp}_{x^(j)_0}, t) > 0, ∀ x^(j)_0 ∈ B_r(x̌_t).   (12)

Algorithm 1: IterativeGrowth
Input: x_t, π_lp, V_global, N, r_max, M, T
Output: r_safe, V_local
1: Initialize: r* ← r_max
2: Find closest boundary point x̌ ∈ ∂V_global to x_t
3: for iter = 0, 1, ..., M − 1 do
4:   S_i ← Sample N states i.i.d. from B_{r*}(x̌_t) and roll out trajectories ξ^{π_lp}_x
5:   if ∃ x ∈ S_i s.t. max_{t ∈ [0,T]} g_γ(ξ^{π_lp}_x, t) ≤ 0 then
6:     r* ← min_x ||x − x̌_t||_2 s.t. max_{t ∈ [0,T]} g_γ(ξ^{π_lp}_x, t) ≤ 0
7:   else
8:     break
9:   end if
10: end for
11: return V_local := B_{r*}(x̌_t)

As detailed in Algorithm 1, this refinement consists of specifying some initial refinement radius around the closest boundary point of the verified global set, x̌_t ∈ ∂V_global (lines 1-2). For M iterations, N initial conditions within the region are sampled and simulated under the learned policy (lines 3-4). The RA measure (2) is evaluated for each trajectory, and if a violation is found, the radius of the local refinement is reduced to exclude the closest violation (lines 5-6) and the process is repeated by resampling new initial conditions.
If no violations are found, the radius and the verified refinement V_local := B_{r*}(x̌) are returned (lines 7-11). Using Theorem 1, we obtain the following guarantee on the local growth refinement:

  P_{S_2}( P_{V_local}( sup_{t ∈ [0,T]} g_γ(ξ^{π_lp}_{x^(j)_0}, t) ≤ 0 ) ≤ ε ) ≥ 1 − β,   (13)

where P_{S_2} denotes the probability measure associated with the samples. The inequality in (13) implies that, within the local refinement, the probability of failing to safely reach the target set is bounded by ε with high confidence.

D. Hierarchical Verification and Control Framework

Our new verification framework features a hybrid switching logic that prioritizes the highest available tier of verified safety. In addition to the learned policy, we employ a Model Predictive Path Integral (MPPI) controller [27], warm-started with the learned policy, as a model-based fallback if the current state of the system is not inside one of the locally verified sets. Although the learned reachability policy may incur approximation errors and cannot guarantee safety, using it as a warm-start for MPPI still permits leveraging the learned policy in a useful way. Moreover, even optimal reach-avoid policies may not guarantee that the system remains within the target set indefinitely [28]. Thus, we also employ MPPI to reach the goal state once the current state is within the target set T.

As verification entails a trade-off between accuracy and computational cost, the multi-tier structure enables adaptive verification, performing only the computation needed to certify the reachability set. The resulting hierarchical architecture is detailed in Algorithm 2, with the switching logic illustrated in Fig. 1.
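The growth loop of Algorithm 1 can be sketched compactly, with the rollout check max_t g_γ(ξ, t) abstracted into a callable. The toy measure used below (rollouts succeed exactly when started within 0.5 of the boundary point) is an assumption for illustration, not the paper's drone rollout:

```python
import math
import random

def iterative_growth(rollout_measure, x_check, n, r_max, m_iters, rng):
    """Sketch of Algorithm 1 (IterativeGrowth) in 2D: sample N initial states
    in the current r*-ball, shrink r* to the closest RA-measure violation,
    and stop early once a sampled batch contains no violations."""
    r_star = r_max
    for _ in range(m_iters):
        samples = []
        while len(samples) < n:          # rejection-sample the r*-ball
            p = (x_check[0] + rng.uniform(-r_star, r_star),
                 x_check[1] + rng.uniform(-r_star, r_star))
            if math.dist(p, x_check) <= r_star:
                samples.append(p)
        bad = [p for p in samples if rollout_measure(p) <= 0]
        if not bad:
            break                        # all sampled rollouts verified safe
        r_star = min(math.dist(p, x_check) for p in bad)
    return r_star

rng = random.Random(0)
# toy stand-in for max_t g_gamma: rollouts succeed iff started within 0.5
r = iterative_growth(lambda p: 0.5 - math.dist(p, (0.0, 0.0)),
                     (0.0, 0.0), 100, 1.0, 20, rng)
```

Because every violating sample lies at distance at least 0.5 from the boundary point, the radius converges toward the true safe radius from above, mirroring how the refinement recovers the largest locally verifiable ball.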
Algorithm 2: Hierarchical Safe Control Framework
Input: State x_t, verified safe set V_global, policy π_lp
Output: Control action u_t
// Tier 0: Maintain Advantage Mode
1: T ← UpdateSet(x_t)  // Update target set
2: if x_t ∈ T then
3:   u*_{0:H} ← MPPI_Fast
4:   return u*_0
5: else if x_t ∈ V_global then
6:   // Tier 1: Global Verification
7:   return π_lp(x_t)
8: else if x_t ∈ V_local ← IterativeGrowth(x_t, π_lp, V_global) then
9:   // Tier 2: Local Refinement
10:   return π_lp(x_t)
11: else
12:   // Tier 3: Safe Recovery (MPPI)
13:   u*_{0:H} ← MPPI_Rec(Warmstart = π_lp)
14:   return u*_0
15: end if

1) Tier 0: Target Maintenance Mode: If x_t ∈ T, we apply a fast MPPI controller to maintain the system within the target set.
2) Tier 1: Global Verification Mode: If x_t ∈ V_global, we use π_lp to safely reach the target set.
3) Tier 2: Local Refinement Mode: If x_t ∉ V_global but x_t ∈ V_local, the agent utilizes the "reclaimed" local refinement from Algorithm 1. The agent continues to leverage π_lp, as the local neighborhood has been probabilistically certified for that specific policy.
4) Tier 3: Safety Recovery Mode: If the state x_t cannot be certified by either the global or local verification, the system switches to a recovery MPPI controller. Warm-started by π_lp, the controller minimizes a safety-oriented cost defined by the distance to the nearest verified RA set boundary, thereby steering the system toward re-entry into the verified set when feasible.

IV. THEORETICAL REACH-AVOID GUARANTEES

Given the multi-tier structure, our theoretical guarantees are conditioned on the verification mode. Accordingly, we consider the probabilistic guarantees sequentially.

A. Tier 1 & Tier 2: Probabilistic RA Set Verification

Global verification mode (Tier 1) and local refinement mode (Tier 2) make similar guarantees aligned with probabilistic certification via scenario optimization.
Proposition 1 (Sequential Probabilistic Guarantees). For the global certificate, if x_t ∈ V_global, we achieve the following probabilistic guarantee:

  P_{S_1}( P_{X̄}( { V_γ(x_t) < 0 } ∩ { V̌_γ(x̄_t, T) ≥ 0 } ) ≤ ε ) ≥ 1 − β.

If x_t ∉ V_global but x_t ∈ V_local, we leverage the local certificate as the next "expert" to consider:

  P_{S_2}( P_{V_local}( sup_{t ∈ [0,T]} g_γ(ξ^{π_lp}_{x^(j)_0}, t) ≤ 0 ) ≤ ε ) ≥ 1 − β.

Proof. This follows from Theorems 1 and 2.

Remark 1. If x_t ∈ V_global ∩ V_local, the problem evolves into choosing between "experts" (or confidences of probabilistic guarantees), which we defer to future work.

B. Tier 3: Finite-time Recovery

If x_t ∉ (V_global ∪ V_local), no safety guarantees can be made. Thus, the best course of action is safe recovery via reachability warm-started MPPI. The forward RA set of the current state x_t, V_recover, should be computed. If V_recover ∩ V_local ≠ ∅ or V_recover ∩ V_global ≠ ∅, then there exists a trajectory at time step t such that recovery to V_local or V_global is achieved.

V. EXPERIMENTS AND RESULTS

A. Drone Racing Case Study

We consider a drone racing example as defined in [9], a challenging high-dimensional testbed with dynamic interactions and safety-critical constraints, where the ego agent must safely overtake an opponent to pass through the gate first. For computational efficiency, we set β = 0.001 and ε = 0.1 for both global and local verification, requiring 158 samples each. We choose this ε value to reduce conservatism and sample complexity; smaller values can be used at the cost of increased samples and computation.

1) Joint System Dynamics: We consider a 12D non-cooperative racing scenario involving an ego drone (e) and an opponent drone (o). The state of each drone i ∈ {e, o} is defined by its 3D position and velocity: x^i_t = [p^i_{x,t}, v^i_{x,t}, p^i_{y,t}, v^i_{y,t}, p^i_{z,t}, v^i_{z,t}] ∈ R^6.
The joint system state is defined as the concatenation x_t = [x^e_t, x^o_t] ∈ X ⊆ R^12. The i-th drone is modeled by double-integrator dynamics, with control input u^i_t = [a^i_{x,t}, a^i_{y,t}, a^i_{z,t}] satisfying ||u^i_t||_∞ ≤ ε_u := 1 m/s². We assume the opponent has bounded control authority and follows an LQR controller that drives it toward the gate aggressively, at high speed.

2) Target Set: The mission objective is to pass through a narrow gate centered at [0, 0, 0] with width w and height h. However, for the purpose of reachability, the target set for the ego drone is defined as

  T = { x : p^e_y − p^o_y > 0, v^e_y − v^o_y > 0, |p^e_x| < 0.3, |p^e_z| < 0.3 }.   (14)

This target set requires the ego drone to be ahead of and faster than the opponent while staying within a corridor for gate passage (radius 0.3 m).

3) Safety Constraints: To ensure safe flight, the ego drone should avoid the area affected by the downward airflow from the opponent drone, leading to the following nonconvex constraint, which is challenging to satisfy:

  || ( p^e_{x,t} − p^o_{x,t}, p^e_{y,t} − p^o_{y,t} ) ||²_2 > 1 + max( p^o_{z,t} − p^e_{z,t}, 0 ) × 0.2,   (15)

where the collision avoidance region between the ego and opponent drones grows as their height difference increases. Moreover, to ensure the ego drone passes through the gate without colliding with its boundaries, we impose the following constraints as in [9]:

  ± p^e_{x,t} − p^e_{y,t} > −0.05,  ± p^e_{z,t} − p^e_{y,t} > −0.05.   (16)

B. Hypothesis 1: The reachability-based guidance of our method leads to a higher success rate than soft- and hard-constrained model-based controllers or RL policies.

We empirically compare our framework to various safe baseline methods: MPPI with a control barrier function (CBF) safety filter, MPPI with safe soft constraints, and proximal policy optimization (PPO) with a CBF safety filter.
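The downwash constraint (15) is straightforward to evaluate pointwise; a sketch using only the two drones' positions (the velocity components of the 6D states do not enter (15)):

```python
def downwash_safe(p_ego, p_opp):
    """Constraint (15): the squared horizontal separation must exceed
    1 + 0.2 * max(p_opp_z - p_ego_z, 0); the keep-out region grows
    when the opponent is above the ego drone."""
    dx = p_ego[0] - p_opp[0]
    dy = p_ego[1] - p_opp[1]
    return dx * dx + dy * dy > 1.0 + 0.2 * max(p_opp[2] - p_ego[2], 0.0)

# Same altitude, 1.1 m apart horizontally: 1.21 > 1, satisfied.
# Opponent 1 m above at the same separation: 1.21 > 1.2, barely satisfied,
# but at 1.05 m separation (1.1025) the grown keep-out region is violated.
```

This is the source of the nonconvexity noted above: the feasible set is the exterior of a cylinder whose radius depends on the relative altitude.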
We sample 500 initial conditions in a bounded region of the state space and roll out the trajectories. Success is quantified by the ego safely reaching the gate before the opponent, and our framework achieves a higher rate of success than its safe counterparts, as shown in Fig. 3(a). Qualitatively, baseline methods tend to prioritize goal-reaching at the expense of safety, whereas our method demonstrates more robust collision avoidance during overtaking (Fig. 2).

C. Hypothesis 2: The hierarchical control framework features a better combination of MPPI and learning.

Similar to Section V-B, we conduct an empirical study comparing our framework to baseline methods: a Hybrid Ablation that conducts switching but replaces the reachability policy with MPPI, pure MPPI with no safety consideration, MPPI warm-started with the reachability policy, and the reachability policy only. We select these baselines to observe the interaction between MPPI, the reachability policy, and the switching mechanism. We sample 500 initial conditions in a bounded region of the state space and roll out the trajectories. The results in Fig. 3(b) highlight the necessity of both the switching mechanism and the reachability policy, as none of the other baselines achieved the same rate of success as our hierarchical framework. Notably, the reachability policy prioritizes reaching the target set but does not guarantee maintaining the system within it, a limitation inherent to reachability theory, as also observed in [28]. This explains its lower success rate when used alone.

VI. CONCLUSIONS

In this work, we propose an efficient hierarchical probabilistic verification framework for reachability learning that unifies global, local, offline, and online verification. By combining a coarse global certificate via scenario optimization with local convex refinement, the approach reduces conservatism while focusing computation on critical regions.
[Fig. 2 panels: x–y trajectories at t = 0.0 s, 0.7 s, 1.1 s, 1.4 s, 1.8 s; legend: Ego (Hybrid), Ego (MPPI (Unsafe)), Ego (MPPI + CBF), Ego (MPPI + Safe Cost), Ego (PPO + CBF), Opponent, Gate, Refinement, Safe Set]

Fig. 2. Evolution of a drone racing simulation against an opponent (black), comparing our method (Hybrid, dark blue) with various baselines. Baselines struggle to balance safety and goal-reaching, whereas our method achieves both, safely overtaking and reaching the gate. Moreover, we observe that the ego agent leverages safe recovery MPPI to try to enter the locally refined set.

[Fig. 3 success rates: (a) Safety-critical baselines: Hybrid (ours) 90.2%, MPPI + CBF 62.4%, MPPI + Safe 63.2%, PPO + CBF 14.2%. (b) Learning & ablation study: Hybrid (ours) 90.2%, Hybrid Ablation (no reach.) 68.4%, MPPI (no safety) 63.4%, MPPI + Reach. Warmstart 62.8%, Reachability Policy Only 51.8%]

Fig. 3. (a) Performance of our framework compared to control barrier function (CBF) and soft-constraint baselines. (b) Ablation study comparing the hybrid method to its learning-only and model-based variants. In both (a) and (b), the results for our method are shown for reference. Our framework achieves higher success probability over 500 sampled initial conditions, effectively balancing collision avoidance and goal-reaching.

A switching mechanism between learned and model-based controllers ensures safe and efficient control. The framework provides probabilistic safety guarantees and improves performance in safety-critical tasks, as demonstrated in drone racing simulations. Future work includes hardware validation and extension to more complex multi-agent settings and uncertain real-world environments.

REFERENCES

[1] K. P. Wabersich, A. J. Taylor, J. J. Choi, K. Sreenath, C. J. Tomlin, A. D. Ames, and M. N.
Zeilinger, "Data-driven safety filters: Hamilton-Jacobi reachability, control barrier functions, and predictive methods for uncertain systems," IEEE Control Systems Magazine, vol. 43, no. 5, pp. 137–177, 2023.
[2] C. Tomlin, G. J. Pappas, and S. Sastry, "Conflict resolution for air traffic management: A study in multiagent hybrid systems," IEEE Transactions on Automatic Control, vol. 43, no. 4, pp. 509–521, 1998.
[3] J. Lygeros, "On reachability and minimum cost optimal control," Automatica, vol. 40, no. 6, pp. 917–927, 2004.
[4] I. M. Mitchell, A. M. Bayen, and C. J. Tomlin, "A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games," IEEE Transactions on Automatic Control, vol. 50, no. 7, pp. 947–957, 2005.
[5] S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin, "Hamilton-Jacobi reachability: A brief overview and recent advances," in 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 2242–2253, IEEE, 2017.
[6] S. Bansal and C. J. Tomlin, "DeepReach: A deep learning approach to high-dimensional reachability," in 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 1817–1824, IEEE, 2021.
[7] J. F. Fisac, N. F. Lugovoy, V. Rubies-Royo, S. Ghosh, and C. J. Tomlin, "Bridging Hamilton-Jacobi safety analysis and reinforcement learning," in 2019 International Conference on Robotics and Automation (ICRA), pp. 8550–8556, IEEE, 2019.
[8] K.-C. Hsu, V. Rubies-Royo, C. J. Tomlin, and J. F. Fisac, "Safety and liveness guarantees through reach-avoid reinforcement learning," arXiv preprint arXiv:2112.12288, 2021.
[9] J. Li, D. Lee, J. Lee, K. S. Dong, S. Sojoudi, and C. Tomlin, "Certifiable reachability learning using a new Lipschitz continuous value function," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3582–3589, 2025.
[10] M. Althoff and B. H.
Krogh, "Zonotope bundles for the efficient computation of reachable sets," in 2011 50th IEEE Conference on Decision and Control and European Control Conference, pp. 6814–6821, IEEE, 2011.
[11] S. Coogan, "Mixed monotonicity for reachability and safety in dynamical systems," in 2020 59th IEEE Conference on Decision and Control (CDC), pp. 5074–5085, IEEE, 2020.
[12] H. Hu, M. Fazlyab, M. Morari, and G. J. Pappas, "Reach-SDP: Reachability analysis of closed-loop systems with neural network controllers via semidefinite programming," in 2020 59th IEEE Conference on Decision and Control (CDC), pp. 5929–5934, IEEE, 2020.
[13] T. Lew and M. Pavone, "Sampling-based reachability analysis: A random set theory approach with adversarial sampling," in Conference on Robot Learning, pp. 2055–2070, PMLR, 2021.
[14] M. Abate and S. Coogan, "Robustly forward invariant sets for mixed-monotone systems," IEEE Transactions on Automatic Control, vol. 67, no. 9, pp. 4947–4954, 2022.
[15] L. Liebenwein, C. Baykal, I. Gilitschenski, S. Karaman, and D. Rus, "Sampling-based approximation algorithms for reachability analysis with provable guarantees," in Robotics: Science and Systems, 2018.
[16] A. Devonport and M. Arcak, "Estimating reachable sets with scenario optimization," in Learning for Dynamics and Control, pp. 75–84, PMLR, 2020.
[17] M. E. Cao, M. Bloch, and S. Coogan, "Estimating high probability reachable sets using Gaussian processes," in 2021 60th IEEE Conference on Decision and Control (CDC), pp. 3881–3886, IEEE, 2021.
[18] A. Devonport, F. Yang, L. El Ghaoui, and M. Arcak, "Data-driven reachability analysis with Christoffel functions," in 60th IEEE Conference on Decision and Control (CDC), pp. 5067–5072, IEEE, 2021.
[19] A. Lin and S. Bansal, "Generating formal safety assurances for high-dimensional reachability," in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 10525–10531, IEEE, 2023.
[20] L.
Lindemann, M. Cleaveland, G. Shim, and G. J. Pappas, "Safe planning in dynamic environments using conformal prediction," IEEE Robotics and Automation Letters, vol. 8, no. 8, pp. 5116–5123, 2023.
[21] A. Lin and S. Bansal, "Verification of neural reachable tubes via scenario optimization and conformal prediction," in 6th Annual Learning for Dynamics & Control Conference, pp. 719–731, PMLR, 2024.
[22] A. Lin, A. Pinto, and S. Bansal, "Robust verification of controllers under state uncertainty via Hamilton-Jacobi reachability analysis," arXiv preprint arXiv:2511.14755, 2025.
[23] E. Dietrich, R. Devonport, S. Tu, and M. Arcak, "Data-driven reachability with scenario optimization and the holdout method," in IEEE 64th Conference on Decision and Control (CDC), pp. 3925–3931, IEEE, 2025.
[24] M. C. Campi and S. Garatti, Introduction to the Scenario Approach. SIAM, 2018.
[25] K. Margellos and J. Lygeros, "Hamilton-Jacobi formulation for reach-avoid differential games," IEEE Transactions on Automatic Control, vol. 56, no. 8, pp. 1849–1861, 2011.
[26] M. C. Campi, S. Garatti, and M. Prandini, "The scenario approach for systems and control design," Annual Reviews in Control, vol. 33, no. 2, pp. 149–157, 2009.
[27] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, "Aggressive driving with model predictive path integral control," in IEEE International Conference on Robotics and Automation (ICRA), pp. 1433–1440, IEEE, 2016.
[28] G. Chenevert, J. Li, S. Bae, D. Lee, et al., "Solving reach-avoid-stay problems using deep deterministic policy gradients," accepted by IEEE International Conference on Robotics and Automation (ICRA), 2026.