From Global to Local: Hierarchical Probabilistic Verification for Reachability Learning

Ebonye Smith 1, Sampada Deglurkar 1, Jingqi Li 2, Gechen Qu 1, and Claire J. Tomlin 1

Abstract — Hamilton–Jacobi (HJ) reachability provides formal safety guarantees for nonlinear systems. However, it becomes computationally intractable in high-dimensional settings, motivating learning-based approximations that may introduce unsafe errors or overly optimistic safe sets. In this work, we propose a hierarchical probabilistic verification framework for reachability learning that bridges offline global certification and online local refinement. We first construct a coarse safe set using scenario optimization, providing an efficient global probabilistic certificate. We then introduce an online local refinement module that expands the certified safe set near its boundary by solving a sequence of convex programs, recovering regions excluded by the global verification. This refinement reduces conservatism while focusing computation on critical regions of the state space. We provide probabilistic safety guarantees for both the global and locally refined sets. Integrated with a switching mechanism between a learned reachability policy and a model-based controller, the proposed framework improves success rates in goal-reaching tasks with safety constraints, as demonstrated in simulation experiments of two drones racing to a goal with complex safety constraints.

I. INTRODUCTION

Provably safe controller design is essential for deploying autonomous systems in safety-critical settings such as multi-agent coordination, human–robot interaction, and high-speed navigation in cluttered environments. Among those applications, a central challenge is to characterize the set of states from which a system can be driven to a desired goal while satisfying safety constraints [1].
Hamilton–Jacobi (HJ) reachability [2]–[4] computes such sets, but classical HJ methods suffer from the curse of dimensionality [5] in high-dimensional systems. To overcome this issue, prior works have proposed learning-based approaches that approximate reachable sets and synthesize corresponding control policies [6]–[9]. However, while they enable scalability, they may introduce errors that lead to overly optimistic approximations that misclassify unsafe regions as safe.

*This work was supported by NSF Safe Learning Enabled Systems, the DARPA Assured Autonomy, ANSR, and TIAMAT programs, and the NASA ULI on Safe Aviation Autonomy. E.S. was supported by NSF GRFP, and J.L. by an Oden Institute Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of any aforementioned organizations.

1 Ebonye Smith (corresponding author), Sampada Deglurkar, Gechen Qu, and Claire J. Tomlin are with the Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA 94720, USA, tel: 510-643-6610, ebonyesmith@berkeley.edu, sampada_deglurkar@berkeley.edu, qugch@berkeley.edu, tomlin@berkeley.edu

2 Jingqi Li is with the Oden Institute of Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712, USA, tel: 765-337-0678, jingqi.li@austin.utexas.edu

[Fig. 1 panels: (a) Hybrid Switching Framework (local refinement, safe set recovery, global verification, target set checking); (b) Certified Global Set and Local Refinement]

Fig. 1. (a) Building on a learned reachability value function and policy from prior work (e.g., [9]), our hierarchical verification framework provides probabilistic safety guarantees based on whether the state lies in the globally or locally verified set; otherwise, safe set recovery drives it to a verified region.
Our key novelty is leveraging scenario optimization for local safe set expansion. (b) Drone racing example in which the ego drone (blue) reaches a refined local safe set that guarantees safe overtaking of an opponent (black).

To ensure safety under learning-based approximations, verification becomes necessary. Existing methods can be categorized along three axes. First, deterministic approaches provide worst-case guarantees but can be overly conservative in practice [9]–[14], whereas probabilistic methods are typically less conservative but may require many samples to achieve tight guarantees [15]–[23]. Second, verification acting globally over the entire state space is computationally expensive and sensitive to outliers [22], while local verification focuses on regions of interest, enabling more sample-efficient and accurate certification. Third, due to its high computational cost, global verification is typically performed offline [19], whereas online refinement enables adaptive correction of overly conservative certificates during execution. Existing approaches typically operate within a single regime, limiting their ability to balance accuracy, efficiency, and scalability in high-dimensional systems.

In this work, we propose a multi-tier, hierarchical framework that couples probabilistic global certification with adaptive local refinement, enabling scalable verification with improved accuracy. We first construct an offline coarse global safe set via scenario optimization [24], yielding an efficient but conservative probabilistic certificate. To reduce conservatism and focus computation on relevant regions, we introduce an online local refinement module that performs targeted sampling near the safe set boundary. By solving a sequence of convex programs, the method expands the certified region and recovers safe states discarded by the global approximation while maintaining probabilistic guarantees.
The framework further incorporates a switching mechanism between a learned reachability policy and a model-based controller: when the state lies outside the reach-avoid set, where the learned policy may be unreliable, the model-based controller is warm-started using the learned policy to produce a safer control action. Specifically, our contributions are:

• Hierarchical Safe, Target-Reaching Framework: A multi-tier framework combining global probabilistic verification with online local refinement, integrating a learned reachability policy with a model-based controller for safe goal-reaching.
• Scenario-Based Local Expansion: An efficient method for safe set expansion via scenario optimization, recovering regions that would otherwise be discarded by conservative global verification.
• Probabilistic Safety Guarantees: Theoretical guarantees for both globally and locally verified sets, ensuring safety under local refinement.
• Empirical Validation: Simulation results in drone racing demonstrating improved success rates in safe overtaking compared to baseline methods.

II. PRELIMINARIES

In this section, we review reachability analysis and scenario optimization, which underpin our new probabilistic verification framework in Section III.

A. Reachability Learning

Consider a discrete-time nonlinear dynamical system x_{t+1} = f(x_t, u_t), where x_t ∈ R^n and u_t ∈ U ⊆ R^m represent the state and control at time t. Let r : R^n → R be a Lipschitz continuous reward function with Lipschitz constant L_r, where r(x) > 0 indicates that a state x is in the target set T. Similarly, let c : R^n → R be a Lipschitz continuous constraint function with Lipschitz constant L_c, where c(x) ≤ 0 represents states for which constraint c is violated.
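As a concrete toy instance of this setup, the following sketch defines a scalar discrete-time system with reward and constraint functions and rolls it out under a policy. The dynamics, policy, and thresholds here are illustrative assumptions, not the paper's drone model.

```python
# A minimal illustrative instance of the Section II-A setup (not the paper's
# drone model): scalar dynamics x_{t+1} = f(x_t, u_t), a reward r with
# r(x) > 0 inside the target set, and a constraint c with c(x) <= 0 unsafe.

def f(x, u):
    return x + u                     # single-integrator dynamics

def r(x):
    return 0.1 - abs(x - 1.0)        # target set: |x - 1| < 0.1

def c(x):
    return x + 0.5                   # unsafe region: x <= -0.5

def rollout(x0, policy, horizon):
    """Roll out the closed-loop system and record the visited states."""
    traj = [x0]
    for _ in range(horizon):
        traj.append(f(traj[-1], policy(traj[-1])))
    return traj

# a hypothetical policy that drives the state toward x = 1
traj = rollout(0.0, lambda x: 0.2 if x < 1.0 else 0.0, 10)
```

Under this policy the trajectory enters the target set (r > 0) while the constraint c stays positive throughout, which is exactly the property the reach-avoid set formalizes below.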
Our goal is to identify the set of initial states from which the system can be safely controlled to T by a control policy π : R^n → U within a finite time horizon T ∈ Z^+, referred to as the reach-avoid (RA) set [25]:

  R := { x_0 : ∃π such that ∃t < T, ( r(x_t) > 0 ∧ ∀τ ∈ [0, t], c(x_τ) > 0 ) }.   (1)

Denote by ξ^π_{x_0} a state trajectory from a state x_0 under control policy π. We define the time-discounted RA measure g_γ(ξ^π_{x_0}, t) as

  g_γ(ξ^π_{x_0}, t) := min{ γ^t r(x_t), min_{τ=0,...,t} γ^τ c(x_τ) },   (2)

where γ ∈ (0, 1), and g_γ(ξ^π_{x_0}, t) > 0 if and only if there exists a trajectory from x_0 reaching the target set safely within t time steps. Given a horizon T, the finite-horizon RA value function V_γ(x, T) evaluates the maximum RA measure:

  V_γ(x_0, T) := max_π sup_{t=0,...,T} g_γ(ξ^π_{x_0}, t).   (3)

Thus, for all γ ∈ (0, 1), the super-zero level set of V_γ(x, T), defined as V_γ := { x : V_γ(x, T) > 0 }, is equal to the RA set. We leverage learning-based reachability methods (e.g., [8], [9]), which use deep reinforcement learning (RL) to learn a deterministic control policy π and value function V_γ. In the rest of the paper, we denote the learned policy by π_lp.

B. Scenario Optimization Theory

Scenario optimization is a data-driven framework used to provide probabilistic guarantees for optimization problems under uncertainty ∆:

  min_{z ∈ Z} c^⊤ z  s.t.  P_∆{ h(z, ∆) > 0 } ≤ ε,   (4)

where z ∈ Z ⊆ R^d is the decision variable, h is convex, and ε ∈ (0, 1) is the maximum allowable risk. To solve (4) without assuming a specific distribution for ∆, we employ the scenario approach, replacing the chance constraint with N deterministic constraints sampled from ∆:

  z*_N = argmin_{z ∈ Z} c^⊤ z  s.t.  ⋀_{i=1}^N h(z, δ^(i)) ≤ 0,   (5)

where each δ^(i) is an i.i.d. realization of ∆.
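The RA measure (2) and its trajectory-level maximum as in (3) can be evaluated directly for a precomputed rollout. In the sketch below, the reward, constraint, and trajectory are illustrative stand-ins:

```python
# Sketch of the time-discounted RA measure (2) and its sup over t as in (3),
# evaluated on a fixed state trajectory. r, c, and the trajectory are toys.

def ra_measure(traj, r, c, gamma, t):
    """g_gamma(xi, t) = min( gamma^t * r(x_t), min_{tau<=t} gamma^tau * c(x_tau) )."""
    worst_c = min(gamma**tau * c(traj[tau]) for tau in range(t + 1))
    return min(gamma**t * r(traj[t]), worst_c)

def ra_value_along(traj, r, c, gamma):
    """Positive iff the trajectory reaches the target (r > 0) at some step
    while staying safe (c > 0) at every step up to that point."""
    return max(ra_measure(traj, r, c, gamma, t) for t in range(len(traj)))

r = lambda x: x - 1.0        # target: x > 1
c = lambda x: x + 0.5        # safe:   x > -0.5
v = ra_value_along([0.0, 0.6, 1.2], r, c, 0.9)
```

Here the trajectory only enters the target at the last step, so the sup in (3) is attained at t = 2, where both the discounted reward and the worst discounted constraint are positive.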
For a user-specified risk level ε and confidence parameter β, the required number of deterministic constraints N to achieve a probabilistic guarantee is given in Theorem 1.

Theorem 1 (Sample Complexity [16], [26]). Let ε be the risk level and β ∈ (0, 1) be the confidence parameter. Denote by P_S the probability measure associated with the number of scenarios

  N ≥ (2/ε)( ln(1/β) + d ),

taken from ∆, and by P_∆ the probability measure associated with ∆. Then z*_N satisfies the following inequality:

  P_S( P_∆( h(z*_N, δ) > 0 ) ≤ ε ) ≥ 1 − β.   (6)

Inequality (6) states that, with high confidence, the probability of an unseen realization δ ∈ ∆ causing a violation of constraint h is bounded by ε. Typically, β is set to a very small value, such as 0.001, to ensure higher confidence. This comes at a low cost in sample complexity, given the logarithmic inverse relationship between β and N. However, a smaller value of ε incurs a larger sample complexity, and thus ε is chosen with consideration of the risk versus sample complexity trade-off.

III. HYBRID REACHABILITY VERIFICATION METHOD

In this section, we describe our hierarchical verification framework. We start with a scenario optimization-based technique to estimate how much two trajectories of the dynamics can deviate from each other in order to enable the global verification (Section III-A). Then Section III-B describes our global probabilistic certificate, and Section III-C describes our local probabilistic certificate. The final hierarchical certificate structure is given in Section III-D. 1

A. Probabilistic Dynamics Bounding

For global verification of the RA set, we must quantify the global sensitivity of the dynamics under the learned policy π_lp. The deterministic verification technique in [9] leverages the Lipschitz constant of the dynamics. Here, we alleviate conservatism by instead estimating a trajectory deviation quantity ∆x*_t using samples.
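The number of sample pairs follows Theorem 1's bound N ≥ (2/ε)(ln(1/β) + d); for a one-dimensional scenario program the optimal solution is simply the largest observed deviation. A sketch, with a scalar contracting linear map standing in for the closed-loop drone dynamics (an assumption for illustration; the rounding convention for the non-integer bound may differ slightly from the paper's reported count):

```python
import math
import random

def scenario_sample_size(eps, beta, d):
    """Theorem 1: N >= (2/eps) * (ln(1/beta) + d), rounded up."""
    return math.ceil((2.0 / eps) * (math.log(1.0 / beta) + d))

def deviation_bounds(step, x_nom_sampler, eps_x, horizon, n_pairs, rng):
    """Scenario estimate of Delta x*_t: the largest deviation |x_t - xbar_t|
    observed over n_pairs nominal/perturbed trajectory pairs."""
    bounds = [0.0] * (horizon + 1)
    for _ in range(n_pairs):
        xbar = x_nom_sampler(rng)
        x = xbar + rng.uniform(-eps_x, eps_x)   # perturbed start in eps_x-ball
        for t in range(horizon + 1):
            bounds[t] = max(bounds[t], abs(x - xbar))
            xbar, x = step(xbar), step(x)
    return bounds

n = scenario_sample_size(0.1, 0.001, 1)   # the paper's experimental settings
rng = random.Random(0)
# closed-loop stand-in x_{t+1} = 0.9 x: under shared nominal controls a linear
# system's deviation contracts by the same factor each step
b = deviation_bounds(lambda x: 0.9 * x, lambda r: r.uniform(-1.0, 1.0),
                     0.1, 5, n, rng)
```

For this contracting stand-in the estimated bound shrinks geometrically with t, matching the closed-form deviation of the linear system.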
We collect N i.i.d. scenario pairs { (x̄^(i)_0, x^(i)_0) }_{i=1}^N, where each nominal state x̄^(i)_0 is sampled uniformly from X̄, and each perturbed state x^(i)_0 is sampled from an ε_x-ball B_{ε_x}(x̄^(i)_0). Thus, each pair takes values in the set X_pair. For each pair, we generate a nominal control sequence U^(i) = { u^(i)_0, ..., u^(i)_T } by applying π_lp to x̄^(i)_0, yielding the nominal trajectory ξ̄^(i). For computational simplicity, we approximate the control for x^(i)_0 using the same sequence U^(i), yielding the perturbed trajectory ξ^(i). The global sensitivity bound ∆x*_t is obtained by solving:

  min_{∆x_t ∈ R} ∆x_t  s.t.  P_{X_pair}( ||x_t − x̄_t||_2 > ∆x_t ) ≤ ε.   (7)

We leverage scenario optimization theory to obtain the following scenario program:

  ∆x*_t = min_{∆x_t ∈ R} ∆x_t  s.t.  ⋀_{i=1}^N ||x^(i)_t − x̄^(i)_t||_2 ≤ ∆x_t.   (8)

By Theorem 1, the trajectory deviation remains bounded by ∆x*_t with probability 1 − ε and confidence 1 − β:

  P_{S_1}( P_{X_pair}( ||x_t − x̄_t||_2 > ∆x*_t ) ≤ ε ) ≥ 1 − β,   (9)

where P_{S_1} denotes the probability measure associated with the number of samples taken.

B. Coarse Global Verification

We leverage the global sensitivity bound ∆x*_t derived in Section III-A to obtain a probabilistically certified lower bound of the value function, denoted V̌_γ, under the learned policy π_lp, as stated in Theorem 2.

Theorem 2 (Global Probabilistic Safety). Let x_0 ∈ B_{ε_x}(x̄_0). Suppose that for each t ∈ {0, 1, ..., T}, ∆x*_t is as calculated above. Let ř_t := r(x̄_t) − L_r ∆x*_t and č_t := c(x̄_t) − L_c ∆x*_t. Define a certificate function V̌_γ(x̄_0, T),

  V̌_γ(x̄_0, T) := max_{t=0,...,T} min{ γ^t ř_t, min_{τ=0,...,t} γ^τ č_τ }.   (10)

Then, the following probabilistic guarantee holds:

  P_{S_1}( P_{X_pair}( { V_γ(x_0) < 0 } ∩ { V̌_γ(x̄_0, T) ≥ 0 } ) ≤ ε ) ≥ 1 − β.

Proof. Let t ∈ {1, ...
, T}. Define the set of states originating from the ball B_{ε_x}(x̄_0) under the nominal controls as X_t := { x_t : x_{τ+1} = f(x_τ, π_lp(x̄_τ)), τ ≤ t − 1, x_0 ∈ B_{ε_x}(x̄_0) }. Observe that the set X̄_{t,x̄_0} := { x_t : ||x_t − x̄_t||_2 ≤ ∆x*_t } is a convex outer approximation of X_t. By Lipschitz continuity of r(x), we have that, with confidence 1 − β, for all x_t ∈ X̄_{t,x̄_0},

  P_{X_pair}( |r(x_t) − r(x̄_t)| ≤ L_r ∆x*_t ) ≥ 1 − ε,

which yields a lower bound of r(x_t), for all x_t ∈ X̄_{t,x̄_0}:

  P_{X_pair}( ř_t ≤ r(x_t) ) ≥ 1 − ε.

Similarly, define a lower bound of c(x_t), for all x_t ∈ X̄_{t,x̄_0}:

  P_{X_pair}( č_t ≤ c(x_t) ) ≥ 1 − ε.

As shown in (10), we can use ř_t and č_t to construct a probabilistic lower bound function V̌_γ(x̄_0, T) for V_γ(x_0), for all x_0 ∈ B_{ε_x}(x̄_0). In other words, V̌_γ(x̄_0, T) serves as a lower bound of V_γ(x) with probability 1 − ε over the support X_pair. Denote by X and Y the events { V_γ(x_0) < V̌_γ(x̄_0, T) } and { V̌_γ(x̄_0, T) ≥ 0 }, respectively. Thus, we have, with confidence at least 1 − β, P_{X_pair}(X) ≤ ε. By the law of total probability, we have, with confidence at least 1 − β,

  P_{X_pair}(X ∩ Y) + P_{X_pair}(X ∩ Y^c) ≤ ε,

i.e., P_{X_pair}(X ∩ Y) ≤ ε. Thus, with confidence at least 1 − β,

  P_{X_pair}( { V_γ(x_0) < 0 } ∩ { V̌_γ(x̄_0, T) ≥ 0 } ) ≤ ε.  ∎

Theorem 2 states that, with high confidence, the probability that the lower-bounded value function classifies an unsafe state as safe is upper bounded by ε. We define the verified global safe set V_global as the super-zero level set of V̌_γ:

  V_global := { x ∈ X̄ | V̌_γ(x, T) ≥ 0 }.   (11)

For each state in V_global, the failure probability for safely reaching T is bounded by ε with confidence 1 − β.

1 Code for this project can be found at: https://github.com/ebonyelsmith/Hierarchical_Probabilistic_Verification
C. Local Growth Iterative Refinement

The global verification tier is inherently conservative due to the choice of ε_x. Therefore, it may be the case that the current state is not within the coarse global set but is nonetheless safe. To "reclaim" safe regions that may be misclassified as unsafe, we present a quickly computed, high-resolution local refinement on the boundary of V_global, denoted ∂V_global, closest to the current state, a technique inspired by [19].

Let x̌_t ∈ ∂V_global and define a local candidate neighborhood B_r(x̌_t) on the boundary of V_global. We probabilistically verify the local refinement via iterative scenario optimization to obtain the maximum verifiable radius r*:

  r* = max_{r ∈ R} r  s.t.  ⋀_{j=1}^N sup_{t ∈ [0,T]} g_γ(ξ^{π_lp}_{x^(j)_0}, t) > 0, ∀ x^(j)_0 ∈ B_r(x̌_t).   (12)

Algorithm 1: IterativeGrowth
Input: x_t, π_lp, V_global, N, r_max, M, T
Output: r_safe, V_local
1: Initialize: r* ← r_max
2: Find closest boundary point x̌ ∈ ∂V_global to x_t
3: for iter = 0, 1, ..., M − 1 do
4:   S_i ← Sample N states i.i.d. from B_{r*}(x̌_t) and roll out trajectories ξ^{π_lp}_x
5:   if ∃ x ∈ S_i s.t. max_{t ∈ [0,T]} g_γ(ξ^{π_lp}_x, t) ≤ 0 then
6:     r* ← min_x ||x − x̌_t||_2 s.t. max_{t ∈ [0,T]} g_γ(ξ^{π_lp}_x, t) ≤ 0
7:   else
8:     break
9:   end if
10: end for
11: return V_local := B_{r*}(x̌_t)

As detailed in Algorithm 1, this refinement consists of specifying some initial refinement radius around the closest boundary point of the verified global set, x̌_t ∈ ∂V_global (lines 1-2). For M iterations, N initial conditions within the region are sampled and simulated under the learned policy (lines 3-4). The RA measure (2) is evaluated for each trajectory, and if a violation is found, the radius of the local refinement is reduced to exclude the closest violation (lines 5-6) and the process is repeated by resampling new initial conditions.
If no violations are found, the radius and the verified refinement V_local := B_{r*}(x̌) are returned (lines 7-11). Using Theorem 1, we obtain the following guarantee on the local growth refinement:

  P_{S_2}( P_{V_local}( sup_{t ∈ [0,T]} g_γ(ξ^{π_lp}_{x^(j)_0}, t) ≤ 0 ) ≤ ε ) ≥ 1 − β,   (13)

where P_{S_2} denotes the probability measure associated with the samples. The inequality in (13) implies that, within the local refinement, the probability of failing to safely reach the target set is bounded by ε with high confidence.

D. Hierarchical Verification and Control Framework

Our new verification framework features a hybrid switching logic that prioritizes the highest available tier of verified safety. In addition to the learned policy, we employ a Model Predictive Path Integral (MPPI) controller [27], warm-started with the learned policy, as a model-based fallback if the current state of the system is not inside one of the locally verified sets. Although the learned reachability policy may incur approximation errors and cannot guarantee safety, using it as a warm-start for MPPI still permits leveraging the learned policy in a useful way. Moreover, even optimal reach-avoid policies may not guarantee that the system remains within the target set indefinitely [28]. Thus, we also employ MPPI to reach the goal state once the current state is within the target set T.

As verification entails a trade-off between accuracy and computational cost, the multi-tier structure enables adaptive verification, performing only the computation needed to certify the reachability set. The resulting hierarchical architecture is detailed in Algorithm 2, with the switching logic illustrated in Fig. 1.
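The growth loop of Algorithm 1 can be sketched compactly, with the rollout check max_t g_γ(ξ, t) abstracted into a callable. The toy measure used below (rollouts succeed exactly when started within 0.5 of the boundary point) is an assumption for illustration, not the paper's drone rollout:

```python
import math
import random

def iterative_growth(rollout_measure, x_check, n, r_max, m_iters, rng):
    """Sketch of Algorithm 1 (IterativeGrowth) in 2D: sample N initial states
    in the current r*-ball, shrink r* to the closest RA-measure violation,
    and stop early once a sampled batch contains no violations."""
    r_star = r_max
    for _ in range(m_iters):
        samples = []
        while len(samples) < n:          # rejection-sample the r*-ball
            p = (x_check[0] + rng.uniform(-r_star, r_star),
                 x_check[1] + rng.uniform(-r_star, r_star))
            if math.dist(p, x_check) <= r_star:
                samples.append(p)
        bad = [p for p in samples if rollout_measure(p) <= 0]
        if not bad:
            break                        # all sampled rollouts verified safe
        r_star = min(math.dist(p, x_check) for p in bad)
    return r_star

rng = random.Random(0)
# toy stand-in for max_t g_gamma: rollouts succeed iff started within 0.5
r = iterative_growth(lambda p: 0.5 - math.dist(p, (0.0, 0.0)),
                     (0.0, 0.0), 100, 1.0, 20, rng)
```

Because every violating sample lies at distance at least 0.5 from the boundary point, the radius converges toward the true safe radius from above, mirroring how the refinement recovers the largest locally verifiable ball.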
Algorithm 2: Hierarchical Safe Control Framework
Input: State x_t, verified safe set V_global, policy π_lp
Output: Control action u_t
// Tier 0: Maintain Advantage Mode
1: T ← UpdateSet(x_t)  // Update target set
2: if x_t ∈ T then
3:   u*_{0:H} ← MPPI_Fast
4:   return u*_0
5: else if x_t ∈ V_global then
6:   // Tier 1: Global Verification
7:   return π_lp(x_t)
8: else if x_t ∈ V_local ← IterativeGrowth(x_t, π_lp, V_global) then
9:   // Tier 2: Local Refinement
10:   return π_lp(x_t)
11: else
12:   // Tier 3: Safe Recovery (MPPI)
13:   u*_{0:H} ← MPPI_Rec(Warmstart = π_lp)
14:   return u*_0
15: end if

1) Tier 0: Target Maintenance Mode: If x_t ∈ T, we apply a fast MPPI controller to maintain the system within the target set.
2) Tier 1: Global Verification Mode: If x_t ∈ V_global, we use π_lp to safely reach the target set.
3) Tier 2: Local Refinement Mode: If x_t ∉ V_global but x_t ∈ V_local, the agent utilizes the "reclaimed" local refinement from Algorithm 1. The agent continues to leverage π_lp, as the local neighborhood has been probabilistically certified for that specific policy.
4) Tier 3: Safety Recovery Mode: If the state x_t cannot be certified by either the global or local verification, the system switches to a recovery MPPI controller. Warm-started by π_lp, the controller minimizes a safety-oriented cost defined by the distance to the nearest verified RA set boundary, thereby steering the system toward re-entry into the verified set when feasible.

IV. THEORETICAL REACH-AVOID GUARANTEES

Given the multi-tier structure, our theoretical guarantees are conditioned on the verification mode. Accordingly, we consider the probabilistic guarantees sequentially.

A. Tier 1 & Tier 2: Probabilistic RA Set Verification

Global verification mode (Tier 1) and local refinement mode (Tier 2) make similar guarantees aligned with probabilistic certification via scenario optimization.
Proposition 1 (Sequential Probabilistic Guarantees). For the global certificate, if x_t ∈ V_global, we achieve the following probabilistic guarantee:

  P_{S_1}( P_{X̄}( { V_γ(x_t) < 0 } ∩ { V̌_γ(x̄_t, T) ≥ 0 } ) ≤ ε ) ≥ 1 − β.

If x_t ∉ V_global but x_t ∈ V_local, we leverage the local certificate as the next "expert" to consider:

  P_{S_2}( P_{V_local}( sup_{t ∈ [0,T]} g_γ(ξ^{π_lp}_{x^(j)_0}, t) ≤ 0 ) ≤ ε ) ≥ 1 − β.

Proof. This follows from Theorems 1 and 2.

Remark 1. If x_t ∈ V_global ∩ V_local, the problem evolves into choosing between "experts" (or confidences of probabilistic guarantees), which we defer to future work.

B. Tier 3: Finite-time Recovery

If x_t ∉ (V_global ∪ V_local), no safety guarantees can be made. Thus, the best course of action is safe recovery via reachability warm-started MPPI. The forward RA set of the current state x_t, V_recover, should be computed. If V_recover ∩ V_local ≠ ∅ or V_recover ∩ V_global ≠ ∅, then there exists a trajectory at time step t such that recovery to V_local or V_global is achieved.

V. EXPERIMENTS AND RESULTS

A. Drone Racing Case Study

We consider a drone racing example as defined in [9], a challenging high-dimensional testbed with dynamic interactions and safety-critical constraints, where the ego agent must safely overtake an opponent to pass through the gate first. For computational efficiency, we set β = 0.001 and ε = 0.1 for both global and local verification, requiring 158 samples each. We choose this ε value to reduce conservatism and sample complexity; smaller values can be used at the cost of increased samples and computation.

1) Joint System Dynamics: We consider a 12D non-cooperative racing scenario involving an ego drone (e) and an opponent drone (o). The state of each drone i ∈ {e, o} is defined by its 3D position and velocity: x^i_t = [p^i_{x,t}, v^i_{x,t}, p^i_{y,t}, v^i_{y,t}, p^i_{z,t}, v^i_{z,t}] ∈ R^6.
The joint system state is defined as the concatenation x_t = [x^e_t, x^o_t] ∈ X ⊆ R^12. The i-th drone is modeled by double-integrator dynamics, with control input u^i_t = [a^i_{x,t}, a^i_{y,t}, a^i_{z,t}] satisfying ||u^i_t||_∞ ≤ ε_u := 1 m/s². We assume the opponent has bounded control authority and follows an LQR controller that drives it toward the gate aggressively, at high speed.

2) Target Set: The mission objective is to pass through a narrow gate centered at [0, 0, 0] with width w and height h. However, for the purpose of reachability, the target set for the ego drone is defined as

  T = { x : p^e_y − p^o_y > 0, v^e_y − v^o_y > 0, |p^e_x| < 0.3, |p^e_z| < 0.3 }.   (14)

This target set requires the ego drone to be ahead of and faster than the opponent while staying within a corridor for gate passage (radius 0.3 m).

3) Safety Constraints: To ensure safe flight, the ego drone should avoid the area affected by the downward airflow from the opponent drone, leading to the following nonconvex constraint, which is challenging to satisfy:

  || ( p^e_{x,t} − p^o_{x,t}, p^e_{y,t} − p^o_{y,t} ) ||²_2 > 1 + max( p^o_{z,t} − p^e_{z,t}, 0 ) × 0.2,   (15)

where the collision avoidance region between the ego and opponent drones grows as their height difference increases. Moreover, to ensure the ego drone passes through the gate without colliding with its boundaries, we impose the following constraints as in [9]:

  ± p^e_{x,t} − p^e_{y,t} > −0.05,  ± p^e_{z,t} − p^e_{y,t} > −0.05.   (16)

B. Hypothesis 1: The reachability-based guidance of our method leads to a higher success rate than soft- and hard-constrained model-based controllers or RL policies.

We empirically compare our framework to various safe baseline methods: MPPI with a control barrier function (CBF) safety filter, MPPI with safe soft constraints, and proximal policy optimization (PPO) with a CBF safety filter.
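The downwash constraint (15) is straightforward to evaluate pointwise; a sketch using only the two drones' positions (the velocity components of the 6D states do not enter (15)):

```python
def downwash_safe(p_ego, p_opp):
    """Constraint (15): the squared horizontal separation must exceed
    1 + 0.2 * max(p_opp_z - p_ego_z, 0); the keep-out region grows
    when the opponent is above the ego drone."""
    dx = p_ego[0] - p_opp[0]
    dy = p_ego[1] - p_opp[1]
    return dx * dx + dy * dy > 1.0 + 0.2 * max(p_opp[2] - p_ego[2], 0.0)

# Same altitude, 1.1 m apart horizontally: 1.21 > 1, satisfied.
# Opponent 1 m above at the same separation: 1.21 > 1.2, barely satisfied,
# but at 1.05 m separation (1.1025) the grown keep-out region is violated.
```

This is the source of the nonconvexity noted above: the feasible set is the exterior of a cylinder whose radius depends on the relative altitude.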
We sample 500 initial conditions in a bounded region of the state space and roll out the trajectories. Success is quantified by the ego safely reaching the gate before the opponent, and our framework achieves a higher rate of success than its safe counterparts, as shown in Fig. 3(a). Qualitatively, baseline methods tend to prioritize goal-reaching at the expense of safety, whereas our method demonstrates more robust collision avoidance during overtaking (Fig. 2).

C. Hypothesis 2: The hierarchical control framework features a better combination of MPPI and learning.

Similar to Section V-B, we conduct an empirical study comparing our framework to baseline methods: a Hybrid Ablation that conducts switching but replaces the reachability policy with MPPI, pure MPPI with no safety consideration, MPPI warm-started with the reachability policy, and the reachability policy only. We select these baselines to observe the interaction between MPPI, the reachability policy, and the switching mechanism. We sample 500 initial conditions in a bounded region of the state space and roll out the trajectories. The results in Fig. 3(b) highlight the necessity of both the switching mechanism and the reachability policy, as none of the other baselines achieved the same rate of success as our hierarchical framework. Notably, the reachability policy prioritizes reaching the target set but does not guarantee maintaining the system within it, a limitation inherent to reachability theory, as also observed in [28]. This explains its lower success rate when used alone.

VI. CONCLUSIONS

In this work, we propose an efficient hierarchical probabilistic verification framework for reachability learning that unifies global, local, offline, and online verification. By combining a coarse global certificate via scenario optimization with local convex refinement, the approach reduces conservatism while focusing computation on critical regions.
[Fig. 2 panels: x–y trajectories at t = 0.0 s, 0.7 s, 1.1 s, 1.4 s, 1.8 s; legend: Ego (Hybrid), Ego (MPPI (Unsafe)), Ego (MPPI + CBF), Ego (MPPI + Safe Cost), Ego (PPO + CBF), Opponent, Gate, Refinement, Safe Set]

Fig. 2. Evolution of a drone racing simulation against an opponent (black), comparing our method (Hybrid, dark blue) with various baselines. Baselines struggle to balance safety and goal-reaching, whereas our method achieves both, safely overtaking and reaching the gate. Moreover, we observe that the ego agent leverages safe recovery MPPI to try to enter the locally refined set.

[Fig. 3 success rates: (a) Safety-critical baselines: Hybrid (ours) 90.2%, MPPI + CBF 62.4%, MPPI + Safe 63.2%, PPO + CBF 14.2%. (b) Learning & ablation study: Hybrid (ours) 90.2%, Hybrid Ablation (no reach.) 68.4%, MPPI (no safety) 63.4%, MPPI + Reach. Warmstart 62.8%, Reachability Policy Only 51.8%]

Fig. 3. (a) Performance of our framework compared to control barrier function (CBF) and soft-constraint baselines. (b) Ablation study comparing the hybrid method to its learning-only and model-based variants. In both (a) and (b), the results for our method are shown for reference. Our framework achieves higher success probability over 500 sampled initial conditions, effectively balancing collision avoidance and goal-reaching.

A switching mechanism between learned and model-based controllers ensures safe and efficient control. The framework provides probabilistic safety guarantees and improves performance in safety-critical tasks, as demonstrated in drone racing simulations. Future work includes hardware validation and extension to more complex multi-agent settings and uncertain real-world environments.

REFERENCES

[1] K. P. Wabersich, A. J. Taylor, J. J. Choi, K. Sreenath, C. J. Tomlin, A. D. Ames, and M. N.
Zeilinger, "Data-driven safety filters: Hamilton-Jacobi reachability, control barrier functions, and predictive methods for uncertain systems," IEEE Control Systems Magazine, vol. 43, no. 5, pp. 137–177, 2023.
[2] C. Tomlin, G. J. Pappas, and S. Sastry, "Conflict resolution for air traffic management: A study in multiagent hybrid systems," IEEE Transactions on Automatic Control, vol. 43, no. 4, pp. 509–521, 1998.
[3] J. Lygeros, "On reachability and minimum cost optimal control," Automatica, vol. 40, no. 6, pp. 917–927, 2004.
[4] I. M. Mitchell, A. M. Bayen, and C. J. Tomlin, "A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games," IEEE Transactions on Automatic Control, vol. 50, no. 7, pp. 947–957, 2005.
[5] S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin, "Hamilton-Jacobi reachability: A brief overview and recent advances," in 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 2242–2253, IEEE, 2017.
[6] S. Bansal and C. J. Tomlin, "DeepReach: A deep learning approach to high-dimensional reachability," in 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 1817–1824, IEEE, 2021.
[7] J. F. Fisac, N. F. Lugovoy, V. Rubies-Royo, S. Ghosh, and C. J. Tomlin, "Bridging Hamilton-Jacobi safety analysis and reinforcement learning," in 2019 International Conference on Robotics and Automation (ICRA), pp. 8550–8556, IEEE, 2019.
[8] K.-C. Hsu, V. Rubies-Royo, C. J. Tomlin, and J. F. Fisac, "Safety and liveness guarantees through reach-avoid reinforcement learning," arXiv preprint arXiv:2112.12288, 2021.
[9] J. Li, D. Lee, J. Lee, K. S. Dong, S. Sojoudi, and C. Tomlin, "Certifiable reachability learning using a new Lipschitz continuous value function," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3582–3589, 2025.
[10] M. Althoff and B. H.
Krogh, "Zonotope bundles for the efficient computation of reachable sets," in 2011 50th IEEE Conference on Decision and Control and European Control Conference, pp. 6814–6821, IEEE, 2011.
[11] S. Coogan, "Mixed monotonicity for reachability and safety in dynamical systems," in 2020 59th IEEE Conference on Decision and Control (CDC), pp. 5074–5085, IEEE, 2020.
[12] H. Hu, M. Fazlyab, M. Morari, and G. J. Pappas, "Reach-SDP: Reachability analysis of closed-loop systems with neural network controllers via semidefinite programming," in 2020 59th IEEE Conference on Decision and Control (CDC), pp. 5929–5934, IEEE, 2020.
[13] T. Lew and M. Pavone, "Sampling-based reachability analysis: A random set theory approach with adversarial sampling," in Conference on Robot Learning, pp. 2055–2070, PMLR, 2021.
[14] M. Abate and S. Coogan, "Robustly forward invariant sets for mixed-monotone systems," IEEE Transactions on Automatic Control, vol. 67, no. 9, pp. 4947–4954, 2022.
[15] L. Liebenwein, C. Baykal, I. Gilitschenski, S. Karaman, and D. Rus, "Sampling-based approximation algorithms for reachability analysis with provable guarantees," in Robotics: Science and Systems, 2018.
[16] A. Devonport and M. Arcak, "Estimating reachable sets with scenario optimization," in Learning for Dynamics and Control, pp. 75–84, PMLR, 2020.
[17] M. E. Cao, M. Bloch, and S. Coogan, "Estimating high probability reachable sets using Gaussian processes," in 2021 60th IEEE Conference on Decision and Control (CDC), pp. 3881–3886, IEEE, 2021.
[18] A. Devonport, F. Yang, L. El Ghaoui, and M. Arcak, "Data-driven reachability analysis with Christoffel functions," in 60th IEEE Conference on Decision and Control (CDC), pp. 5067–5072, IEEE, 2021.
[19] A. Lin and S. Bansal, "Generating formal safety assurances for high-dimensional reachability," in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 10525–10531, IEEE, 2023.
[20] L.
Lindemann, M. Cleaveland, G. Shim, and G. J. Pappas, "Safe planning in dynamic environments using conformal prediction," IEEE Robotics and Automation Letters, vol. 8, no. 8, pp. 5116–5123, 2023.
[21] A. Lin and S. Bansal, "Verification of neural reachable tubes via scenario optimization and conformal prediction," in 6th Annual Learning for Dynamics & Control Conference, pp. 719–731, PMLR, 2024.
[22] A. Lin, A. Pinto, and S. Bansal, "Robust verification of controllers under state uncertainty via Hamilton-Jacobi reachability analysis," arXiv preprint arXiv:2511.14755, 2025.
[23] E. Dietrich, R. Devonport, S. Tu, and M. Arcak, "Data-driven reachability with scenario optimization and the holdout method," in IEEE 64th Conference on Decision and Control (CDC), pp. 3925–3931, IEEE, 2025.
[24] M. C. Campi and S. Garatti, Introduction to the Scenario Approach. SIAM, 2018.
[25] K. Margellos and J. Lygeros, "Hamilton-Jacobi formulation for reach-avoid differential games," IEEE Transactions on Automatic Control, vol. 56, no. 8, pp. 1849–1861, 2011.
[26] M. C. Campi, S. Garatti, and M. Prandini, "The scenario approach for systems and control design," Annual Reviews in Control, vol. 33, no. 2, pp. 149–157, 2009.
[27] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, "Aggressive driving with model predictive path integral control," in IEEE International Conference on Robotics and Automation (ICRA), pp. 1433–1440, IEEE, 2016.
[28] G. Chenevert, J. Li, S. Bae, D. Lee, et al., "Solving reach-avoid-stay problems using deep deterministic policy gradients," accepted by IEEE International Conference on Robotics and Automation (ICRA), 2026.