Probabilistic Verification and Reachability Analysis of Neural Networks via Semidefinite Programming
Authors: Mahyar Fazlyab, Manfred Morari, George J. Pappas

Abstract — Quantifying the robustness of neural networks and verifying their safety properties against input uncertainties or adversarial attacks has become an important research area in learning-enabled systems. Most results concentrate on the worst-case scenario, in which the input of the neural network is perturbed within a norm-bounded uncertainty set. In this paper, we consider a probabilistic setting in which the uncertainty is random with known first two moments. In this context, we discuss two relevant problems: (i) probabilistic safety verification, in which the goal is to find an upper bound on the probability of violating a safety specification; and (ii) confidence ellipsoid estimation, in which, given a confidence ellipsoid for the input of the neural network, our goal is to compute a confidence ellipsoid for the output. Due to the presence of nonlinear activation functions, these two problems are very difficult to solve exactly. To simplify the analysis, our main idea is to abstract the nonlinear activation functions by a combination of affine and quadratic constraints that they impose on their input-output pairs. We then show that the safety of the abstracted network, which is sufficient for the safety of the original network, can be analyzed using semidefinite programming. We illustrate the performance of our approach with numerical experiments.

I. INTRODUCTION

Neural networks (NNs) have been very successful in various applications, such as end-to-end learning for self-driving cars [1], learning-based controllers in robotics [2], speech recognition, and image classification. Their vulnerability to input uncertainties and adversarial attacks, however, hinders the deployment of neural networks in safety-critical applications.
In the context of image classification, for example, it has been shown in several works [3]–[5] that adding even an imperceptible noise to the input of neural network-based classifiers can completely change their decision. In this context, verification refers to the process of checking whether the output of a trained NN satisfies certain desirable properties when its input is perturbed within an uncertainty model. More precisely, we would like to verify whether the neural network's prediction remains the same in a neighborhood of a test point x⋆. This neighborhood can represent, for example, the set of input examples that can be crafted by an adversary.

In worst-case safety verification, we assume that the input uncertainty is bounded, and we verify a safety property for all possible perturbations within the uncertainty set. This approach has been pursued extensively in several works using various tools, such as mixed-integer linear programming [6]–[8], robust optimization and duality theory [9], [10], Satisfiability Modulo Theory (SMT) [11], dynamical systems [12], [13], robust control [14], abstract interpretation [15], and many others [16], [17].

In probabilistic verification, on the other hand, we assume that the input uncertainty is random but potentially unbounded. Random uncertainties can emerge as a result of, for example, data quantization, input preprocessing, and environmental background noise [18]. In contrast to the worst-case approach, only a few works have studied verification of neural networks in probabilistic settings [18]–[20].

† Corresponding author: mahyarfa@seas.upenn.edu. This work was supported by DARPA Assured Autonomy and NSF CPS 1837210. The authors are with the Department of Electrical and Systems Engineering, University of Pennsylvania. Email: {mahyarfa, morari, pappasg}@seas.upenn.edu.
In situations where we have random uncertainty models, we ask a related question: "Can we provide statistical guarantees on the output of neural networks when their input is perturbed with a random noise?" In this paper, we provide an affirmative answer by addressing two related problems:

• Probabilistic verification: Given a safe region in the output space of the neural network, our goal is to estimate the probability that the output of the neural network will be in the safe region when its input is perturbed by a random variable with a known mean and covariance.

• Confidence propagation: Given a confidence ellipsoid on the input of the neural network, we want to estimate the output confidence ellipsoid.

The rest of the paper is organized as follows. In Section II, we discuss safety verification of neural networks in both deterministic and probabilistic settings. In Section III, we provide an abstraction of neural networks using the formalism of quadratic constraints. In Section IV, we develop a convex relaxation to the problem of confidence ellipsoid estimation. In Section V, we present the numerical experiments. Finally, we draw our conclusions in Section VI.

A. Notation and Preliminaries

We denote the set of real numbers by R, the set of real n-dimensional vectors by R^n, the set of m × n matrices by R^{m × n}, and the n-dimensional identity matrix by I_n. We denote by S^n, S^n_+, and S^n_{++} the sets of n-by-n symmetric, positive semidefinite, and positive definite matrices, respectively. We denote ellipsoids in R^n by

E(x_c, P) = {x | (x − x_c)^T P^{-1} (x − x_c) ≤ 1},

where x_c ∈ R^n is the center of the ellipsoid and P ∈ S^n_{++} determines its orientation and volume. We denote the mean and covariance of a random variable X ∈ R^n by E[X] ∈ R^n and Cov[X] ∈ S^n_+, respectively.

II. SAFETY VERIFICATION OF NEURAL NETWORKS
A. Deterministic Safety Verification

Consider a multi-layer feed-forward fully-connected neural network described by the following equations:

x^0 = x,
x^{k+1} = φ(W^k x^k + b^k),  k = 0, …, ℓ − 1,        (1)
f(x) = W^ℓ x^ℓ + b^ℓ,

where x^0 = x is the input to the network, and W^k ∈ R^{n_{k+1} × n_k} and b^k ∈ R^{n_{k+1}} are the weight matrix and bias vector of the k-th layer. The nonlinear activation function φ(·) (Rectified Linear Unit (ReLU), sigmoid, tanh, leaky ReLU, etc.) is applied coordinate-wise to the pre-activation vectors, i.e., it is of the form

φ(x) = [ϕ(x_1) ⋯ ϕ(x_d)]^T,        (2)

where ϕ is the activation function of each individual neuron. Although our framework is applicable to all activation functions, we focus our attention on the ReLU activation function, ϕ(x) = max(x, 0).

In deterministic safety verification, we are given a bounded set 𝒳 ⊂ R^{n_x} of possible inputs (the uncertainty set), which is mapped by the neural network to the output reachable set f(𝒳). The desirable properties that we would like to verify can often be described by a set 𝒮 ⊂ R^{n_y} in the output space of the neural network, which we call the safe region. In this context, the network is safe if f(𝒳) ⊆ 𝒮.

B. Probabilistic Safety Verification

In a deterministic setting, reachability analysis and safety verification is a yes/no problem whose answer does not quantify the proportion of inputs for which safety is violated. Furthermore, if the uncertainty is random and potentially unbounded, the output f(x) satisfies the safety constraint only with a certain probability. More precisely, given a safe region 𝒮 in the output space of the neural network, we are interested in finding the probability Pr(f(X) ∈ 𝒮) that the neural network maps the random input X to the safe region. Since f is a nonlinear function, computing the distribution of f(X) given the distribution of X is prohibitive, except in special cases.
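Evaluating f pointwise, by contrast, is straightforward; the difficulty lies only in propagating distributions. As a minimal illustration of the model (1)–(2), the following plain-Python sketch implements the forward pass of a small ReLU network; the weights and biases are arbitrary placeholders, not values from the paper.

```python
# Forward pass of a fully-connected ReLU network, eqs. (1)-(2).
# Weights/biases below are illustrative placeholders, not from the paper.

def relu(v):
    # phi applied coordinate-wise: phi(x)_i = max(0, x_i)
    return [max(0.0, vi) for vi in v]

def affine(W, x, b):
    # W x + b, with W given as a list of rows
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def forward(weights, biases, x):
    # x^0 = x; x^{k+1} = relu(W^k x^k + b^k); f(x) = W^l x^l + b^l
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(affine(W, x, b))
    return affine(weights[-1], x, biases[-1])

# One hidden layer: n_x = 2, n_1 = 3, n_y = 1
weights = [[[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]],   # W^0
           [[1.0, 1.0, 1.0]]]                        # W^1
biases = [[0.0, -1.0, 0.5], [0.0]]                   # b^0, b^1
print(forward(weights, biases, [1.0, 1.0]))          # -> [1.5]
```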
As a result, we settle for providing a lower bound p ∈ (0, 1) on the desired probability,

Pr(f(X) ∈ 𝒮) ≥ p.

To compute the lower bound, we adopt a geometric approach, in which we verify whether the reachable set of a confidence region of the input lies entirely in the safe set 𝒮. We first recall the definition of a confidence region.

Definition 1 (Confidence region): The p-level (p ∈ [0, 1]) confidence region of a vector random variable X ∈ R^n is defined as any set E_p ⊆ R^n for which Pr(X ∈ E_p) ≥ p.

Although confidence regions can have different representations, our particular focus in this paper is on ellipsoidal confidence regions. Due to their appealing geometric properties (e.g., invariance to affine subspace transformations), ellipsoids are widely used in robust control to compute reachable sets [21]–[23]. The next two lemmas characterize confidence ellipsoids for Gaussian random variables and for random variables with known first two moments.

Lemma 1: Let X ∼ N(μ, Σ) be an n-dimensional Gaussian random variable. Then the p-level confidence region of X is given by the ellipsoid

E_p = {x | (x − μ)^T Σ^{-1} (x − μ) ≤ χ²_n(p)},        (3)

where χ²_n(p) is the quantile function of the chi-squared distribution with n degrees of freedom.

For non-Gaussian random variables, we can use Chebyshev's inequality to characterize the confidence ellipsoids, provided we know the first two moments.

Lemma 2: Let X be an n-dimensional random variable with E[X] = μ and Cov[X] = Σ. Then the ellipsoid

E_p = {x | (x − μ)^T Σ^{-1} (x − μ) ≤ n/(1 − p)},        (4)

is a p-level confidence region of X.

Lemma 3: Let E_p be a confidence region of a random variable X. If f(E_p) ⊆ 𝒮, then 𝒮 is a p-level confidence region for the random variable f(X), i.e., Pr(f(X) ∈ 𝒮) ≥ p.

Proof: The inclusion f(E_p) ⊆ 𝒮 implies Pr(f(X) ∈ 𝒮) ≥ Pr(f(X) ∈ f(E_p)).
Since f is not necessarily a one-to-one mapping, we have Pr(f(X) ∈ f(E_p)) ≥ Pr(X ∈ E_p) ≥ p. Combining the last two inequalities yields the desired result.

According to Lemma 3, if we can certify that the output reachable set f(E_p) lies entirely in the safe set 𝒮 for some p ∈ (0, 1), then the network is safe with probability at least p. In particular, finding the best lower bound corresponds to the non-convex optimization problem

maximize p  subject to f(E_p) ⊆ 𝒮,        (5)

with decision variable p ∈ [0, 1). By Lemma 3, the optimal solution p⋆ then satisfies

Pr(f(X) ∈ 𝒮) ≥ p⋆.        (6)

C. Confidence Propagation

A problem closely related to probabilistic safety verification is confidence propagation. Explicitly, given a p-level confidence region E_p of the input of a neural network, our goal is to find a p-level confidence region for the output. To see the connection to the probabilistic verification problem, let 𝒮 be any outer approximation of the output reachable set, i.e., f(E_p) ⊆ 𝒮.
By Lemma 3, 𝒮 is a p-level confidence region for the output.

Fig. 1: The p-level input confidence ellipsoid E_p, its image f(E_p), and the estimated output confidence ellipsoid.

Of course, there is an infinite number of such possible confidence regions. Our goal is to find the "best" confidence region with respect to some metric. Using the volume of the ellipsoid as an optimization criterion, finding the best confidence region amounts to solving the problem

minimize Volume(𝒮)  subject to f(E_p) ⊆ 𝒮.        (7)

The solution to the above problem provides the p-level confidence region with the minimum volume.
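To make the confidence levels in Lemmas 1 and 2 concrete: for n = 2, the chi-squared quantile in (3) has the closed form χ²₂(p) = −2 ln(1 − p), which can be compared directly with the moment-based level n/(1 − p) in (4). A small sketch (an illustration, not from the paper):

```python
import math

def chi2_quantile_2dof(p):
    # Closed form for n = 2: the chi-squared CDF is 1 - exp(-x/2),
    # so the p-quantile in Lemma 1 is -2 ln(1 - p).
    return -2.0 * math.log(1.0 - p)

def chebyshev_level(n, p):
    # Right-hand side of the ellipsoid in Lemma 2: n / (1 - p).
    return n / (1.0 - p)

p = 0.95
print(chi2_quantile_2dof(p))   # ~5.99: Gaussian 95% level for n = 2
print(chebyshev_level(2, p))   # ~40: moment-based (Chebyshev) level
```

As expected, the Chebyshev-based level is far more conservative than the Gaussian one (about 40 versus about 6 at p = 0.95); this is the price of assuming only the first two moments.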
Figure 1 illustrates the procedure of confidence estimation. In the next section, we provide a convex relaxation of the optimization problem (7). The other problem, (5), is a straightforward extension of confidence estimation, and hence we will not discuss the details.

III. PROBLEM RELAXATION VIA QUADRATIC CONSTRAINTS

Due to the presence of nonlinear activation functions, checking the condition f(E_p) ⊆ 𝒮 in (5) or (7) is a non-convex feasibility problem and is NP-hard in general. Our main idea is to abstract the original network f by another network f̃, in the sense that f̃ over-approximates the output of the original network for any input ellipsoid, i.e., f(E_p) ⊆ f̃(E_p) for any p ∈ [0, 1). It is then sufficient to verify the safety properties of the relaxed network, i.e., to verify the inclusion f̃(E_p) ⊆ 𝒮. In the following, we use the framework of quadratic constraints to develop such an abstraction.

A. Relaxation of Nonlinearities by Quadratic Constraints

In this subsection, we show how we can abstract activation functions, and in particular the ReLU function, using quadratic constraints. We first provide a formal definition, introduced in [14].

Definition 2: Let φ: R^d → R^d, and suppose 𝒬 ⊂ S^{2d+1} is the set of all symmetric and indefinite matrices Q such that the inequality

[x; φ(x); 1]^T Q [x; φ(x); 1] ≥ 0        (8)

holds for all x ∈ R^d. Then we say φ satisfies the quadratic constraint (QC) defined by 𝒬.

Note that the matrix Q in Definition 2 is indefinite, as otherwise the constraint would hold trivially. Before deriving QCs for the ReLU function, we recall some definitions, which can be found in many references; see, for example, [24], [25].

Definition 3 (Sector-bounded nonlinearity): A nonlinear function ϕ: R → R is sector-bounded on the sector [α, β] (0 ≤ α ≤ β) if the following condition holds for all x:

(ϕ(x) − αx)(ϕ(x) − βx) ≤ 0.        (9)
Definition 4 (Slope-restricted nonlinearity): A nonlinear function ϕ: R → R is slope-restricted on [α, β] (0 ≤ α ≤ β) if for any pairs (x, ϕ(x)) and (x⋆, ϕ(x⋆)),

(ϕ(x) − ϕ(x⋆) − α(x − x⋆))(ϕ(x) − ϕ(x⋆) − β(x − x⋆)) ≤ 0.        (10)

Repeated nonlinearities. Assuming that the same activation function is used in all neurons, we can exploit this structure to refine the QC abstraction of the nonlinearity. Explicitly, suppose ϕ: R → R is slope-restricted on [α, β] and let φ(x) = [ϕ(x_1) ⋯ ϕ(x_d)]^T be a vector-valued function constructed by component-wise repetition of ϕ. It is not hard to verify that φ is also slope-restricted on the same sector. However, this representation simply ignores the fact that all the nonlinearities composing φ are the same. By taking advantage of this structure, we can refine the quadratic constraint that describes φ. To be specific, for an input-output pair (x, φ(x)), x ∈ R^d, we can write the slope-restriction condition

(ϕ(x_i) − ϕ(x_j) − α(x_i − x_j))(ϕ(x_i) − ϕ(x_j) − β(x_i − x_j)) ≤ 0,        (11)

for all distinct i, j. This particular QC can tighten the relaxation incurred by the QC abstraction of the nonlinearity. There are several results in the literature on repeated nonlinearities. For instance, in [25], [26], the authors derive QCs for repeated and odd nonlinearities (e.g., the tanh function).

B. QC for the ReLU Function

In this subsection, we derive quadratic constraints for the ReLU function, φ(x) = max(0, x), x ∈ R^d. Note that this function lies on the boundary of the sector [0, 1]. More precisely, we can describe the ReLU function by three quadratic and/or affine constraints:

y_i = max(0, x_i)  ⇔  y_i ≥ x_i,  y_i ≥ 0,  y_i² = x_i y_i.        (12)

On the other hand, for any two distinct indices i ≠ j, we can write the constraint (11) with α = 0 and β = 1:

(y_j − y_i)² ≤ (y_j − y_i)(x_j − x_i).        (13)
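The constraints (12) and the cross-coupling constraint (13) are easy to check numerically on random input-output pairs of the ReLU map; the following sketch (an illustration, not the paper's code) does exactly that:

```python
import random

def relu_vec(x):
    # y = max(0, x), applied coordinate-wise
    return [max(0.0, xi) for xi in x]

random.seed(0)
d = 4
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(d)]
    y = relu_vec(x)
    # (12): y_i >= x_i, y_i >= 0, y_i^2 = x_i * y_i
    for xi, yi in zip(x, y):
        assert yi >= xi and yi >= 0.0
        assert abs(yi * yi - xi * yi) < 1e-9
    # (13): (y_j - y_i)^2 <= (y_j - y_i)(x_j - x_i) for all i != j
    for i in range(d):
        for j in range(d):
            if i != j:
                dy, dx = y[j] - y[i], x[j] - x[i]
                assert dy * dy <= dy * dx + 1e-9
print("all ReLU quadratic constraints hold")
```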
By adding a weighted combination of all these constraints (with positive weights for the inequalities), we find that the ReLU function y = max(0, x) satisfies

Σ_{i=1}^d [λ_i (y_i² − x_i y_i) + ν_i (y_i − x_i) + η_i y_i] − Σ_{i≠j} λ_ij [(y_j − y_i)² − (y_j − y_i)(x_j − x_i)] ≥ 0,        (14)

for any multipliers λ_i ∈ R and ν_i, η_i, λ_ij ≥ 0, i, j ∈ {1, ⋯, d}. This inequality can be written in the compact form (8), as stated in the following lemma.

Lemma 4 (QC for the ReLU function): The ReLU function φ(x) = max(0, x): R^d → R^d satisfies the QC defined by 𝒬, where

𝒬 = { Q | Q = [ 0, T, −ν;  T, −2T, ν + η;  −ν^T, (ν + η)^T, 0 ] }.        (15)

Here η, ν ≥ 0, and T ∈ S^d is given by

T = Σ_{i=1}^d λ_i e_i e_i^T + Σ_{i=1}^{d−1} Σ_{j>i} λ_ij (e_i − e_j)(e_i − e_j)^T,

where e_i is the i-th basis vector in R^d and λ_ij ≥ 0.

Proof: See [14].

Lemma 4 characterizes a family of valid QCs for the ReLU function. It is not hard to verify that the set 𝒬 of valid QCs is a convex cone. As we will see in the next section, the matrix Q in (15) appears as a decision variable in the optimization problem.

C. Tightening the Relaxation

In the previous subsection, we derived QCs that are valid on the whole space R^d. When restricted to a region R ⊆ R^d, we can tighten the QC relaxation. Consider the relationship φ(x) = max(0, x), x ∈ R ⊆ R^d, and let I⁺ and I⁻ be the sets of neurons that are always active or always inactive, respectively, i.e.,

I⁺ = {i | x_i ≥ 0 for all x ∈ R},        (16)
I⁻ = {i | x_i < 0 for all x ∈ R}.

The constraint y_i ≥ x_i holds with equality for active neurons. Therefore, we can write ν_i ∈ R if i ∈ I⁺, and ν_i ≥ 0 otherwise. Similarly, the constraint y_i ≥ 0 holds with equality for inactive neurons. Therefore, we can write η_i ∈ R if i ∈ I⁻, and η_i ≥ 0 otherwise. Finally, it can be verified that the cross-coupling constraint (13) holds with equality for pairs of always-active or always-inactive neurons.
Therefore, for any 1 ≤ i < j ≤ d, we can write λ_ij ∈ R if (i, j) ∈ I⁺ × I⁺ or (i, j) ∈ I⁻ × I⁻, and λ_ij ≥ 0 otherwise. These additional degrees of freedom on the multipliers can tighten the relaxation incurred in (14). Note that the sets of active and inactive neurons are not known a priori. However, we can partially find them using, for example, interval arithmetic.

IV. ANALYSIS OF THE RELAXED NETWORK VIA SEMIDEFINITE PROGRAMMING

In this section, we use the QC abstraction developed in the previous section to analyze the safety of the relaxed network. In the next theorem, we state our main result for one-layer neural networks; we discuss the multi-layer case in Section IV-A.

Theorem 1 (Output covering ellipsoid): Consider a one-layer neural network f: R^{n_x} → R^{n_y} described by the equation

y = W^1 φ(W^0 x + b^0) + b^1,        (17)

where φ: R^{n_1} → R^{n_1} satisfies the quadratic constraint defined by 𝒬, i.e., for any Q ∈ 𝒬,

[z; φ(z); 1]^T Q [z; φ(z); 1] ≥ 0 for all z.        (18)

Suppose x ∈ E(μ_x, Σ_x). Consider the following matrix inequality:

M_1 + M_2 + M_3 ⪯ 0,        (19)

where

M_1 = [I_{n_x}, 0, 0;  0, 0, 1]^T P(τ) [I_{n_x}, 0, 0;  0, 0, 1],
M_2 = [W^0, 0, b^0;  0, I_{n_1}, 0;  0, 0, 1]^T Q [W^0, 0, b^0;  0, I_{n_1}, 0;  0, 0, 1],
M_3 = [0, W^1, b^1;  0, 0, 1]^T S(A, b) [0, W^1, b^1;  0, 0, 1],

with

P(τ) = τ [−Σ_x^{-1}, Σ_x^{-1} μ_x;  μ_x^T Σ_x^{-1}, −μ_x^T Σ_x^{-1} μ_x + 1],
S(A, b) = [A², Ab;  b^T A, b^T b − 1].

If (19) is feasible for some (τ, A, Q, b) ∈ R_+ × S^{n_y} × 𝒬 × R^{n_y}, then y ∈ E(μ_y, Σ_y) with μ_y = −A^{-1} b and Σ_y = A^{-2}.

Proof: We first introduce the auxiliary variable z and rewrite the equation of the neural network as

z = φ(W^0 x + b^0),
y = W^1 z + b^1.

Since φ satisfies the QC defined by 𝒬, we can write the following QC from the identity z = φ(W^0 x + b^0):

[W^0 x + b^0; z; 1]^T Q [W^0 x + b^0; z; 1] ≥ 0, for all Q ∈ 𝒬.        (21)
By substituting the identity

[W^0 x + b^0; z; 1] = [W^0, 0, b^0;  0, I_{n_1}, 0;  0, 0, 1] [x; z; 1]

into (21) and denoting x̄ = [x^T z^T]^T, we can write the inequality

[x̄; 1]^T M_2 [x̄; 1] ≥ 0,        (22)

for any Q ∈ 𝒬 and all x̄. By definition, for all x ∈ E(μ_x, Σ_x) we have (x − μ_x)^T Σ_x^{-1} (x − μ_x) ≤ 1, which implies, for any τ ≥ 0,

τ [x; 1]^T [−Σ_x^{-1}, Σ_x^{-1} μ_x;  μ_x^T Σ_x^{-1}, −μ_x^T Σ_x^{-1} μ_x + 1] [x; 1] ≥ 0.

By using the identity

[x; 1] = [I_{n_x}, 0, 0;  0, 0, 1] [x̄; 1],

we conclude that for all x ∈ E(μ_x, Σ_x) and z = φ(W^0 x + b^0),

[x̄; 1]^T M_1 [x̄; 1] ≥ 0.        (23)

Suppose (19) holds for some (τ, A, Q, b) ∈ R_+ × S^{n_y} × 𝒬 × R^{n_y}. By left- and right-multiplying both sides of (19) by [x̄^T 1] and [x̄^T 1]^T, respectively, we obtain

[x̄; 1]^T M_1 [x̄; 1] + [x̄; 1]^T M_2 [x̄; 1] + [x̄; 1]^T M_3 [x̄; 1] ≤ 0.

For any x ∈ E(μ_x, Σ_x), the first two quadratic terms are nonnegative by (23) and (22), respectively. Therefore, the last term on the left-hand side must be nonpositive for all x ∈ E(μ_x, Σ_x):

[x̄; 1]^T M_3 [x̄; 1] ≤ 0.

Using the relation y = W^1 z + b^1, the preceding inequality is equivalent to

[y; 1]^T [A², Ab;  b^T A, b^T b − 1] [y; 1] ≤ 0,

which is equivalent to (y + A^{-1} b)^T A² (y + A^{-1} b) ≤ 1. Using our notation for ellipsoids, this means that for all x ∈ E(μ_x, Σ_x), we must have y ∈ E(−A^{-1} b, A^{-2}).

In Theorem 1, we proposed a matrix inequality, in the variables (τ, A, Q, b), as a sufficient condition for enclosing the output of the neural network in the ellipsoid E(−A^{-1} b, A^{-2}). We can now use this result to find the minimum-volume ellipsoid with this property. Note that the matrix inequality (19) is not linear in (A, b). Nevertheless, we can convexify it using Schur complements.

Lemma 5: The matrix inequality (19) is equivalent to the linear matrix inequality (LMI)

M = [ M_1 + M_2 − ee^T,  [0_{n_x × n_y}; W^{1T} A; b^{1T} A + b^T];  [0_{n_y × n_x}, A W^1, A b^1 + b],  −I_{n_y} ] ⪯ 0,        (24)

in (τ, A, Q, b), where e = (0, ⋯, 0, 1) ∈ R^{n_x + n_1 + 1}.
Proof: It is not hard to verify that M_3 can be written as M_3 = F F^T − ee^T, where F, affine in (A, b), is given by

F(A, b) = [0_{n_x × n_y};  W^{1T} A;  b^{1T} A + b^T].

Using this definition, the matrix inequality (19) reads (M_1 + M_2 − ee^T) + F F^T ⪯ 0, which implies that the term in parentheses must be negative semidefinite, i.e., M_1 + M_2 − ee^T ⪯ 0. By Schur complements, the last two inequalities are together equivalent to (24).

Having established Lemma 5, we can now find the minimum-volume covering ellipsoid by solving the following semidefinite program (SDP):

minimize −log det(A)  subject to (24),        (25)

where the decision variables are (τ, A, Q, b) ∈ R_+ × S^{n_y} × 𝒬 × R^{n_y}. Since 𝒬 is a convex cone, (25) is a convex program and can be solved via interior-point solvers.

A. Multi-layer Case

For multi-layer neural networks, we can apply the result of Theorem 1 in a layer-by-layer fashion, provided that the input confidence ellipsoid of each layer is non-degenerate. This assumption holds when n_{k+1} ≤ n_k for all 0 ≤ k ≤ ℓ − 1 (reduction in the width of the layers) and the weight matrices W^k ∈ R^{n_{k+1} × n_k} are full rank. To see this, we note that ellipsoids are invariant under affine subspace transformations, in the sense that

W^k E(μ_k, Σ_k) + b^k = E(W^k μ_k + b^k, W^k Σ_k W^{kT}).

This implies that Σ_{k+1} := W^k Σ_k W^{kT} is positive definite whenever Σ_k is positive definite, so the ellipsoid E(μ_{k+1}, Σ_{k+1}) is non-degenerate. If the assumption n_{k+1} ≤ n_k is violated, we can use the compact representation of multi-layer neural networks elaborated in [14] to arrive at the multi-layer counterpart of the matrix inequality (19).

V. NUMERICAL EXPERIMENTS

In this section, we consider a numerical experiment in which we estimate the confidence ellipsoid of a one-layer neural network with n_x = 2 inputs, n_1 ∈ {10, 30, 50} hidden neurons, and n_y = 2 outputs.
We assume the input is Gaussian with μ_x = (1, 1) and Σ_x = diag(1, 2). The weights and biases of the network are chosen randomly. We use MATLAB, CVX [27], and Mosek [28] to solve the corresponding SDP.

Fig. 2: Top: the estimated 95% confidence ellipsoid along with 10⁴ samples of the output. Bottom: the image of the 95% input confidence ellipsoid (f(E_p) with p = 0.95) and its outer approximation (the output confidence ellipsoid). Panels shown for n_1 = 10, 30, 50.

In Figure 2, we plot the estimated 0.95-level output confidence ellipsoid along with 10⁴ sample outputs. We also plot the image of the 0.95-level input confidence ellipsoid under f along with the estimated 0.95-level output confidence ellipsoid.

VI. CONCLUSIONS

We studied probabilistic safety verification of neural networks whose inputs are subject to random noise with known first two moments. Instead of analyzing the network directly, we proposed to study the safety of an abstracted network, in which the nonlinear activation functions are relaxed by the quadratic constraints their input-output pairs satisfy. We then showed that the safety properties of the abstracted network can be analyzed using semidefinite programming. It would be interesting to consider other related problems, such as closed-loop statistical safety verification and reachability analysis.

REFERENCES

[1] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al., "End to end learning for self-driving cars," arXiv preprint, 2016.
[2] G. Shi, X. Shi, M. O'Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S.-J. Chung, "Neural lander: Stable drone landing control using learned dynamics," arXiv preprint, 2018.
[3] S. Zheng, Y. Song, T. Leung, and I. Goodfellow, "Improving the robustness of deep neural networks via stability training," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4480–4488, 2016.
[4] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, "Universal adversarial perturbations," arXiv preprint, 2017.
[5] J. Su, D. V. Vargas, and K. Sakurai, "One pixel attack for fooling deep neural networks," IEEE Transactions on Evolutionary Computation, 2019.
[6] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi, "Measuring neural net robustness with constraints," in Advances in Neural Information Processing Systems, pp. 2613–2621, 2016.
[7] A. Lomuscio and L. Maganti, "An approach to reachability analysis for feed-forward ReLU neural networks," arXiv preprint, 2017.
[8] V. Tjeng, K. Xiao, and R. Tedrake, "Evaluating robustness of neural networks with mixed integer programming," arXiv preprint arXiv:1711.07356, 2017.
[9] J. Z. Kolter and E. Wong, "Provable defenses against adversarial examples via the convex outer adversarial polytope," arXiv preprint arXiv:1711.00851, 2017.
[10] K. Dvijotham, R. Stanforth, S. Gowal, T. Mann, and P. Kohli, "A dual approach to scalable verification of deep networks," arXiv preprint arXiv:1803.06567, 2018.
[11] L. Pulina and A. Tacchella, "Challenging SMT solvers to verify neural networks," AI Communications, vol. 25, no. 2, pp. 117–135, 2012.
[12] R. Ivanov, J. Weimer, R. Alur, G. J. Pappas, and I. Lee, "Verisig: verifying safety properties of hybrid systems with neural network controllers," arXiv preprint, 2018.
[13] W. Xiang, H.-D. Tran, and T. T. Johnson, "Output reachable set estimation and verification for multilayer neural networks," IEEE Transactions on Neural Networks and Learning Systems, no. 99, pp. 1–7, 2018.
[14] M. Fazlyab, M. Morari, and G. J. Pappas, "Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming," arXiv preprint, 2019.
[15] M. Mirman, T. Gehr, and M. Vechev, "Differentiable abstract interpretation for provably robust neural networks," in International Conference on Machine Learning, pp. 3575–3583, 2018.
[16] M. Hein and M. Andriushchenko, "Formal guarantees on the robustness of a classifier against adversarial manipulation," in Advances in Neural Information Processing Systems, pp. 2266–2276, 2017.
[17] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana, "Efficient formal safety analysis of neural networks," in Advances in Neural Information Processing Systems, pp. 6369–6379, 2018.
[18] T.-W. Weng, P.-Y. Chen, L. M. Nguyen, M. S. Squillante, I. Oseledets, and L. Daniel, "PROVEN: Certifying robustness of neural networks with a probabilistic approach," arXiv preprint arXiv:1812.08329, 2018.
[19] K. Dvijotham, M. Garnelo, A. Fawzi, and P. Kohli, "Verification of deep probabilistic models," arXiv preprint arXiv:1812.02795, 2018.
[20] A. Bibi, M. Alfadly, and B. Ghanem, "Analytic expressions for probabilistic moments of PL-DNN with Gaussian input," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9099–9107, 2018.
[21] K. P. Wabersich and M. N. Zeilinger, "Linear model predictive safety certification for learning-based control," in 2018 IEEE Conference on Decision and Control (CDC), pp. 7130–7135, IEEE, 2018.
[22] D. Van Hessem and O. Bosgra, "A conic reformulation of model predictive control including bounded and stochastic disturbances under state and input constraints," in Proceedings of the 41st IEEE Conference on Decision and Control, vol. 4, pp. 4643–4648, IEEE, 2002.
[23] M. Cannon, B. Kouvaritakis, S. V. Rakovic, and Q. Cheng, "Stochastic tubes in model predictive control with probabilistic constraints," IEEE Transactions on Automatic Control, vol. 56, no. 1, pp. 194–200, 2011.
[24] A. Megretski and A. Rantzer, "System analysis via integral quadratic constraints," IEEE Transactions on Automatic Control, vol. 42, no. 6, pp. 819–830, 1997.
[25] F. D'Amato, M. A. Rotea, A. Megretski, and U. Jönsson, "New results for analysis of systems with repeated nonlinearities," Automatica, vol. 37, no. 5, pp. 739–747, 2001.
[26] V. V. Kulkarni and M. G. Safonov, "All multipliers for repeated monotone nonlinearities," IEEE Transactions on Automatic Control, vol. 47, no. 7, pp. 1209–1212, 2002.
[27] M. Grant, S. Boyd, and Y. Ye, "CVX: MATLAB software for disciplined convex programming," 2008.
[28] MOSEK ApS, The MOSEK Optimization Toolbox for MATLAB Manual, Version 8.1, 2017.