Probabilistic Verification and Reachability Analysis of Neural Networks via Semidefinite Programming
Authors: Mahyar Fazlyab, Manfred Morari, George J. Pappas

Abstract — Quantifying the robustness of neural networks and verifying their safety properties against input uncertainties or adversarial attacks has become an important research area in learning-enabled systems. Most results concentrate on the worst-case scenario, in which the input of the neural network is perturbed within a norm-bounded uncertainty set. In this paper, we consider a probabilistic setting in which the uncertainty is random with known first two moments. In this context, we discuss two relevant problems: (i) probabilistic safety verification, in which the goal is to find an upper bound on the probability of violating a safety specification; and (ii) confidence ellipsoid estimation, in which, given a confidence ellipsoid for the input of the neural network, our goal is to compute a confidence ellipsoid for the output. Due to the presence of nonlinear activation functions, these two problems are very difficult to solve exactly. To simplify the analysis, our main idea is to abstract the nonlinear activation functions by a combination of affine and quadratic constraints that they impose on their input-output pairs. We then show that the safety of the abstracted network, which is sufficient for the safety of the original network, can be analyzed using semidefinite programming. We illustrate the performance of our approach with numerical experiments.

I. INTRODUCTION

Neural networks (NNs) have been very successful in various applications, such as end-to-end learning for self-driving cars [1], learning-based controllers in robotics [2], speech recognition, and image classification. Their vulnerability to input uncertainties and adversarial attacks, however, hinders the deployment of neural networks in safety-critical applications.
In the context of image classification, for example, it has been shown in several works [3]–[5] that adding even an imperceptible noise to the input of neural network-based classifiers can completely change their decision. In this context, verification refers to the process of checking whether the output of a trained NN satisfies certain desirable properties when its input is perturbed within an uncertainty model. More precisely, we would like to verify whether the neural network's prediction remains the same in a neighborhood of a test point x⋆. This neighborhood can represent, for example, the set of input examples that can be crafted by an adversary.

In worst-case safety verification, we assume that the input uncertainty is bounded, and we verify a safety property for all possible perturbations within the uncertainty set. This approach has been pursued extensively in several works using various tools, such as mixed-integer linear programming [6]–[8], robust optimization and duality theory [9], [10], Satisfiability Modulo Theory (SMT) [11], dynamical systems [12], [13], robust control [14], abstract interpretation [15], and many others [16], [17].

In probabilistic verification, on the other hand, we assume that the input uncertainty is random but potentially unbounded. Random uncertainties can emerge as a result of, for example, data quantization, input preprocessing, and environmental background noise [18]. In contrast to the worst-case approach, only a few works have studied verification of neural networks in probabilistic settings [18]–[20].

† Corresponding author: mahyarfa@seas.upenn.edu. This work was supported by DARPA Assured Autonomy and NSF CPS 1837210. The authors are with the Department of Electrical and Systems Engineering, University of Pennsylvania. Email: {mahyarfa, morari, pappasg}@seas.upenn.edu.
In situations where we have random uncertainty models, we ask a related question: "Can we provide statistical guarantees on the output of neural networks when their input is perturbed with a random noise?" In this paper, we provide an affirmative answer by addressing two related problems:

• Probabilistic verification: Given a safe region in the output space of the neural network, our goal is to estimate the probability that the output of the neural network will be in the safe region when its input is perturbed by a random variable with a known mean and covariance.

• Confidence propagation: Given a confidence ellipsoid on the input of the neural network, we want to estimate the output confidence ellipsoid.

The rest of the paper is organized as follows. In Section II, we discuss safety verification of neural networks in both deterministic and probabilistic settings. In Section III, we provide an abstraction of neural networks using the formalism of quadratic constraints. In Section IV, we develop a convex relaxation to the problem of confidence ellipsoid estimation. In Section V, we present the numerical experiments. Finally, we draw our conclusions in Section VI.

A. Notation and Preliminaries

We denote the set of real numbers by R, the set of real n-dimensional vectors by R^n, the set of m × n matrices by R^{m × n}, and the n-dimensional identity matrix by I_n. We denote by S^n, S^n_+, and S^n_{++} the sets of n-by-n symmetric, positive semidefinite, and positive definite matrices, respectively. We denote ellipsoids in R^n by

E(x_c, P) = {x | (x − x_c)^T P^{-1} (x − x_c) ≤ 1},

where x_c ∈ R^n is the center of the ellipsoid and P ∈ S^n_{++} determines its orientation and volume. We denote the mean and covariance of a random variable X ∈ R^n by E[X] ∈ R^n and Cov[X] ∈ S^n_+, respectively.

II. SAFETY VERIFICATION OF NEURAL NETWORKS
A. Deterministic Safety Verification

Consider a multi-layer feed-forward fully-connected neural network described by the following equations:

x^0 = x,
x^{k+1} = φ(W^k x^k + b^k),  k = 0, …, ℓ − 1,        (1)
f(x) = W^ℓ x^ℓ + b^ℓ,

where x^0 = x is the input to the network, and W^k ∈ R^{n_{k+1} × n_k} and b^k ∈ R^{n_{k+1}} are the weight matrix and bias vector of the k-th layer. The nonlinear activation function φ(·) (Rectified Linear Unit (ReLU), sigmoid, tanh, leaky ReLU, etc.) is applied coordinate-wise to the pre-activation vectors, i.e., it is of the form

φ(x) = [ϕ(x_1) ⋯ ϕ(x_d)]^T,        (2)

where ϕ is the activation function of each individual neuron. Although our framework is applicable to all activation functions, we focus our attention on the ReLU activation function, ϕ(x) = max(x, 0).

In deterministic safety verification, we are given a bounded set 𝒳 ⊂ R^{n_x} of possible inputs (the uncertainty set), which is mapped by the neural network to the output reachable set f(𝒳). The desirable properties that we would like to verify can often be described by a set 𝒮 ⊂ R^{n_y} in the output space of the neural network, which we call the safe region. In this context, the network is safe if f(𝒳) ⊆ 𝒮.

B. Probabilistic Safety Verification

In a deterministic setting, reachability analysis and safety verification is a yes/no problem whose answer does not quantify the proportion of inputs for which safety is violated. Furthermore, if the uncertainty is random and potentially unbounded, the output f(x) satisfies the safety constraint only with a certain probability. More precisely, given a safe region 𝒮 in the output space of the neural network, we are interested in finding the probability Pr(f(X) ∈ 𝒮) that the neural network maps the random input X to the safe region. Since f is a nonlinear function, computing the distribution of f(X) given the distribution of X is prohibitive, except in special cases.
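Evaluating f pointwise, by contrast, is straightforward; the difficulty lies only in propagating distributions. As a minimal illustration of the model (1)–(2), the following plain-Python sketch implements the forward pass of a small ReLU network; the weights and biases are arbitrary placeholders, not values from the paper.

```python
# Forward pass of a fully-connected ReLU network, eqs. (1)-(2).
# Weights/biases below are illustrative placeholders, not from the paper.

def relu(v):
    # phi applied coordinate-wise: phi(x)_i = max(0, x_i)
    return [max(0.0, vi) for vi in v]

def affine(W, x, b):
    # W x + b, with W given as a list of rows
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def forward(weights, biases, x):
    # x^0 = x; x^{k+1} = relu(W^k x^k + b^k); f(x) = W^l x^l + b^l
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(affine(W, x, b))
    return affine(weights[-1], x, biases[-1])

# One hidden layer: n_x = 2, n_1 = 3, n_y = 1
weights = [[[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]],   # W^0
           [[1.0, 1.0, 1.0]]]                        # W^1
biases = [[0.0, -1.0, 0.5], [0.0]]                   # b^0, b^1
print(forward(weights, biases, [1.0, 1.0]))          # -> [1.5]
```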
As a result, we settle for providing a lower bound p ∈ (0, 1) on the desired probability,

Pr(f(X) ∈ 𝒮) ≥ p.

To compute the lower bound, we adopt a geometric approach, in which we verify whether the reachable set of a confidence region of the input lies entirely in the safe set 𝒮. We first recall the definition of a confidence region.

Definition 1 (Confidence region): The p-level (p ∈ [0, 1]) confidence region of a vector random variable X ∈ R^n is defined as any set E_p ⊆ R^n for which Pr(X ∈ E_p) ≥ p.

Although confidence regions can have different representations, our particular focus in this paper is on ellipsoidal confidence regions. Due to their appealing geometric properties (e.g., invariance to affine subspace transformations), ellipsoids are widely used in robust control to compute reachable sets [21]–[23]. The next two lemmas characterize confidence ellipsoids for Gaussian random variables and for random variables with known first two moments.

Lemma 1: Let X ∼ N(μ, Σ) be an n-dimensional Gaussian random variable. Then the p-level confidence region of X is given by the ellipsoid

E_p = {x | (x − μ)^T Σ^{-1} (x − μ) ≤ χ²_n(p)},        (3)

where χ²_n(p) is the quantile function of the chi-squared distribution with n degrees of freedom.

For non-Gaussian random variables, we can use Chebyshev's inequality to characterize the confidence ellipsoids, provided we know the first two moments.

Lemma 2: Let X be an n-dimensional random variable with E[X] = μ and Cov[X] = Σ. Then the ellipsoid

E_p = {x | (x − μ)^T Σ^{-1} (x − μ) ≤ n/(1 − p)},        (4)

is a p-level confidence region of X.

Lemma 3: Let E_p be a confidence region of a random variable X. If f(E_p) ⊆ 𝒮, then 𝒮 is a p-level confidence region for the random variable f(X), i.e., Pr(f(X) ∈ 𝒮) ≥ p.

Proof: The inclusion f(E_p) ⊆ 𝒮 implies Pr(f(X) ∈ 𝒮) ≥ Pr(f(X) ∈ f(E_p)).
Since f is not necessarily a one-to-one mapping, we have Pr(f(X) ∈ f(E_p)) ≥ Pr(X ∈ E_p) ≥ p. Combining the last two inequalities yields the desired result.

According to Lemma 3, if we can certify that the output reachable set f(E_p) lies entirely in the safe set 𝒮 for some p ∈ (0, 1), then the network is safe with probability at least p. In particular, finding the best lower bound corresponds to the non-convex optimization problem

maximize p  subject to f(E_p) ⊆ 𝒮,        (5)

with decision variable p ∈ [0, 1). By Lemma 3, the optimal solution p⋆ then satisfies

Pr(f(X) ∈ 𝒮) ≥ p⋆.        (6)

C. Confidence Propagation

A problem closely related to probabilistic safety verification is confidence propagation. Explicitly, given a p-level confidence region E_p of the input of a neural network, our goal is to find a p-level confidence region for the output. To see the connection to the probabilistic verification problem, let 𝒮 be any outer approximation of the output reachable set, i.e., f(E_p) ⊆ 𝒮.
By Lemma 3, 𝒮 is a p-level confidence region for the output.

Fig. 1: The p-level input confidence ellipsoid E_p, its image f(E_p), and the estimated output confidence ellipsoid.

Of course, there is an infinite number of such possible confidence regions. Our goal is to find the "best" confidence region with respect to some metric. Using the volume of the ellipsoid as an optimization criterion, finding the best confidence region amounts to solving the problem

minimize Volume(𝒮)  subject to f(E_p) ⊆ 𝒮.        (7)

The solution to the above problem provides the p-level confidence region with the minimum volume.
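To make the confidence levels in Lemmas 1 and 2 concrete: for n = 2, the chi-squared quantile in (3) has the closed form χ²₂(p) = −2 ln(1 − p), which can be compared directly with the moment-based level n/(1 − p) in (4). A small sketch (an illustration, not from the paper):

```python
import math

def chi2_quantile_2dof(p):
    # Closed form for n = 2: the chi-squared CDF is 1 - exp(-x/2),
    # so the p-quantile in Lemma 1 is -2 ln(1 - p).
    return -2.0 * math.log(1.0 - p)

def chebyshev_level(n, p):
    # Right-hand side of the ellipsoid in Lemma 2: n / (1 - p).
    return n / (1.0 - p)

p = 0.95
print(chi2_quantile_2dof(p))   # ~5.99: Gaussian 95% level for n = 2
print(chebyshev_level(2, p))   # ~40: moment-based (Chebyshev) level
```

As expected, the Chebyshev-based level is far more conservative than the Gaussian one (about 40 versus about 6 at p = 0.95); this is the price of assuming only the first two moments.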
Figure 1 illustrates the procedure of confidence estimation. In the next section, we provide a convex relaxation of the optimization problem (7). The other problem, (5), is a straightforward extension of confidence estimation, and hence we will not discuss the details.

III. PROBLEM RELAXATION VIA QUADRATIC CONSTRAINTS

Due to the presence of nonlinear activation functions, checking the condition f(E_p) ⊆ 𝒮 in (5) or (7) is a non-convex feasibility problem and is NP-hard in general. Our main idea is to abstract the original network f by another network f̃, in the sense that f̃ over-approximates the output of the original network for any input ellipsoid, i.e., f(E_p) ⊆ f̃(E_p) for any p ∈ [0, 1). It is then sufficient to verify the safety properties of the relaxed network, i.e., to verify the inclusion f̃(E_p) ⊆ 𝒮. In the following, we use the framework of quadratic constraints to develop such an abstraction.

A. Relaxation of Nonlinearities by Quadratic Constraints

In this subsection, we show how we can abstract activation functions, and in particular the ReLU function, using quadratic constraints. We first provide a formal definition, introduced in [14].

Definition 2: Let φ: R^d → R^d, and suppose 𝒬 ⊂ S^{2d+1} is the set of all symmetric and indefinite matrices Q such that the inequality

[x; φ(x); 1]^T Q [x; φ(x); 1] ≥ 0        (8)

holds for all x ∈ R^d. Then we say φ satisfies the quadratic constraint (QC) defined by 𝒬.

Note that the matrix Q in Definition 2 is indefinite, as otherwise the constraint would hold trivially. Before deriving QCs for the ReLU function, we recall some definitions, which can be found in many references; see, for example, [24], [25].

Definition 3 (Sector-bounded nonlinearity): A nonlinear function ϕ: R → R is sector-bounded on the sector [α, β] (0 ≤ α ≤ β) if the following condition holds for all x:

(ϕ(x) − αx)(ϕ(x) − βx) ≤ 0.        (9)
Definition 4 (Slope-restricted nonlinearity): A nonlinear function ϕ: R → R is slope-restricted on [α, β] (0 ≤ α ≤ β) if for any pairs (x, ϕ(x)) and (x⋆, ϕ(x⋆)),

(ϕ(x) − ϕ(x⋆) − α(x − x⋆))(ϕ(x) − ϕ(x⋆) − β(x − x⋆)) ≤ 0.        (10)

Repeated nonlinearities. Assuming that the same activation function is used in all neurons, we can exploit this structure to refine the QC abstraction of the nonlinearity. Explicitly, suppose ϕ: R → R is slope-restricted on [α, β] and let φ(x) = [ϕ(x_1) ⋯ ϕ(x_d)]^T be a vector-valued function constructed by component-wise repetition of ϕ. It is not hard to verify that φ is also slope-restricted on the same sector. However, this representation simply ignores the fact that all the nonlinearities composing φ are the same. By taking advantage of this structure, we can refine the quadratic constraint that describes φ. To be specific, for an input-output pair (x, φ(x)), x ∈ R^d, we can write the slope-restriction condition

(ϕ(x_i) − ϕ(x_j) − α(x_i − x_j))(ϕ(x_i) − ϕ(x_j) − β(x_i − x_j)) ≤ 0,        (11)

for all distinct i, j. This particular QC can tighten the relaxation incurred by the QC abstraction of the nonlinearity. There are several results in the literature on repeated nonlinearities. For instance, in [25], [26], the authors derive QCs for repeated and odd nonlinearities (e.g., the tanh function).

B. QC for the ReLU Function

In this subsection, we derive quadratic constraints for the ReLU function, φ(x) = max(0, x), x ∈ R^d. Note that this function lies on the boundary of the sector [0, 1]. More precisely, we can describe the ReLU function by three quadratic and/or affine constraints:

y_i = max(0, x_i)  ⇔  y_i ≥ x_i,  y_i ≥ 0,  y_i² = x_i y_i.        (12)

On the other hand, for any two distinct indices i ≠ j, we can write the constraint (11) with α = 0 and β = 1:

(y_j − y_i)² ≤ (y_j − y_i)(x_j − x_i).        (13)
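The constraints (12) and the cross-coupling constraint (13) are easy to check numerically on random input-output pairs of the ReLU map; the following sketch (an illustration, not the paper's code) does exactly that:

```python
import random

def relu_vec(x):
    # y = max(0, x), applied coordinate-wise
    return [max(0.0, xi) for xi in x]

random.seed(0)
d = 4
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(d)]
    y = relu_vec(x)
    # (12): y_i >= x_i, y_i >= 0, y_i^2 = x_i * y_i
    for xi, yi in zip(x, y):
        assert yi >= xi and yi >= 0.0
        assert abs(yi * yi - xi * yi) < 1e-9
    # (13): (y_j - y_i)^2 <= (y_j - y_i)(x_j - x_i) for all i != j
    for i in range(d):
        for j in range(d):
            if i != j:
                dy, dx = y[j] - y[i], x[j] - x[i]
                assert dy * dy <= dy * dx + 1e-9
print("all ReLU quadratic constraints hold")
```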
By adding a weighted combination of all these constraints (with positive weights for the inequalities), we find that the ReLU function y = max(0, x) satisfies

Σ_{i=1}^d [λ_i (y_i² − x_i y_i) + ν_i (y_i − x_i) + η_i y_i] − Σ_{i≠j} λ_ij [(y_j − y_i)² − (y_j − y_i)(x_j − x_i)] ≥ 0,        (14)

for any multipliers λ_i ∈ R and ν_i, η_i, λ_ij ≥ 0, i, j ∈ {1, ⋯, d}. This inequality can be written in the compact form (8), as stated in the following lemma.

Lemma 4 (QC for the ReLU function): The ReLU function φ(x) = max(0, x): R^d → R^d satisfies the QC defined by 𝒬, where

𝒬 = { Q | Q = [ 0, T, −ν;  T, −2T, ν + η;  −ν^T, (ν + η)^T, 0 ] }.        (15)

Here η, ν ≥ 0, and T ∈ S^d is given by

T = Σ_{i=1}^d λ_i e_i e_i^T + Σ_{i=1}^{d−1} Σ_{j>i} λ_ij (e_i − e_j)(e_i − e_j)^T,

where e_i is the i-th basis vector in R^d and λ_ij ≥ 0.

Proof: See [14].

Lemma 4 characterizes a family of valid QCs for the ReLU function. It is not hard to verify that the set 𝒬 of valid QCs is a convex cone. As we will see in the next section, the matrix Q in (15) appears as a decision variable in the optimization problem.

C. Tightening the Relaxation

In the previous subsection, we derived QCs that are valid on the whole space R^d. When restricted to a region R ⊆ R^d, we can tighten the QC relaxation. Consider the relationship φ(x) = max(0, x), x ∈ R ⊆ R^d, and let I⁺ and I⁻ be the sets of neurons that are always active or always inactive, respectively, i.e.,

I⁺ = {i | x_i ≥ 0 for all x ∈ R},        (16)
I⁻ = {i | x_i < 0 for all x ∈ R}.

The constraint y_i ≥ x_i holds with equality for active neurons. Therefore, we can write ν_i ∈ R if i ∈ I⁺, and ν_i ≥ 0 otherwise. Similarly, the constraint y_i ≥ 0 holds with equality for inactive neurons. Therefore, we can write η_i ∈ R if i ∈ I⁻, and η_i ≥ 0 otherwise. Finally, it can be verified that the cross-coupling constraint (13) holds with equality for pairs of always-active or always-inactive neurons.
Therefore, for any 1 ≤ i < j ≤ d, we can write λ_ij ∈ R if (i, j) ∈ I⁺ × I⁺ or (i, j) ∈ I⁻ × I⁻, and λ_ij ≥ 0 otherwise. These additional degrees of freedom on the multipliers can tighten the relaxation incurred in (14). Note that the sets of active and inactive neurons are not known a priori. However, we can partially find them using, for example, interval arithmetic.

IV. ANALYSIS OF THE RELAXED NETWORK VIA SEMIDEFINITE PROGRAMMING

In this section, we use the QC abstraction developed in the previous section to analyze the safety of the relaxed network. In the next theorem, we state our main result for one-layer neural networks; we discuss the multi-layer case in Section IV-A.

Theorem 1 (Output covering ellipsoid): Consider a one-layer neural network f: R^{n_x} → R^{n_y} described by the equation

y = W^1 φ(W^0 x + b^0) + b^1,        (17)

where φ: R^{n_1} → R^{n_1} satisfies the quadratic constraint defined by 𝒬, i.e., for any Q ∈ 𝒬,

[z; φ(z); 1]^T Q [z; φ(z); 1] ≥ 0 for all z.        (18)

Suppose x ∈ E(μ_x, Σ_x). Consider the following matrix inequality:

M_1 + M_2 + M_3 ⪯ 0,        (19)

where

M_1 = [I_{n_x}, 0, 0;  0, 0, 1]^T P(τ) [I_{n_x}, 0, 0;  0, 0, 1],
M_2 = [W^0, 0, b^0;  0, I_{n_1}, 0;  0, 0, 1]^T Q [W^0, 0, b^0;  0, I_{n_1}, 0;  0, 0, 1],
M_3 = [0, W^1, b^1;  0, 0, 1]^T S(A, b) [0, W^1, b^1;  0, 0, 1],

with

P(τ) = τ [−Σ_x^{-1}, Σ_x^{-1} μ_x;  μ_x^T Σ_x^{-1}, −μ_x^T Σ_x^{-1} μ_x + 1],
S(A, b) = [A², Ab;  b^T A, b^T b − 1].

If (19) is feasible for some (τ, A, Q, b) ∈ R_+ × S^{n_y} × 𝒬 × R^{n_y}, then y ∈ E(μ_y, Σ_y) with μ_y = −A^{-1} b and Σ_y = A^{-2}.

Proof: We first introduce the auxiliary variable z and rewrite the equation of the neural network as

z = φ(W^0 x + b^0),
y = W^1 z + b^1.

Since φ satisfies the QC defined by 𝒬, we can write the following QC from the identity z = φ(W^0 x + b^0):

[W^0 x + b^0; z; 1]^T Q [W^0 x + b^0; z; 1] ≥ 0, for all Q ∈ 𝒬.        (21)
By substituting the identity

[W^0 x + b^0; z; 1] = [W^0, 0, b^0;  0, I_{n_1}, 0;  0, 0, 1] [x; z; 1]

into (21) and denoting x̄ = [x^T z^T]^T, we can write the inequality

[x̄; 1]^T M_2 [x̄; 1] ≥ 0,        (22)

for any Q ∈ 𝒬 and all x̄. By definition, for all x ∈ E(μ_x, Σ_x) we have (x − μ_x)^T Σ_x^{-1} (x − μ_x) ≤ 1, which implies, for any τ ≥ 0,

τ [x; 1]^T [−Σ_x^{-1}, Σ_x^{-1} μ_x;  μ_x^T Σ_x^{-1}, −μ_x^T Σ_x^{-1} μ_x + 1] [x; 1] ≥ 0.

By using the identity

[x; 1] = [I_{n_x}, 0, 0;  0, 0, 1] [x̄; 1],

we conclude that for all x ∈ E(μ_x, Σ_x) and z = φ(W^0 x + b^0),

[x̄; 1]^T M_1 [x̄; 1] ≥ 0.        (23)

Suppose (19) holds for some (τ, A, Q, b) ∈ R_+ × S^{n_y} × 𝒬 × R^{n_y}. By left- and right-multiplying both sides of (19) by [x̄^T 1] and [x̄^T 1]^T, respectively, we obtain

[x̄; 1]^T M_1 [x̄; 1] + [x̄; 1]^T M_2 [x̄; 1] + [x̄; 1]^T M_3 [x̄; 1] ≤ 0.

For any x ∈ E(μ_x, Σ_x), the first two quadratic terms are nonnegative by (23) and (22), respectively. Therefore, the last term on the left-hand side must be nonpositive for all x ∈ E(μ_x, Σ_x):

[x̄; 1]^T M_3 [x̄; 1] ≤ 0.

Using the relation y = W^1 z + b^1, the preceding inequality is equivalent to

[y; 1]^T [A², Ab;  b^T A, b^T b − 1] [y; 1] ≤ 0,

which is equivalent to (y + A^{-1} b)^T A² (y + A^{-1} b) ≤ 1. Using our notation for ellipsoids, this means that for all x ∈ E(μ_x, Σ_x), we must have y ∈ E(−A^{-1} b, A^{-2}).

In Theorem 1, we proposed a matrix inequality, in the variables (τ, A, Q, b), as a sufficient condition for enclosing the output of the neural network in the ellipsoid E(−A^{-1} b, A^{-2}). We can now use this result to find the minimum-volume ellipsoid with this property. Note that the matrix inequality (19) is not linear in (A, b). Nevertheless, we can convexify it using Schur complements.

Lemma 5: The matrix inequality (19) is equivalent to the linear matrix inequality (LMI)

M = [ M_1 + M_2 − ee^T,  [0_{n_x × n_y}; W^{1T} A; b^{1T} A + b^T];  [0_{n_y × n_x}, A W^1, A b^1 + b],  −I_{n_y} ] ⪯ 0,        (24)

in (τ, A, Q, b), where e = (0, ⋯, 0, 1) ∈ R^{n_x + n_1 + 1}.
Proof: It is not hard to verify that M_3 can be written as M_3 = F F^T − ee^T, where F, affine in (A, b), is given by

F(A, b) = [0_{n_x × n_y};  W^{1T} A;  b^{1T} A + b^T].

Using this definition, the matrix inequality (19) reads (M_1 + M_2 − ee^T) + F F^T ⪯ 0, which implies that the term in parentheses must be negative semidefinite, i.e., M_1 + M_2 − ee^T ⪯ 0. By Schur complements, the last two inequalities are together equivalent to (24).

Having established Lemma 5, we can now find the minimum-volume covering ellipsoid by solving the following semidefinite program (SDP):

minimize −log det(A)  subject to (24),        (25)

where the decision variables are (τ, A, Q, b) ∈ R_+ × S^{n_y} × 𝒬 × R^{n_y}. Since 𝒬 is a convex cone, (25) is a convex program and can be solved via interior-point solvers.

A. Multi-layer Case

For multi-layer neural networks, we can apply the result of Theorem 1 in a layer-by-layer fashion, provided that the input confidence ellipsoid of each layer is non-degenerate. This assumption holds when n_{k+1} ≤ n_k for all 0 ≤ k ≤ ℓ − 1 (reduction in the width of the layers) and the weight matrices W^k ∈ R^{n_{k+1} × n_k} are full rank. To see this, we note that ellipsoids are invariant under affine subspace transformations, in the sense that

W^k E(μ_k, Σ_k) + b^k = E(W^k μ_k + b^k, W^k Σ_k W^{kT}).

This implies that Σ_{k+1} := W^k Σ_k W^{kT} is positive definite whenever Σ_k is positive definite, so the ellipsoid E(μ_{k+1}, Σ_{k+1}) is non-degenerate. If the assumption n_{k+1} ≤ n_k is violated, we can use the compact representation of multi-layer neural networks elaborated in [14] to arrive at the multi-layer counterpart of the matrix inequality (19).

V. NUMERICAL EXPERIMENTS

In this section, we consider a numerical experiment in which we estimate the confidence ellipsoid of a one-layer neural network with n_x = 2 inputs, n_1 ∈ {10, 30, 50} hidden neurons, and n_y = 2 outputs.
We assume the input is Gaussian with μ_x = (1, 1) and Σ_x = diag(1, 2). The weights and biases of the network are chosen randomly. We use MATLAB, CVX [27], and Mosek [28] to solve the corresponding SDP.

Fig. 2: Top: the estimated 95% confidence ellipsoid along with 10⁴ samples of the output. Bottom: the image of the 95% input confidence ellipsoid (f(E_p) with p = 0.95) and its outer approximation (the output confidence ellipsoid). Panels shown for n_1 = 10, 30, 50.

In Figure 2, we plot the estimated 0.95-level output confidence ellipsoid along with 10⁴ sample outputs. We also plot the image of the 0.95-level input confidence ellipsoid under f along with the estimated 0.95-level output confidence ellipsoid.

VI. CONCLUSIONS

We studied probabilistic safety verification of neural networks whose inputs are subject to random noise with known first two moments. Instead of analyzing the network directly, we proposed to study the safety of an abstracted network, in which the nonlinear activation functions are relaxed by the quadratic constraints their input-output pairs satisfy. We then showed that the safety properties of the abstracted network can be analyzed using semidefinite programming. It would be interesting to consider other related problems, such as closed-loop statistical safety verification and reachability analysis.

REFERENCES

[1] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al., "End to end learning for self-driving cars," arXiv preprint, 2016.
[2] G. Shi, X. Shi, M. O'Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S.-J. Chung, "Neural lander: Stable drone landing control using learned dynamics," arXiv preprint, 2018.
[3] S. Zheng, Y. Song, T. Leung, and I. Goodfellow, "Improving the robustness of deep neural networks via stability training," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4480–4488, 2016.
[4] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, "Universal adversarial perturbations," arXiv preprint, 2017.
[5] J. Su, D. V. Vargas, and K. Sakurai, "One pixel attack for fooling deep neural networks," IEEE Transactions on Evolutionary Computation, 2019.
[6] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi, "Measuring neural net robustness with constraints," in Advances in Neural Information Processing Systems, pp. 2613–2621, 2016.
[7] A. Lomuscio and L. Maganti, "An approach to reachability analysis for feed-forward ReLU neural networks," arXiv preprint, 2017.
[8] V. Tjeng, K. Xiao, and R. Tedrake, "Evaluating robustness of neural networks with mixed integer programming," arXiv preprint arXiv:1711.07356, 2017.
[9] J. Z. Kolter and E. Wong, "Provable defenses against adversarial examples via the convex outer adversarial polytope," arXiv preprint arXiv:1711.00851, 2017.
[10] K. Dvijotham, R. Stanforth, S. Gowal, T. Mann, and P. Kohli, "A dual approach to scalable verification of deep networks," arXiv preprint arXiv:1803.06567, 2018.
[11] L. Pulina and A. Tacchella, "Challenging SMT solvers to verify neural networks," AI Communications, vol. 25, no. 2, pp. 117–135, 2012.
[12] R. Ivanov, J. Weimer, R. Alur, G. J. Pappas, and I. Lee, "Verisig: verifying safety properties of hybrid systems with neural network controllers," arXiv preprint, 2018.
[13] W. Xiang, H.-D. Tran, and T. T. Johnson, "Output reachable set estimation and verification for multilayer neural networks," IEEE Transactions on Neural Networks and Learning Systems, no. 99, pp. 1–7, 2018.
[14] M. Fazlyab, M. Morari, and G. J. Pappas, "Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming," arXiv preprint, 2019.
[15] M. Mirman, T. Gehr, and M. Vechev, "Differentiable abstract interpretation for provably robust neural networks," in International Conference on Machine Learning, pp. 3575–3583, 2018.
[16] M. Hein and M. Andriushchenko, "Formal guarantees on the robustness of a classifier against adversarial manipulation," in Advances in Neural Information Processing Systems, pp. 2266–2276, 2017.
[17] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana, "Efficient formal safety analysis of neural networks," in Advances in Neural Information Processing Systems, pp. 6369–6379, 2018.
[18] T.-W. Weng, P.-Y. Chen, L. M. Nguyen, M. S. Squillante, I. Oseledets, and L. Daniel, "PROVEN: Certifying robustness of neural networks with a probabilistic approach," arXiv preprint arXiv:1812.08329, 2018.
[19] K. Dvijotham, M. Garnelo, A. Fawzi, and P. Kohli, "Verification of deep probabilistic models," arXiv preprint arXiv:1812.02795, 2018.
[20] A. Bibi, M. Alfadly, and B. Ghanem, "Analytic expressions for probabilistic moments of PL-DNN with Gaussian input," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9099–9107, 2018.
[21] K. P. Wabersich and M. N. Zeilinger, "Linear model predictive safety certification for learning-based control," in 2018 IEEE Conference on Decision and Control (CDC), pp. 7130–7135, IEEE, 2018.
[22] D. Van Hessem and O. Bosgra, "A conic reformulation of model predictive control including bounded and stochastic disturbances under state and input constraints," in Proceedings of the 41st IEEE Conference on Decision and Control, vol. 4, pp. 4643–4648, IEEE, 2002.
[23] M. Cannon, B. Kouvaritakis, S. V. Rakovic, and Q. Cheng, "Stochastic tubes in model predictive control with probabilistic constraints," IEEE Transactions on Automatic Control, vol. 56, no. 1, pp. 194–200, 2011.
[24] A. Megretski and A. Rantzer, "System analysis via integral quadratic constraints," IEEE Transactions on Automatic Control, vol. 42, no. 6, pp. 819–830, 1997.
[25] F. D'Amato, M. A. Rotea, A. Megretski, and U. Jönsson, "New results for analysis of systems with repeated nonlinearities," Automatica, vol. 37, no. 5, pp. 739–747, 2001.
[26] V. V. Kulkarni and M. G. Safonov, "All multipliers for repeated monotone nonlinearities," IEEE Transactions on Automatic Control, vol. 47, no. 7, pp. 1209–1212, 2002.
[27] M. Grant, S. Boyd, and Y. Ye, "CVX: MATLAB software for disciplined convex programming," 2008.
[28] MOSEK ApS, The MOSEK Optimization Toolbox for MATLAB Manual, Version 8.1, 2017.