Adversarial Robustness Assessment: Why Both $L_0$ and $L_\infty$ Attacks Are Necessary


Authors: Shashank Kotyan, Danilo Vasconcellos Vargas

Abstract: There exists a vast number of adversarial attacks and defences for machine learning algorithms of various types, which makes assessing the robustness of an algorithm a daunting task. To make matters worse, there is an intrinsic bias in these adversarial algorithms. Here, we organise the problems faced: a) Model Dependence, b) Insufficient Evaluation, c) False Adversarial Samples, and d) Perturbation Dependent Results. Based on this, we propose a model-agnostic dual quality assessment method, together with the concept of robustness levels, to tackle them. We validate the dual quality assessment on state-of-the-art neural networks (WideResNet, ResNet, AllConv, DenseNet, NIN, LeNet and CapsNet) as well as on adversarial defences for the image classification problem. We further show that current networks and defences are vulnerable at all levels of robustness. The proposed robustness assessment reveals that, depending on the metric used (i.e., $L_0$ or $L_\infty$), the robustness may vary significantly. Hence, the duality should be taken into account for a correct evaluation. Moreover, a mathematical derivation, as well as a counter-example, suggest that the $L_1$ and $L_2$ metrics alone are not sufficient to avoid spurious adversarial samples. Interestingly, the threshold attack of the proposed assessment is a novel $L_\infty$ black-box adversarial method which requires even less perturbation than the One-Pixel Attack (only 12% of the One-Pixel Attack's amount of perturbation) to achieve similar results.

Index Terms: Deep Learning, Neural Networks, Adversarial Attacks, Few-Pixel Attack, Threshold Attack

(S. Kotyan and D. V. Vargas are with the Laboratory of Intelligent Systems, Department of Informatics, Kyushu University, Japan. http://lis.inf.kyushu-u.ac.jp/. E-mail: vargas@inf.kyushu-u.ac.jp)

I. INTRODUCTION

Neural networks have empowered us to obtain high accuracy in several applications, such as speech recognition and face recognition, many of which are only feasible with their aid. Despite these accomplishments, neural networks have been shown to misclassify when small perturbations are added to original samples; the perturbed inputs are called adversarial samples. Further, these adversarial samples show that conventional neural network architectures are not capable of understanding concepts or high-level abstractions as we earlier speculated. The security and safety risks created by adversarial samples are also prohibiting the use of neural networks in many critical applications, such as autonomous vehicles. Therefore, it is of utmost significance to formulate not only accurate but robust neural networks. To do so, however, a quality assessment is required which would let robustness be evaluated efficiently, without in-depth knowledge of adversarial machine learning.

Regarding the development of a quality assessment for robustness, the field of adversarial machine learning has provided some tools which could be useful for this development. However, the sheer number of scenarios, adversarial attacking methods, defences and metrics ($L_0$, $L_1$, $L_2$ and $L_\infty$) makes the current state of the art difficult to perceive. Moreover, most contemporary adversarial attacks are white-box attacks, which cannot be used to assess hybrids, non-standard neural networks, or other classifiers in general.
Given the vast number of possibilities and the many definitions, each with its exceptions and trade-offs, a simple robustness quality assessment turns out to be a daunting task. Moreover, adversarial samples point to reasoning shortcomings in machine learning; improvements in robustness should also result in learning systems that can better reason over data and achieve a new level of abstraction. A quality assessment procedure would therefore also be helpful in this regard, checking for failures in both reasoning and high-level abstraction.

To create a quality assessment procedure, we formalise some of the problems which must be tackled:

P1 Model Dependence: A model-agnostic quality assessment is crucial to enable neural networks to be compared with other approaches which may be completely different (e.g., logic hybrids and evolutionary hybrids).

P2 Insufficient Evaluation: There are several types of adversarial samples, as well as potential attack variations and scenarios, each with its own bias. The attacks also differ substantially depending on the metric optimised, namely $L_0$, $L_1$, $L_2$ or $L_\infty$. However, not all of them are vital for the evaluation of robustness. A quality assessment should have few but sufficient tests to provide an in-depth analysis without compromising its utility.

P3 False Adversarial Samples: Adversarial attacks are known to occasionally produce misleading adversarial samples (samples that cannot be recognised even by a human observer). Such deceptive adversarial samples can only be detected through inspection, which makes the evaluation error-prone. Neither the need for inspection nor the possibility of fraudulent adversarial samples should be present.

P4 Perturbation Dependent Results: A varying amount of perturbation leads to varying adversarial accuracy. Moreover, networks differ in their sensitivity to attacks given a varied amount of perturbation. Consequently, this might result in double standards or hide important information.

In this article, we propose a quality assessment that tackles the problems mentioned above with the following features:

Non-gradient-based Black-box Attack (addresses P1): Black-box attacks are desirable for a model-agnostic evaluation which does not depend on specific features of the learning process, such as gradients. Therefore, the proposed quality assessment is based on black-box attacks, one of which is a novel $L_\infty$ black-box attack. In fact, to the knowledge of the authors, this is the first $L_\infty$ black-box attack that does not make any assumptions about the target machine learning system. Figure 1 shows some adversarial samples crafted with the $L_0$ and $L_\infty$ black-box attacks used in the quality assessment.

Fig. 1: Adversarial samples found with the Few-Pixel ($L_0$) black-box attack and the Threshold ($L_\infty$) black-box attack.

Dual Evaluation (addresses P2 and P3): We propose to use solely attacks based on $L_0$ and $L_\infty$, to avoid creating adversarial samples which are no longer correctly classified by human beings after the modification. These metrics impose a constraint on the spatial distribution of the noise which guarantees the quality of the adversarial sample. In Section IV, this is explained mathematically as well as illustrated with a counter-example.

Robustness Levels (addresses P4): In this article, we define robustness levels in terms of the constraint's threshold $th$.
We then compare results at the same robustness level against their respective values. Robustness levels constrain the comparison to equal amounts of perturbation, avoiding the comparison of results with different degrees of perturbation (Problem P4). In fact, robustness levels add a concept which may aid in the classification of algorithms. For example, an algorithm which is robust to the One-Pixel Attack belongs to the 1-pixel-safe category.

II. RELATED WORKS

Recently, it was exhibited that neural networks contain many vulnerabilities. The first article on the topic dates back to 2013, when it was revealed that neural networks behave oddly for almost identical images [1]. Afterwards, a series of vulnerabilities were found and exploited through adversarial attacks. In [2], the authors demonstrated that neural networks show high confidence when presented with textures and random noise. Universal adversarial perturbations, which can be added to most samples to fool a neural network, were shown to be possible [3]. Patches can also cause misclassification: adding one to an image turns it into a different class [4]. Moreover, an extreme attack was shown to be effective in which it is possible to make neural networks misclassify with a single-pixel change [5]. Many of these attacks can easily be turned into real-world threats by printing out adversarial samples, as shown in [6]. Carefully crafted glasses can also be made into attacks [7]. Alternatively, even general 3D adversarial objects were shown possible [8].

Regarding understanding the phenomenon, it is argued in [9] that neural networks' linearity is one of the main reasons. Another recent investigation proposes the conflicting saliency added by adversarial samples as the reason for misclassification [10].

Many defensive systems and detection systems have also been proposed to mitigate some of these problems. However, there are still no current or promising solutions which can negate adversarial attacks consistently. Regarding defensive systems, defensive distillation, in which a smaller neural network squeezes the content learned by the original one, was proposed as a defence [11]; however, it was shown not to be robust enough in [12]. Adversarial training was also proposed, in which adversarial samples are used to augment the training dataset [9], [13], [14]. The dataset is augmented in such a way that the neural network should learn to classify the adversarial samples, increasing its robustness. Although adversarial training can increase robustness slightly, the resulting neural network is still vulnerable to attacks [15]. There are many recent variations of defences [16], [17], [18], [19], [20], [21], [22], [23] which are carefully analysed, and many of their shortcomings explained, in [24], [25].

Regarding detection systems, a study from [26] demonstrated that some adversarial samples indeed have different statistical properties which could be exploited for detection. In [21], the authors proposed to compare the prediction of a classifier with the prediction for the same input after it has been "squeezed". This technique allows classifiers to detect adversarial samples with small perturbations. However, many detection systems fail when adversarial samples deviate from test conditions [27], [28]. Thus, the clear benefits of detection systems remain inconclusive.
III. ADVERSARIAL MACHINE LEARNING AS AN OPTIMISATION PROBLEM

Adversarial machine learning can be perceived as a constrained optimisation problem. Before defining it, let us first formalise adversarial samples. Let $f(x) \in [\![0,1]\!]$ be the output of a machine learning algorithm in the binary classification setting. Extrapolating to the multi-label classification setting, the output can be defined as $f(x) \in [\![1..N]\!]$. Here, $x \in \mathbb{R}^k$ is the input of the algorithm for an input of size $k$, and $N$ is the number of classes into which $x$ can be classified. An adversarial sample $x'$ for an original sample $x$ can thus be defined as

$$x' = x + \epsilon_x \quad \text{such that} \quad f(x') \neq f(x),$$

in which $\epsilon_x \in \mathbb{R}^k$ is a small perturbation added to the input. Therefore, adversarial machine learning can be defined as the following optimisation problem (the definition here concerns only untargeted attacks in the classification setting, but a similar optimisation problem can be defined for targeted attacks):

$$\underset{\epsilon_x}{\text{minimise}} \;\; g(x + \epsilon_x)_c \quad \text{subject to} \quad \|\epsilon_x\| \leq th,$$

where $th$ is a pre-defined threshold value and $g(\cdot)_c$ is the soft label, or confidence, for the correct class $c$, such that $f(x) = \operatorname{argmax} g(x)$.

The constraint in the optimisation problem has the objective of disallowing perturbations which could make $x$ unrecognisable or change its correct class. Therefore, the constraint is itself a mathematical definition of what constitutes an imperceptible perturbation. Many different norms are used in the literature (e.g., $L_0$, $L_1$, $L_2$ and $L_\infty$); intuitively, the norms allow for different types of attacks. For simplicity, we narrow the scope of this article to the image classification problem alone. However, the proposed attacks and the quality assessment can also be extended to other problems.
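To make the formulation concrete, the following minimal sketch expresses the objective and the constraint in code. This is our illustration rather than the authors' released implementation; `model_predict`, `attack_objective` and `constraint_satisfied` are hypothetical names, and a 0-255 pixel range is assumed.

```python
import numpy as np

def model_predict(x):
    """Hypothetical black-box classifier: returns the soft-label
    confidences g(x) over the N classes for an input image x."""
    raise NotImplementedError  # stand-in for the system under test

def attack_objective(x, delta, true_class):
    """Fitness to minimise: the confidence g(x + delta)_c of the correct
    class c. Driving it below the other classes flips f(x')."""
    confidences = model_predict(np.clip(x + delta, 0, 255))
    return confidences[true_class]

def constraint_satisfied(delta, th, ord):
    """Imperceptibility constraint ||delta|| <= th under a chosen norm
    (ord=0 for the Few-Pixel attack, ord=np.inf for the Threshold attack)."""
    if ord == 0:
        # L0 counts perturbed pixels, not channels (see Section IV).
        return np.count_nonzero(np.abs(delta).sum(axis=-1)) <= th
    return np.linalg.norm(delta.ravel(), ord=ord) <= th
```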
IV. GUARANTEEING THE QUALITY OF ADVERSARIAL SAMPLES

Constraining the perturbation is decisive for adversarial samples, in order to avoid producing samples that cannot be recognised by human beings, or samples that have, by the sheer amount of perturbation, changed their correct class. However, restraining the total amount of perturbation is not enough, as a small amount of perturbation concentrated in a few pixels may be able to create false adversarial samples. Therefore, a spatial constraint over the perturbation of pixels $P$ is a desirable feature. This can be achieved mathematically as follows.

Given an image $x$ and its perturbed counterpart $x'$, it is possible to calculate the $L_1$ norm between them with the Manhattan distance of both matrices: $\|x - x'\|_1$. Constraining $L_1$ to be less than a certain number does not guarantee any constraint on the spatial distribution. Let us define a set based on all non-zero pixel perturbations as follows:

$$N_z = \{P_i : \|P_i - P'_i\|_1 > 0\},$$

where $P_i$ and $P'_i$ are pixels of, respectively, the original image $x$ and the perturbed image $x'$, and $i$ is a pixel index. Both $N_z$ and its cardinality $|N_z|$ carry information about the spatial distribution of the perturbations, and constraining either of these values results in a spatially limited perturbation: provided that $th$ is low enough, enforcing $|N_z| < th$ bounds the modification to only a few pixels. Moreover, $|N_z|$ is precisely $L_0$, demonstrating that $L_0$ is based on the set $N_z$, which stores spatial information about the differences, whereas $L_1$ uses the Manhattan norm, which carries no such information. Similarly, the $L_\infty$ norm can be rewritten as the following optimisation constraint:

$$\forall P_i \in x, \quad \|P_i - P'_i\|_\infty \leq th.$$

Notice that this constraint is also defined over the spatial distribution of the perturbations.

Fig. 2: Example of a false adversarial sample (right) and its respective original sample (left). The false adversarial sample is built with little total perturbation (i.e., low $L_1$ and $L_2$) but results in an unrecognisable final image (a false adversarial sample). This is a consequence of the unconstrained spatial distribution of the perturbation, which is prevented if a low $L_0$ or $L_\infty$ is used. This hypothetical attack has an $L_2$ of merely 356, well below the maximum $L_2$ of the One-Pixel ($L_0 \leq 1$) Attack (765).

Figure 2 gives empirical evidence of a misleading adversarial sample constrained by $L_2 \leq 765$. Notice that this value is precisely the maximum change of one pixel, i.e., the maximum possible perturbation of the One-Pixel Attack ($L_0 \leq 1$), which, when no limits are imposed on its spatial distribution, may create false adversarial samples.

The reasoning behind $L_0$ and $L_\infty$ is as follows: without altering the original sample much, attacks can perturb a few pixels strongly ($L_0$), all pixels slightly ($L_\infty$), or a mix of both ($L_1$ and $L_2$). The hurdle is that $L_1$ and $L_2$, which mix both strategies, vary strongly with the size of the image and, if not used with caution, may produce unrecognisable adversarial samples (Problem P3). It is also difficult to compare methods using the $L_1$ and $L_2$ norms, because the amount of perturbation will often differ (Problem P4).

Threshold Attack ($L_\infty$ black-box attack): The Threshold Attack optimises the constrained optimisation problem with the constraint $\|\epsilon_x\|_\infty \leq th$, i.e., it uses the $L_\infty$ norm. The algorithm searches in $\mathbb{R}^k$, since the search space is the same as the input space: the variables can be any variation of the input as long as the threshold is respected. In the image classification problem, $k = m \times n \times c$, where $m \times n$ is the size and $c$ is the number of channels of the image.

Few-Pixel Attack ($L_0$ black-box attack): The Few-Pixel Attack is a variation of our previously proposed attack, the One-Pixel Attack [5]. It optimises the constrained optimisation problem using the constraint $\|\epsilon_x\|_0 \leq th$, i.e., it uses the $L_0$ norm. The search variable is a combination of pixel values (depending on the number of channels $c$ in the image) and positions (two values, X and Y) for each of the $th$ perturbed pixels. Therefore, the search space, with dimensions $\mathbb{R}^{(2+c) \times th}$, is smaller than that of the Threshold Attack defined above.

Robustness Levels: Here we propose robustness levels, as machine learning algorithms may perform differently under varying amounts of perturbation. Robustness levels evaluate classifiers at a few $th$ thresholds. Explicitly, we define four levels of robustness ($th = 1, 3, 5, 10$) for both our $L_0$ attack and our $L_\infty$ attack, named respectively pixel and threshold robustness levels. Algorithms that pass a level of robustness (0% adversarial accuracy) are called level-threshold-safe or level-pixel-safe. For example, an algorithm that passes level one of the threshold ($L_\infty$) attack is called 1-threshold-safe.
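To illustrate how the two search spaces map onto images, the sketch below decodes candidate solutions for both attacks. This is our minimal illustration (the function names are ours), following the encodings just described and the clipping/modulo repair detailed in Section V-A; note how each constraint holds by construction.

```python
import numpy as np

def perturb_few_pixel(image, candidate):
    """Few-Pixel (L0) attack decoding: the candidate is th blocks of
    (x, y, r, g, b), i.e. a point in R^((2+c)*th). At most th pixels are
    ever rewritten, so ||x' - x||_0 <= th holds by construction."""
    m, n, c = image.shape
    x_adv = image.copy()
    for block in candidate.reshape(-1, 2 + c):
        px, py = int(block[0]) % m, int(block[1]) % n  # modulo repair for positions
        x_adv[px, py] = np.clip(block[2:], 0, 255)     # clipping repair for values
    return x_adv

def perturb_threshold(image, candidate, th):
    """Threshold (L_inf) attack decoding: the candidate lives in R^k with
    k = m*n*c; clipping the perturbation to [-th, th] enforces
    ||x' - x||_inf <= th by construction."""
    delta = np.clip(candidate.reshape(image.shape), -th, th)
    return np.clip(image + delta, 0, 255)
```

Note how the Few-Pixel search space, $(2+c) \times th$ dimensions, is far smaller than the $k$-dimensional space searched by the Threshold Attack.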
V. EXPERIMENTAL RESULTS AND DISCUSSIONS

In this section, we aim to validate the dual quality assessment empirically, as well as analyse the current state-of-the-art neural networks in terms of robustness. (Code is available at http://bit.ly/DualQualityAssessment.)

Preliminary Tests (Section V-B): Tests on two state-of-the-art neural networks, ResNet [29] and CapsNet [30], are presented. These tests are done to choose the black-box optimisation algorithm to be used in the subsequent sections. The performance of both Differential Evolution (DE) [31] and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [32] is evaluated.

Evaluating Learning and Defence Systems (Section V-C): The tests are extended to seven different state-of-the-art neural networks: WideResNet [33], DenseNet [34], ResNet [29], Network in Network (NIN) [35], All Convolutional Network (AllConv) [36], CapsNet [30], and LeNet [37]. We also evaluate three adversarial defences applied to the standard ResNet architecture: Adversarial Training (AT) [14], Total Variance Minimization (TVM) [19], and Feature Squeezing (FS) [21]. We chose defences based on entirely different principles, so that the results achieved here can be extended to other similar types of defences in the literature.

Evaluating Other Adversarial Attacks (Section V-D): The evaluated learning systems are tested against other existing white-box and black-box adversarial attacks: the Fast Gradient Method (FGM) [9], the Basic Iterative Method (BIM) [6], the Projected Gradient Descent Method (PGD) [14], DeepFool [38], and NewtonFool [39]. This analysis further helps to demonstrate the necessity of duality in the quality assessment.

Extremely Fast Quality Assessment (Section V-F): In this section, we apply and evaluate the principle of transferability of adversarial samples, verifying the possibility of a speedy version of the proposed quality assessment. We implement this by using already-crafted adversarial samples to fool neural networks, instead of a full-fledged optimisation. This enables attacks with $O(1)$ time complexity, which are significantly faster.

Quality Assessment's Attack Distribution (Section V-G): Here, we assess the distribution of the dual attacks (Few-Pixel Attack and Threshold Attack). The analysis of the distribution demonstrates the necessity of this duality. The distributions of successful attacks are shown, and previous attacks are analysed from this perspective.

Effect of Threshold (Section V-H): We analyse the complete behaviour of the adversarial accuracy of our black-box attacks without restricting the threshold value $th$. Using this analysis, we show that the results using a fixed $th$ in robustness levels are a reasonable approximation for our proposed quality assessment.

A. Experimental Settings

We use the CIFAR-10 dataset [40] to evaluate our dual quality assessment. Table I gives the parameters of the various adversarial attacks used. All pre-existing adversarial attacks used in this article were evaluated using the Adversarial Robustness 360 Toolbox (ART v1.2.0) [41].

TABLE I: Parameters of the different adversarial attacks.

Attack        Parameters
FGM           norm = $L_\infty$, $\epsilon = 8$, $\epsilon_{step} = 2$
BIM           norm = $L_\infty$, $\epsilon = 8$, $\epsilon_{step} = 2$, iterations = 10
PGD           norm = $L_\infty$, $\epsilon = 8$, $\epsilon_{step} = 2$, iterations = 20
DeepFool      iterations = 100, $\epsilon = 10^{-6}$
NewtonFool    iterations = 100, eta = 0.01

$L_0$ Attack      Common parameter: size = 5
  DE            NP = 400, number of generations = 100, CR = 1
  CMA-ES        function evaluations = 40000, $\sigma = 31.75$

$L_\infty$ Attack  Common parameter: size = 3072
  DE            NP = 3072, number of generations = 100, CR = 1
  CMA-ES        function evaluations = 39200, $\sigma = th/4$
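As an illustration of how such a baseline evaluation could be run, the sketch below applies PGD with the Table I parameters through ART. It is a minimal sketch assuming ART's evasion-attack API (module paths and argument names have shifted between ART releases); `classifier` is a model already wrapped in the ART classifier wrapper matching its framework (e.g., KerasClassifier).

```python
import numpy as np
from art.attacks.evasion import ProjectedGradientDescent

def pgd_adversarial_accuracy(classifier, x_test, y_test):
    # Table I settings: L_inf norm, eps = 8, eps_step = 2, 20 iterations.
    attack = ProjectedGradientDescent(classifier, norm=np.inf,
                                      eps=8.0, eps_step=2.0, max_iter=20)
    x_adv = attack.generate(x=x_test)
    preds = np.argmax(classifier.predict(x_adv), axis=1)
    labels = np.argmax(y_test, axis=1)
    # Fraction of samples the attack fools (the samples should ideally be
    # correctly classified before the attack).
    return np.mean(preds != labels)
```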
For our $L_0$ and $L_\infty$ attacks, we use the canonical versions of the DE and CMA-ES algorithms, in order to have a clear standard. DE uses a repair method in which values that go beyond the valid range are reset to random points within the range. In CMA-ES, to satisfy the constraints, a simple repair method is employed in which pixel values that surpass the minimum or maximum are brought back to the minimum or maximum value; in other words, a clipping function is used to keep values inside the feasible region. The constraint itself is always satisfied because the number of parameters is modelled after the constraint: when searching for a one-pixel perturbation, the variables are fixed to the pixel values (three values) plus the position values (two values), so only one pixel is ever modified, respecting the constraint. Since the optimisation is done over real values, a simple clipping function forces the pixel values to be within range; for the position values, a modulo operation is executed.
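The sketch below shows how such a CMA-ES search loop could look for the Threshold ($L_\infty$) attack, assuming the pycma package and the Table I settings ($\sigma = th/4$, roughly 39200 function evaluations). It is our sketch, not the released code; `model_predict` and `perturb_threshold` are the hypothetical helpers sketched earlier.

```python
import cma
import numpy as np

def threshold_attack(image, true_class, th, max_fevals=39200):
    # Search directly in R^k, k = m*n*c (3072 for CIFAR-10).
    k = int(np.prod(image.shape))
    es = cma.CMAEvolutionStrategy(k * [0.0], th / 4.0,
                                  {'maxfevals': max_fevals, 'verbose': -9})
    while not es.stop():
        candidates = es.ask()
        # Fitness: soft-label confidence of the correct class (minimised);
        # decoding clips the perturbation so ||delta||_inf <= th always holds.
        fitness = [model_predict(perturb_threshold(image, np.asarray(z), th))[true_class]
                   for z in candidates]
        es.tell(candidates, fitness)
        # In practice one would also stop early once the correct class is
        # no longer the argmax, i.e. the sample has become adversarial.
    return perturb_threshold(image, np.asarray(es.result.xbest), th)
```

A DE variant could be sketched analogously (e.g., with scipy.optimize.differential_evolution), with the modulo and clipping repairs applied inside the decoding step.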
B. Preliminary Tests: Choosing the Optimisation Algorithm

Table II shows the adversarial accuracy results over 100 random samples. Here, adversarial accuracy corresponds to the success rate of the adversarial attack in creating adversarial samples that fool the neural network.

TABLE II: Adversarial accuracy results of the Few-Pixel ($L_0$) and Threshold ($L_\infty$) attacks with DE and CMA-ES.

Few-Pixel ($L_0$) Attack
Model     Optimiser   th=1   th=3   th=5   th=10
ResNet    DE          24%    70%    75%    79%
ResNet    CMA-ES      12%    52%    73%    85%
CapsNet   DE          21%    37%    49%    57%
CapsNet   CMA-ES      20%    39%    40%    41%

Threshold ($L_\infty$) Attack
Model     Optimiser   th=1   th=3   th=5   th=10
ResNet    DE           5%    23%    53%    82%
ResNet    CMA-ES      33%    71%    76%    83%
CapsNet   DE          11%    13%    15%    23%
CapsNet   CMA-ES      13%    34%    72%    97%

Both black-box attacks can craft adversarial samples at all levels of robustness. This demonstrates that, without knowing anything about the learning system and in a constrained setting, black-box attacks are still able to reach more than 80% adversarial accuracy on state-of-the-art neural networks.

Concerning the comparison of CMA-ES and DE, the outcomes favour the choice of CMA-ES for the quality assessment. CMA-ES and DE perform similarly for the Few-Pixel Attack, with both having the same number of wins. For the Threshold Attack, however, the performance varies significantly: CMA-ES always wins (eight wins) against DE (no wins). This domination of CMA-ES is expected, since the Threshold Attack has a high-dimensional search space, which is more suitable for CMA-ES. This happens in part because DE's operators may allow some variables to converge prematurely, whereas CMA-ES keeps generating slightly different solutions while evolving a distribution.

In these preliminary tests, CapsNet was shown to be overall superior to ResNet. The Few-Pixel ($L_0$) Attack reaches 85% adversarial accuracy on ResNet when ten pixels are modified. CapsNet, on the other hand, is more robust to Few-Pixel Attacks, which reach only 57% and 41% adversarial accuracy when ten pixels are modified, for DE and CMA-ES respectively. CapsNet is less robust than ResNet to the Threshold Attack with $th = 10$, at which almost all images were vulnerable (97%). At the same time, CapsNet is reasonably robust at the 1-threshold level (only 13% adversarial accuracy). ResNet shows almost uniformly low robustness, with low robustness even at $th = 3$, losing to CapsNet in robustness at all other values of $th$ of the Threshold Attack.

These preliminary tests also show that different networks have different robustness, not only regarding the type of attack ($L_0$ or $L_\infty$) but also regarding the degree of the attack (e.g., the 1-threshold and 10-threshold attacks have very different results on CapsNet).
C. Evaluating Learning and Defence Systems

Table III extends the CMA-ES attacks to various neural networks: WideResNet [33], DenseNet [34], ResNet [29], Network in Network (NIN) [35], All Convolutional Network (AllConv) [36], CapsNet [30], and LeNet [37]. We also evaluate three contemporary defences: Adversarial Training (AT) [14], Total Variance Minimization (TVM) [19], and Feature Squeezing (FS) [21].

TABLE III: Adversarial accuracy results of the $L_0$ and $L_\infty$ attacks over 100 random samples.

Few-Pixel ($L_0$) Attack
Model        Standard Accuracy   th=1   th=3   th=5   th=10
WideResNet   95.12%              11%    55%    75%    94%
DenseNet     94.54%               9%    43%    66%    78%
ResNet       92.67%              12%    52%    73%    85%
NIN          90.87%              18%    62%    81%    90%
AllConv      88.46%              11%    31%    57%    77%
CapsNet      79.03%              21%    37%    49%    57%
LeNet        73.57%              58%    86%    94%    99%
AT           87.11%              22%    52%    66%    86%
TVM          47.55%              16%    12%    20%    24%
FS           92.37%              17%    49%    69%    78%

Threshold ($L_\infty$) Attack
Model        Standard Accuracy   th=1   th=3   th=5   th=10
WideResNet   95.12%              15%    97%    98%    100%
DenseNet     94.54%              23%    68%    72%    74%
ResNet       92.67%              33%    71%    76%    83%
NIN          90.87%              11%    86%    88%    92%
AllConv      88.46%               9%    70%    73%    75%
CapsNet      79.03%              13%    34%    72%    97%
LeNet        73.57%              44%    96%    100%   100%
AT           87.11%               3%    12%    25%    57%
TVM          47.55%               4%     4%     6%    14%
FS           92.37%              26%    63%    66%    74%

For the learning systems (not the defences), we consider the lowest adversarial accuracy in each column, together with any result within a distance of five from it, to be equally good, taking into account the existing variance of results. For CapsNet, only 88 samples could be attacked with a maximum of $th = 127$ for the $L_0$ attack; twelve samples could not be overwhelmed with $th < 128$. Counting, for each neural network, the number of such best results as a qualitative measure of robustness, CapsNet and AllConv can be considered the most robust, with five best results each. The third place achieves only three best results and is consequently far from the top performers.

Regarding Adversarial Training, it is easier to attack with the Few-Pixel Attack than with the Threshold Attack. This result should derive from the fact that the adversarial samples used for adversarial training were generated with the Projected Gradient Descent (PGD) Attack, which is an $L_\infty$ type of attack. Therefore, it suggests that, given an attack bias that differs from the invariance bias used to train the network, the attack can easily succeed.

Regarding TVM, the attacks were less successful. We trained a ResNet on TVM-modified images and, despite many trials with different hyper-parameters, were able to craft a classifier with at best 47.55% accuracy. This is a steep drop from the 92.37% accuracy of the original ResNet, and it happens because TVM was initially conceived for ImageNet and does not scale well to CIFAR-10. However, since the original accuracy of the model trained with TVM is already low, even with a small attack success rate of 24%, the resulting model accuracy is only 35%.

Attacks on Feature Squeezing had relatively high adversarial accuracy for both the $L_0$ and $L_\infty$ attacks. Moreover, both types of attacks had similar accuracy, revealing a lack of bias in this defence system.

Notice that none of the neural networks was able to reduce the low-$th$ attacks to zero adversarial accuracy. This illustrates that, although robustness may differ between current neural networks, none of them can effectively overcome even the lowest level of perturbation feasible. Moreover, since $th = 5$ is enough to achieve around 70% adversarial accuracy in many settings, achieving 100% adversarial accuracy may depend more on a few samples which are harder to attack, such as samples far away from the decision boundary. Consequently, focusing on 100% adversarial accuracy rather than on the amount of threshold might favour methods which merely place a few input projections far away from the others, without improving the overall accuracy: for instance, a method could make some input projections distant enough to be harder to attack. The difference in the behaviour of the $L_0$ and $L_\infty$ attacks shows that robustness is achieved with some trade-offs. This further justifies the importance of using both metrics to evaluate neural networks.
D. Evaluating Other Adversarial Attacks

We further evaluated the assessed neural networks against other well-known adversarial attacks: the Fast Gradient Method (FGM) [9], the Basic Iterative Method (BIM) [6], the Projected Gradient Descent Method (PGD) [14], DeepFool [38], and NewtonFool [39]. Note that, for the FGM, BIM and PGD attacks, $\epsilon = 8$ (the default value) corresponds approximately to $th = 10$ of our $L_\infty$ attack on our robustness scales, while DeepFool and NewtonFool do not explicitly control the robustness scale.

Table IV compares the existing white-box and black-box attacks with our proposed attacks. All the existing attacks are capable of fooling neural networks. However, we notice some peculiar results: the DeepFool Attack, for instance, was less successful against LeNet, the network most vulnerable to our proposed attacks (Table III). Moreover, ResNet and DenseNet show much better robustness to the existing attacks than to ours.

TABLE IV: Adversarial accuracy of the proposed $L_0$ and $L_\infty$ black-box attacks used in the dual quality assessment, compared with other methods from the literature. The value in brackets is the mean $L_2$ distance between the adversarial samples and the original samples. These results were drawn by attacking a different set of samples from previous tests; the accuracy results may therefore differ slightly from previous tables.

Attack        WideResNet     DenseNet       ResNet         NIN            AllConv        CapsNet         LeNet
FGM           69% (159.88)   50% (120.03)   52% (124.70)   72% (140.46)   67% (155.95)   70% (208.89)    84% (152.37)
BIM           89% (208.44)   52% (160.34)   55% (164.64)   74% (216.97)   69% (273.90)   82% (361.63)    89% (345.27)
PGD           89% (208.49)   52% (160.38)   55% (164.64)   74% (216.96)   69% (274.15)   84% (370.90)    89% (357.34)
DeepFool      60% (613.14)   60% (478.03)   58% (458.57)   59% (492.90)   51% (487.46)   87% (258.08)    31% (132.32)
NewtonFool    82% (63.13)    50% (53.89)    54% (51.56)    66% (54.78)    61% (61.05)    90% (1680.83)   84% (49.61)

Few-Pixel ($L_0$) Attack
th=1          20% (181.43)   20% (179.48)   29% (191.73)   28% (185.09)   24% (172.01)   29% (177.86)    61% (191.69)
th=3          54% (276.47)   50% (270.50)   63% (275.57)   62% (274.91)   49% (262.66)   43% (247.97)    89% (248.21)
th=5          75% (326.14)   68% (315.53)   79% (314.27)   81% (318.71)   67% (318.99)   52% (300.19)    96% (265.18)
th=10         91% (366.60)   81% (354.42)   90% (342.56)   93% (354.61)   81% (365.10)   63% (359.55)    98% (271.90)

Threshold ($L_\infty$) Attack
th=1          30% (39.24)    38% (39.24)    43% (39.27)    23% (39.23)    23% (39.21)    13% (39.09)     47% (39.28)
th=3          92% (65.07)    69% (53.89)    74% (52.82)    81% (72.29)    72% (68.11)    34% (70.79)     96% (62.86)
th=5          95% (67.84)    72% (56.81)    77% (55.38)    85% (77.09)    76% (72.45)    72% (130.80)    99% (66.42)
th=10         98% (70.70)    78% (67.63)    83% (64.50)    90% (84.20)    79% (77.76)    97% (184.93)    100% (66.65)

The objective of this article is not to propose better or more effective attacking methods, but rather to propose an assessment methodology and its related duality conjecture (the necessity of evaluating both $L_0$ and $L_\infty$ attacks). That said, the proposed Threshold ($L_\infty$) Attack of the assessment methodology is more accurate than the other attacks while requiring a smaller amount of perturbation: it requires less perturbation than the One-Pixel Attack (only circa 12% of the amount of perturbation of the One-Pixel Attack at $th = 1$), which was already considered one of the most extreme attacks for how little perturbation it needs to fool neural networks. This sets an even lower bound on the perturbation that suffices to fool neural networks.

Notice that the behaviour of the existing attacks is similar to that of our Threshold ($L_\infty$) Attack (Table IV). This suggests that current evaluations of neural networks focus on increasing robustness with respect to the $L_\infty$ norm. However, our study shows that the behaviour of the $L_0$ norm differs from that of the $L_\infty$ norm (Table IV), and robustness in the $L_\infty$ norm alone may not be sufficient to study the robustness and vulnerabilities of neural networks as a whole.
E. Dependency of the Proposed Adversarial Attacks on Classes

We further separated the adversarial accuracy of Table III into classes (Figure 3), to evaluate the dependence of the proposed adversarial attacks on specific classes.

Fig. 3: Adversarial accuracy from Table III across classes. The two diagrams at the left and right are, respectively, for the $L_0$ and $L_\infty$ attacks. The top diagrams use $th = 10$, while the bottom ones use $th = 1$.

Figure 3 shows the already known feature that some classes are easier to attack than others. For example, the columns for the bird and cat classes are visually darker than those for the frog and truck classes in all diagrams. This happens because classes with similar features, and therefore closer decision boundaries, are easier to attack. Interestingly, Figure 3 also reveals that neural networks tend to be harder to attack in only a few classes. This may suggest that these networks encode some classes far away from the others (e.g., they project the features of these classes into a different vector). Consequently, the reason for their relative robustness may lie in a simple construction of the decision boundary, with a few distinct and sharply separated classes.

F. Extremely Fast Quality Assessment: Transferability of Adversarial Samples

If adversarial samples from one model can be used to attack different models and defences, it is possible to create an ultra-fast quality assessment. Figure 4 shows that it is indeed possible to qualitatively assess a neural network based on the transferability of adversarial samples.

Fig. 4: Accuracy of adversarial samples when transferred from a given source model (row) to a target model (column), for both the $L_\infty$ black-box attack (left) and the $L_0$ black-box attack (right). The source of the adversarial samples is on the y-axis and the target model on the x-axis. The adversarial samples were acquired from 100 original images attacked with $th$ varying mostly from one to ten; the maximum value of $th$ is 127.

Beyond being a faster method, the transferability of samples has the benefit of ignoring any masking of gradients, which makes adversarial samples hard to search for but not to transfer: the vulnerability in the neural network is still there, merely hidden. Interestingly, transferability is mostly independent of the type of attack ($L_0$ or $L_\infty$), with most of the previously discussed differences disappearing. Some differences remain; for example, the $L_0$ attacks transfer less accurately than most of the $L_\infty$ ones. This suggests that pixel positions and their variance are relatively more model-specific than small changes spread over the whole image.

Generally speaking, transferability is a quick assessment method which, when used with many different types of adversarial samples, gives an approximation of the model's robustness. This approximation is not better or worse, but different. It differs from the usual attacks because (a) it is not affected by how difficult it is to search for adversarial samples, taking into account only their existence, and (b) it measures the accuracy against commonly found adversarial samples rather than against all searchable ones. Therefore, for low $th$ values, transferability can be used as a qualitative measure of robustness. However, its values are not equivalent or close to the real adversarial accuracy; it serves only as a lower bound.
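Measuring transferability reduces to a single batched forward pass per target model, which is where the speed comes from. A minimal sketch (our illustration; `target_predict` is a hypothetical batch-prediction function):

```python
import numpy as np

def transfer_success_rate(target_predict, x_adv, y_true):
    # Fraction of pre-crafted adversarial samples (generated against some
    # source model) that also fool the target model: one forward pass and
    # no search, hence the O(1) attack cost per sample mentioned above.
    preds = np.argmax(target_predict(x_adv), axis=1)
    return np.mean(preds != y_true)
```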
G. Adversarial Sample Distribution of the Quality Assessment

To understand the importance of the duality for the proposed quality assessment, we analyse the distribution of our proposed attacks across samples. In some cases, the difference between the distributions of successful $L_0$ and $L_\infty$ samples can be readily verified from the difference in adversarial accuracy. For example, CapsNet is more susceptible to $L_\infty$ than to $L_0$ types of attacks, while for Adversarial Training [14] the opposite is true (Table III). Naturally, Adversarial Training depends strongly on the adversarial samples used during training; therefore, different robustness can be acquired depending on the type of adversarial samples used.

Moreover, the distribution shows that, even when the adversarial accuracies seem close, the distributions of the $L_0$ and $L_\infty$ attacks may differ. For example, the adversarial accuracies on ResNet for $L_0$ and $L_\infty$ with $th = 10$ differ by a mere 2%. However, the distribution of adversarial samples shows that around 17% of the samples can only be attacked by one of the two attack types (Figure 5). Thus, the evaluation of both $L_0$ and $L_\infty$ is essential to verify the robustness of a given neural network or adversarial defence, even when a similar adversarial accuracy is observed.

Fig. 5: Distribution of adversarial samples found on DenseNet (left) and ResNet (right) using $th = 10$ with both the Few-Pixel ($L_0$) and Threshold ($L_\infty$) attacks.

H. Analysing the Effect of the Threshold $th$ on Learning Systems

To evaluate how networks behave as the threshold increases, we plot the adversarial accuracy against $th$ (Figure 6).

Fig. 6: Adversarial accuracy per $th$ for the $L_0$ and $L_\infty$ attacks.

These plots reveal an even more evident difference in behaviour for the same method when attacked under either the $L_0$ or the $L_\infty$ norm: the curve inclinations themselves differ, so the $L_0$ and $L_\infty$ attacks scale differently. Two classes of curves can be seen in Figure 6. CapsNet behaves in a class of its own, while the other networks behave similarly. CapsNet, which has an entirely different architecture with dynamic routing, achieves a very different robustness behaviour. LeNet is justifiably lower because of its lower accuracy and complexity.

To assess the quality of the algorithms in relation to their curves, the Area Under the Curve (AUC) is calculated by the trapezoidal rule:

$$\mathrm{AUC} = \frac{\Delta}{n_a}\left(\frac{th_1}{2} + th_2 + th_3 + \dots + th_{n-1} + \frac{th_n}{2}\right),$$

where $n_a$ is the number of images attacked and $th_1, th_2, \dots, th_n$ are the different values of the threshold $th$, up to a maximum of $n = 127$.

TABLE V: Area under the curve (AUC) for both the Few-Pixel ($L_0$) and Threshold ($L_\infty$) black-box attacks.

Model        $L_0$ Attack   $L_\infty$ Attack
WideResNet    425.0          141.5
DenseNet      989.5          696.0
ResNet        674.0          575.5
NIN           528.0          364.0
AllConv      1123.5          849.0
CapsNet      2493.0          404.5
LeNet         137.5          104.0

Table V gives a quantitative evaluation of Figure 6 by calculating the AUC. No network is robust under both attacks: CapsNet is the most robust neural network against $L_0$ attacks, while AllConv wins for $L_\infty$, followed closely by the other neural networks. Although requiring many more resources to be drawn, the curves lead to the same conclusion reached by Table III. Therefore, the previous results are a good approximation of this behaviour, obtained far more promptly.
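For reference, the trapezoidal rule as printed above can be computed as follows. This is our sketch of one plausible reading of the formula (the normalisation by $n_a$ and the role of $\Delta$ follow the text as written); it is numerically equivalent to numpy's np.trapz with unit spacing, scaled by delta / n_attacked.

```python
import numpy as np

def trapezoid_auc(values, delta=1.0, n_attacked=1):
    # Endpoints weighted by 1/2, interior points weighted by 1, scaled by
    # delta / n_attacked, matching the printed formula. In the paper, the
    # values are the threshold quantities th_1..th_n (n up to 127) and
    # n_attacked is n_a, the number of images attacked.
    v = np.asarray(values, dtype=float)
    return delta / n_attacked * (v[0] / 2 + v[1:-1].sum() + v[-1] / 2)
```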
VI. CONCLUSIONS

In this article, we propose a model-agnostic dual quality assessment for adversarial machine learning, especially for neural networks. By investigating various state-of-the-art neural networks, as well as arguably contemporary adversarial defences, it was possible to: (a) show that robustness to $L_0$ and $L_\infty$ attacks differs significantly, which is why the duality should be taken into consideration; (b) verify that current methods and defences, in general, are vulnerable even to $L_0$ and $L_\infty$ black-box attacks of low threshold $th$; and (c) validate the dual quality assessment with robustness levels as a good and efficient approximation to the full accuracy-per-threshold curve. Interestingly, the evaluation of the proposed Threshold Attack showed that it requires a surprisingly small amount of perturbation: this novel $L_\infty$ black-box attack based on CMA-ES required only circa 12% of the amount of perturbation used by the One-Pixel Attack while achieving similar accuracy. Thus, this article analyses the robustness of neural networks and defences by elucidating the problems involved as well as proposing solutions to them. Hopefully, the proposed dual quality assessment and the analysis of current neural networks' robustness will aid the development of more robust neural networks and hybrids alike.

ACKNOWLEDGMENTS

This work was supported by JST ACT-I Grant Number JP-50243 and JSPS KAKENHI Grant Number JP20241216. Additionally, we would like to thank Prof. Junichi Murata for the kind support without which this research would not have been possible.

REFERENCES

[1] C. Szegedy et al., "Intriguing properties of neural networks," in ICLR, 2014.
[2] A. Nguyen, J. Yosinski, and J. Clune, "Deep neural networks are easily fooled: High confidence predictions for unrecognizable images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 427-436.
[3] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, "Universal adversarial perturbations," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2017, pp. 86-94.
[4] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer, "Adversarial patch," arXiv preprint, 2017.
[5] J. Su, D. V. Vargas, and K. Sakurai, "One pixel attack for fooling deep neural networks," IEEE Transactions on Evolutionary Computation, vol. 23, no. 5, pp. 828-841, 2019.
[6] A. Kurakin, I. Goodfellow, and S. Bengio, "Adversarial examples in the physical world," arXiv preprint, 2016.
[7] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, "Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2016, pp. 1528-1540.
[8] A. Athalye and I. Sutskever, "Synthesizing robust adversarial examples," in ICML, 2018.
[9] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint, 2014.
[10] D. V. Vargas and J. Su, "Understanding the one-pixel attack: Propagation maps and locality analysis," arXiv preprint, 2019.
[11] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, "Distillation as a defense to adversarial perturbations against deep neural networks," in 2016 IEEE Symposium on Security and Privacy (SP), IEEE, 2016, pp. 582-597.
[12] N. Carlini and D. Wagner, "Towards evaluating the robustness of neural networks," in 2017 IEEE Symposium on Security and Privacy (SP), IEEE, 2017, pp. 39-57.
[13] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvári, "Learning with a strong adversary," arXiv preprint, 2015.
[14] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards deep learning models resistant to adversarial attacks," in ICLR, 2018.
[15] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel, "Ensemble adversarial training: Attacks and defenses," arXiv preprint arXiv:1705.07204, 2017.
[16] G. K. Dziugaite, Z. Ghahramani, and D. M. Roy, "A study of the effect of jpg compression on adversarial images," arXiv preprint arXiv:1608.00853, 2016.
[17] T. Hazan, G. Papandreou, and D. Tarlow, Perturbations, Optimization, and Statistics. MIT Press, 2016.
[18] N. Das, M. Shanbhogue, S.-T. Chen, F. Hohman, L. Chen, M. E. Kounavis, and D. H. Chau, "Keeping the bad guys out: Protecting and vaccinating deep learning with jpeg compression," arXiv preprint arXiv:1705.02900, 2017.
[19] C. Guo, M. Rana, M. Cisse, and L. van der Maaten, "Countering adversarial images using input transformations," in ICLR, 2018.
[20] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman, "Pixeldefend: Leveraging generative models to understand and defend against adversarial examples," in ICLR, 2018.
[21] W. Xu, D. Evans, and Y. Qi, "Feature squeezing: Detecting adversarial examples in deep neural networks," arXiv preprint, 2017.
[22] X. Ma, B. Li, Y. Wang, S. M. Erfani, S. Wijewickrema, G. Schoenebeck, D. Song, M. E. Houle, and J. Bailey, "Characterizing adversarial subspaces using local intrinsic dimensionality," arXiv preprint arXiv:1801.02613, 2018.
[23] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow, "Thermometer encoding: One hot way to resist adversarial examples," in ICLR, 2018.
[24] A. Athalye, N. Carlini, and D. Wagner, "Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples," in ICML, 2018.
[25] J. Uesato, B. O'Donoghue, P. Kohli, and A. Oord, "Adversarial risk and the dangers of evaluating against weak attacks," in International Conference on Machine Learning, 2018, pp. 5032-5041.
[26] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel, "On the (statistical) detection of adversarial examples," arXiv preprint arXiv:1702.06280, 2017.
[27] N. Carlini and D. Wagner, "Adversarial examples are not easily detected: Bypassing ten detection methods," in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, ACM, 2017, pp. 3-14.
[28] N. Carlini and D. Wagner, "MagNet and 'Efficient defenses against adversarial attacks' are not robust to adversarial examples," arXiv preprint, 2017.
[29] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[30] S. Sabour, N. Frosst, and G. E. Hinton, "Dynamic routing between capsules," in Advances in Neural Information Processing Systems, 2017, pp. 3856-3866.
[31] R. Storn and K. Price, "Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, no. 4, pp. 341-359, 1997.
[32] N. Hansen, S. D. Müller, and P. Koumoutsakos, "Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES)," Evolutionary Computation, vol. 11, no. 1, pp. 1-18, 2003.
[33] S. Zagoruyko and N. Komodakis, "Wide residual networks," arXiv preprint arXiv:1605.07146, 2016.
[34] F. Iandola, M. Moskewicz, S. Karayev, R. Girshick, T. Darrell, and K. Keutzer, "Densenet: Implementing efficient convnet descriptor pyramids," arXiv preprint, 2014.
[35] M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
[36] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, "Striving for simplicity: The all convolutional net," arXiv preprint arXiv:1412.6806, 2014.
[37] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[38] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, "Deepfool: a simple and accurate method to fool deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2574-2582.
[39] U. Jang, X. Wu, and S. Jha, "Objective metrics and gradient descent algorithms for adversarial examples in machine learning," in Proceedings of the 33rd Annual Computer Security Applications Conference, ACM, 2017, pp. 262-277.
[40] A. Krizhevsky, G. Hinton et al., "Learning multiple layers of features from tiny images," Tech. Rep., 2009.
[41] M.-I. Nicolae, M. Sinn, M. N. Tran, B. Buesser, A. Rawat, M. Wistuba, V. Zantedeschi, N. Baracaldo, B. Chen, H. Ludwig, I. Molloy, and B. Edwards, "Adversarial robustness toolbox v1.1.0," CoRR, vol. 1807.01069, 2018. [Online]. Available: https://arxiv.org/pdf/1807.01069
