Technical Report on the CleverHans v2.1.0 Adversarial Examples Library


Nicolas Papernot (1,3)*, Fartash Faghri (5,3), Nicholas Carlini (2,3), Ian Goodfellow (3)†, Reuben Feinman (4), Alexey Kurakin (3), Cihang Xie (6), Yash Sharma (7), Tom Brown (3), Aurko Roy (3), Alexander Matyasko (8), Vahid Behzadan (9), Karen Hambardzumyan (10), Zhishuai Zhang (6), Yi-Lin Juang (11), Zhi Li (5), Ryan Sheatsley (1), Abhibhav Garg (12), Jonathan Uesato (13), Willi Gierke (14), Yinpeng Dong (15), David Berthelot (3), Paul Hendricks (1), Jonas Rauber (16), Rujun Long (17), and Patrick McDaniel (1)‡

Affiliations: 1 Pennsylvania State University; 2 UC Berkeley; 3 Google Brain; 4 Symantec; 5 University of Toronto; 6 Johns Hopkins; 7 The Cooper Union; 8 Nanyang Technological University; 9 Kansas State; 10 YerevaNN; 11 NTUEE; 12 IIT Delhi; 13 MIT; 14 Hasso Plattner Institute; 15 National Tsing Hua University; 16 IMPRS; 17 0101.AI

Contact: * ngp5056@cse.psu.edu, † goodfellow@google.com, ‡ mcdaniel@cse.psu.edu

Abstract

cleverhans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models' performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial example construction are not comparable to each other, because a good result may indicate a robust model or it may merely indicate a weak implementation of the adversarial example construction procedure.

This technical report is structured as follows. Section 1 provides an overview of adversarial examples in machine learning and of the cleverhans software.
Section 2 presents the core functionalities of the library: namely, the attacks based on adversarial examples and the defenses that improve the robustness of machine learning models to these attacks. Section 3 describes how to report benchmark results using the library. Section 4 describes the versioning system.

1 Introduction

Adversarial examples are inputs crafted by making slight perturbations to legitimate inputs with the intent of misleading machine learning models [18]. The perturbations are designed to be small in magnitude, such that a human observer would not have difficulty processing the resulting input. In many cases, the perturbation required to deceive a machine learning model is so small that a human being may not be able to perceive that anything has changed, or even so small that an 8-bit representation of the input values does not capture the perturbation used to fool a model that accepts 32-bit inputs. We invite readers unfamiliar with the concept to consult the detailed presentations in [18, 11, 17, 4].

Although completely effective defenses have yet to be proposed, the most successful to date is adversarial training [18, 11]. Different sources of adversarial examples used in the training process can make adversarial training more effective; as of this writing, to the best of our knowledge, the most effective version of adversarial training on ImageNet is ensemble adversarial training [23], and the most effective version on MNIST is the basic iterative method [12] applied to randomly chosen starting points [14].

The cleverhans library provides reference implementations of the attacks, which are intended for use for two purposes. First, machine learning developers may construct robust models by using adversarial training, which requires the construction of adversarial examples during the training procedure.
Second, we encourage researchers who report the accuracy of their models in the adversarial setting to use the standardized reference implementation provided by cleverhans. Without a standard reference implementation, different benchmarks are not comparable: a benchmark reporting high accuracy might indicate a more robust model, but it might also indicate the use of a weaker attack implementation. By using cleverhans, researchers can be assured that a high accuracy on a benchmark corresponds to a robust model.

Implemented in TensorFlow [1], cleverhans is designed as a tool to help developers add defenses against adversarial examples to their models and benchmark the robustness of their models to adversarial examples. The interface for cleverhans is designed to accept models implemented using any model framework (such as Keras [9]) or implemented without any specific model abstraction.

The cleverhans library is free, open-source software, licensed under the MIT license. The project is available online through GitHub at https://github.com/openai/cleverhans. The main communication channel for developers of the library is a mailing list, whose discussions are publicly available online at https://groups.google.com/group/cleverhans-dev.

2 Core functionalities

The library's package is organized by modules. The most important modules are:

- attacks: contains the Attack class, defining the interface used by all CleverHans attacks, as well as implementations of several specific attacks.
- model: contains the Model class, a very lightweight class defining a simple interface that models should implement in order to be compatible with Attack. CleverHans includes a Model implementation for Keras Sequential models and examples of Model implementations for TensorFlow models that are not implemented using any modeling framework library.
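To make the interface idea concrete, the toy class below shows the kind of mapping from inputs to logits and probabilities that attacks need a model to expose. This is a schematic NumPy stand-in written for this report, not the actual cleverhans Model class (whose TensorFlow-based definition lives in the model module); the class name and toy linear classifier are illustrative assumptions.

```python
import numpy as np

class ToyModel:
    """Schematic stand-in for the kind of interface a model wrapper
    exposes to attacks: a way to map inputs to logits and probabilities.
    Illustrative only; the real Model class operates on TensorFlow tensors."""

    def __init__(self, weights, bias):
        # Toy linear classifier: logits = x @ W + b.
        self.weights = np.asarray(weights, dtype=float)
        self.bias = np.asarray(bias, dtype=float)

    def get_logits(self, x):
        # Pre-softmax output for a batch of inputs.
        return np.asarray(x, dtype=float) @ self.weights + self.bias

    def get_probs(self, x):
        # Numerically stable softmax over the logits.
        z = self.get_logits(x)
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
```

Any object providing such input-to-logits access, whether built with Keras or raw TensorFlow, can in principle back an attack.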
In the following, we describe some of the research results behind the implementations made in cleverhans.

2.1 Attacks

Adversarial example crafting algorithms implemented in cleverhans take a model and an input, and return the corresponding adversarial example. Here are the algorithms currently implemented in the attacks module.

2.1.1 L-BFGS Method

The L-BFGS method was introduced by Szegedy et al. [18]. It aims to solve the following box-constrained optimization problem:

    minimize ||x' − x||_2^2  such that  C(x') = l,  where x' ∈ [0, 1]^p    (1)

The computation is approximated by using box-constrained L-BFGS optimization.

2.1.2 Fast Gradient Sign Method

The fast gradient sign method (FGSM) was introduced by Goodfellow et al. [11]. The intuition behind the attack is to linearize the cost function J used to train a model f around the neighborhood of the training point x whose misclassification the adversary wants to force. The resulting adversarial example x* corresponding to input x is computed as follows:

    x* ← x + ε · sign(∇_x J(f, θ, x))    (2)

where ε is a parameter controlling the magnitude of the perturbation introduced. Larger values increase the likelihood that x* will be misclassified by f, but make the perturbation easier to detect by a human.

The fast gradient sign method is available by calling attacks.fgsm(). The implementation defines the necessary graph elements and returns a tensor which, once evaluated, holds the value of the adversarial example corresponding to the input provided. The implementation is parameterized by the parameter ε introduced above. It is possible to configure the method to clip adversarial examples so that they are constrained to be part of the expected input domain range.
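The update of Eq. (2), with the optional clipping to the input domain, can be sketched in a few lines of NumPy. This is an illustration of the update rule only, not the cleverhans implementation (which builds a TensorFlow graph); the function name and the assumption that `grad` already holds the loss gradient with respect to the input are ours.

```python
import numpy as np

def fgsm_step(x, grad, eps, clip_min=0.0, clip_max=1.0):
    """One fast-gradient-sign update: x* = x + eps * sign(grad_x J),
    clipped back into the valid input range [clip_min, clip_max].
    `grad` is assumed to be the gradient of the training loss J
    with respect to the input x (computed elsewhere)."""
    adv = x + eps * np.sign(grad)
    return np.clip(adv, clip_min, clip_max)
```

With eps = 0.3, a pixel at 0.5 whose loss gradient is positive moves to 0.8; a pixel at 0.9 saturates at the clip boundary 1.0.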
2.1.3 Carlini-Wagner Attack

The Carlini-Wagner (C&W) attack was introduced by Carlini et al. [5]. Inspired by [18], the authors formulate finding adversarial examples as an optimization problem: find some small change δ that can be made to an input x that will change its classification, but so that the result is still in the valid range. They instantiate the distance metric with an L_p norm, define a success function f such that f(x + δ) ≤ 0 if and only if the model misclassifies, and minimize the sum with a trade-off constant c. The constant c is chosen by modified binary search, the box constraint is resolved by applying a change of variables, and the Adam [2] optimizer is used to solve the optimization instance. The attack has been shown to be quite powerful [5, 6]; however, this power comes at the cost of speed, as this attack is often much slower than others. The attack can be sped up by fixing c (instead of performing modified binary search).

The Carlini-Wagner attack is available by instantiating the attack object with attacks.CarliniWagnerL2 and then calling the generate() function. This generates the symbolic graph and returns a tensor which, once evaluated, holds the value of the adversarial example corresponding to the input provided. As the name suggests, the L_p norm used in the implementation is L_2. The attack is controlled by a number of parameters, namely the confidence, which defines the margin between logit values necessary to succeed, the learning rate (step size), the number of binary search steps, the number of iterations per binary search step, and the initial c value.

2.1.4 Elastic Net Method

The Elastic Net Method (EAD) was introduced by Chen et al. [7]. Inspired by the C&W attack [5], finding adversarial examples is formulated as an optimization problem.
The same loss function as used by the C&W attack is adopted; however, instead of performing L_2 regularization, elastic-net regularization is performed, with β controlling the trade-off between L_1 and L_2. The optimization is solved with the iterative shrinkage-thresholding algorithm (ISTA) [3]. ISTA can be viewed as a regular first-order optimization algorithm with an additional shrinkage-thresholding step on each iteration.

Notably, the C&W L_2 attack becomes a special case of the EAD formulation, with β = 0. However, one can view EAD as a robust version of the C&W method, as the ISTA operation shrinks a value of the adversarial example if its deviation from the original input is greater than β, and leaves the value unchanged if the deviation is less than β. Empirical results support this claim, demonstrating the attack's ability to bypass strong detection schemes and succeed against robust adversarially trained models while still producing adversarial examples with minimal visual distortion [7, 22, 21, 13].

The Elastic Net Method is available by instantiating the attack object with attacks.ElasticNetMethod and then calling the generate() function. This generates the symbolic graph and returns a tensor which, once evaluated, holds the value of the adversarial example corresponding to the input provided. The attack is controlled by a number of parameters, most of which are shared with the C&W attack, namely the confidence, which defines the margin between logit values necessary to succeed, the learning rate (step size), the number of binary search steps, the number of iterations per binary search step, and the initial c value. Additional parameters include β, the elastic-net regularization constant, and the decision rule, i.e., whether to choose successful adversarial examples with minimal L_1 or elastic-net distortion.
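The shrinkage-thresholding behavior described above can be sketched elementwise in NumPy: deviations from the original input smaller than β are zeroed out, larger deviations are shrunk by β and clipped to the valid range. This is an illustrative sketch of the projection step, not the cleverhans or EAD reference code; the function name and the [0, 1] clipping range are assumptions.

```python
import numpy as np

def ista_shrink(z, x0, beta, clip_min=0.0, clip_max=1.0):
    """Elementwise shrinkage-thresholding step (sketch): given a
    candidate adversarial example z and original input x0, keep x0
    where |z - x0| <= beta, otherwise move toward x0 by beta and
    clip into the valid input range."""
    upper = np.clip(z - beta, clip_min, clip_max)   # shrink positive deviations
    lower = np.clip(z + beta, clip_min, clip_max)   # shrink negative deviations
    diff = z - x0
    out = np.where(diff > beta, upper, x0)
    out = np.where(diff < -beta, lower, out)
    return out
```

Setting beta = 0 recovers the identity projection, consistent with the remark that the C&W L_2 attack is the β = 0 special case.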
2.1.5 Basic Iterative Method

The basic iterative method (BIM) was introduced by Kurakin et al. [12]. It extends the "fast" gradient method by applying it multiple times with a small step size, clipping values of intermediate results after each step to ensure that they are in an ε-neighborhood of the original input.

The basic iterative method is available by instantiating the attack object with attacks.BasicIterativeMethod and then calling the generate() function. This generates the symbolic graph and returns a tensor which, once evaluated, holds the value of the adversarial example corresponding to the input provided. The attack is parameterized by ε, like the fast gradient method, but also by the step size for each attack iteration and the number of attack iterations.

2.1.6 Projected Gradient Descent

The projected gradient descent (PGD) attack was introduced by Madry et al. [14]. The authors state that the basic iterative method (BIM) [12] is essentially projected gradient descent on the negative loss function. To explore the loss landscape further, PGD is re-started from many points in the L_∞ balls around the input examples.

PGD is available by instantiating the attack object with attacks.MadryEtAl and then calling the generate() function. This generates the symbolic graph and returns a tensor which, once evaluated, holds the value of the adversarial example corresponding to the input provided. PGD shares many parameters with BIM, such as ε, the step size for each attack iteration, and the number of attack iterations. An additional parameter is a boolean which specifies whether or not to add an initial random perturbation.

2.1.7 Momentum Iterative Method

The momentum iterative method (MIM) was introduced by Dong et al. [10].
It is a technique for accelerating gradient descent algorithms by accumulating a velocity vector in the gradient direction of the loss function across iterations. BIM with incorporated momentum, applied to an ensemble of models, won first place in both the NIPS 2017 Non-Targeted and Targeted Adversarial Attack Competitions [16].

The momentum iterative method is available by instantiating the attack object with attacks.MomentumIterativeMethod and then calling the generate() function. This generates the symbolic graph and returns a tensor which, once evaluated, holds the value of the adversarial example corresponding to the input provided. MIM shares many parameters with BIM, such as ε, the step size for each attack iteration, and the number of attack iterations. An additional parameter is a decay factor which can be applied to the momentum term.

2.1.8 Jacobian-based Saliency Map Approach

The Jacobian-based saliency map approach (JSMA) was introduced by Papernot et al. [17]. The method iteratively perturbs features of the input that have large adversarial saliency scores. Intuitively, this score reflects the adversarial goal of taking a sample away from its source class towards a chosen target class. First, the adversary computes the Jacobian of the model and evaluates it at the current input: this returns a matrix [∂f_j/∂x_i (x)]_{i,j}, where component (i, j) is the derivative of class j with respect to input feature i. To compute the adversarial saliency map, the adversary then computes the following for each input feature i:

    S(x, t)[i] = 0                                           if ∂f_t(x)/∂x_i < 0 or Σ_{j≠t} ∂f_j(x)/∂x_i > 0
    S(x, t)[i] = (∂f_t(x)/∂x_i) · | Σ_{j≠t} ∂f_j(x)/∂x_i |   otherwise    (3)

where t is the target class that the adversary wants the machine learning model to assign.
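As an illustration, the saliency scores of Eq. (3) can be computed for a toy Jacobian as follows. This is a NumPy sketch of the scoring rule only, not the cleverhans implementation (which, as noted below, additionally selects features in pairs); the function name and the dense Jacobian layout are assumptions.

```python
import numpy as np

def saliency_scores(jacobian, target):
    """Adversarial saliency map of Eq. (3) (sketch).
    jacobian[i, j] is the derivative of class j's output with
    respect to input feature i. A feature scores 0 when its target
    derivative is negative or its summed other-class derivative is
    positive; otherwise it scores
    (dF_t/dx_i) * |sum over j != t of dF_j/dx_i|."""
    jacobian = np.asarray(jacobian, dtype=float)
    d_target = jacobian[:, target]                 # dF_t / dx_i
    d_others = jacobian.sum(axis=1) - d_target     # sum over j != t
    scores = d_target * np.abs(d_others)
    scores[(d_target < 0) | (d_others > 0)] = 0.0
    return scores
```

For a 3-feature input with Jacobian rows (0.5, -0.2), (-0.1, 0.3), (0.4, 0.1) and target class 0, only the first feature receives a nonzero score, so it would be perturbed first.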
The adversary then selects the input feature i with the largest saliency score S(x, t)[i] and increases its value. (In the original paper and in the cleverhans implementation, input features are selected by pairs using the same heuristic.) The process is repeated until misclassification in the target class is achieved or the maximum number of perturbed features has been reached.

In cleverhans, the Jacobian-based saliency map approach may be called with attacks.jsma(). The implementation returns the adversarial example directly, as well as whether the target class was achieved or not, and how many input features were perturbed.

2.1.9 DeepFool

DeepFool was introduced by Moosavi-Dezfooli et al. [15]. Unlike most of the attacks described here, it cannot be used in the targeted case, where the attacker specifies what target class the model should classify the adversarial example as. It can only be used in the non-targeted case, where the attacker can only ensure that the model classifies the adversarial example in a class different from the original.

Inspired by the fact that the separating hyperplanes of linear classifiers indicate the decision boundaries between classes, DeepFool aims to find the least distortion (in terms of Euclidean distance) leading to misclassification by projecting the input example onto the closest separating hyperplane. An approximate iterative algorithm is proposed for attacking neural networks in order to tackle their inherent nonlinearities.

DeepFool is available by instantiating the attack object with attacks.DeepFool and then calling the generate() function. This generates the symbolic graph and returns a tensor which, once evaluated, holds the value of the adversarial example corresponding to the input provided. DeepFool has a few parameters, such as the number of classes to test against, a termination criterion to prevent vanishing updates, and the maximum number of iterations.
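The geometric picture behind DeepFool is easiest to see for a linear binary classifier f(x) = w·x + b, where the minimal Euclidean perturbation is the orthogonal projection onto the hyperplane f(x) = 0. The sketch below shows that single step; it is an illustration of the geometry, not the cleverhans implementation (the real attack iterates this against local linearizations of a neural network), and the function name and overshoot default are assumptions.

```python
import numpy as np

def deepfool_linear_step(x, w, b, overshoot=0.02):
    """Minimal-norm step to the decision boundary of a linear binary
    classifier f(x) = w.x + b: r = -f(x) * w / ||w||^2 projects x onto
    the separating hyperplane; a small overshoot pushes just past it."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    f = float(w @ x + b)
    r = -f * w / float(w @ w)       # orthogonal projection direction
    return x + (1.0 + overshoot) * r
```

For x = (2, 0), w = (1, 0), b = 0, the step crosses the boundary x_1 = 0 with a perturbation of norm just over 2, the distance to the hyperplane.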
2.1.10 Feature Adversaries

Feature adversaries were introduced by Sabour et al. [20]. Instead of solely considering adversaries which disrupt classification, termed label adversaries, the authors considered adversarial examples which are confused with other examples not just in class label, but in their internal representations as well. Such examples are generated by feature adversaries.

Feature adversarial examples are generated by minimizing the Euclidean distance between the internal deep representations (at a specified layer), while constraining the L_∞ distance between the input and the adversarial example to be less than δ. The optimization is conducted using box-constrained L-BFGS.

Feature adversaries are available by instantiating the attack object with attacks.FastFeatureAdversaries and then calling the generate() function. This generates the symbolic graph and returns a tensor which, once evaluated, holds the value of the adversarial example corresponding to the input provided. The implementation is parameterized by the following set of parameters: ε, the step size for each attack iteration, the number of attack iterations, and the layer to target.

2.1.11 SPSA

Simultaneous perturbation stochastic approximation (SPSA) was introduced by Uesato et al. [24]. SPSA is a gradient-free optimization method, which is useful when the model is non-differentiable or, more generally, when the gradients do not point in useful directions. Gradients are approximated using finite difference estimates [8] in random directions.

SPSA is available by instantiating the attack object with attacks.SPSA and then calling the generate() function.
This generates the symbolic graph and returns a tensor which, once evaluated, holds the value of the adversarial example corresponding to the input provided. The implementation is parameterized by the following set of parameters: ε, the number of optimization steps, the learning rate (step size), and the perturbation size used for the finite difference approximation.

2.2 Defenses

The intuition behind defenses against adversarial examples is to make the model smoother by limiting its sensitivity to small perturbations of its inputs (and therefore making adversarial examples harder to craft). Since all defenses currently proposed modify the learning algorithm used to train the model, we implement them in the modules of cleverhans that contain the functions used to train models. In module utils_tf, the following defenses are implemented.

2.2.1 Adversarial training

The intuition behind adversarial training [18, 11] is to inject adversarial examples during training to improve the generalization of the machine learning model. To achieve this effect, the training function tf_model_train() implemented in module utils_tf can be given the tensor definition for an adversarial example: e.g., the one returned by the method described in Section 2.1.2. When such a tensor is given, the training algorithm modifies the loss function used to optimize the model parameters: it is in that case defined as the average between the loss for predictions on legitimate inputs and the loss for predictions made on adversarial examples. The remainder of the training algorithm is left unchanged.

3 Reporting Benchmark Results

This section provides instructions for how to prepare and report benchmark results.

When comparing against previously published benchmarks, it is best to use the same version of cleverhans as was used to produce the previous benchmarks.
This minimizes the possibility that an undetected change in behavior between versions could cause a difference in the output of the benchmark results. When reporting new results that are not directly compared to previous work, it is best to use the most recent versioned release of cleverhans. In all cases, it is important to report the version number of cleverhans. In addition to this information, one should also report which attack methods were used, and the values of any configuration parameters used for these attacks. For example, you might report: "We benchmarked the robustness of our method to adversarial attack using v2.1.0 of CleverHans (Papernot et al. 2018). On a test set modified by fgsm with eps of 0.3, we obtained a test set accuracy of 97.9%."

The library does not provide specific test datasets or data preprocessing. End users are responsible for appropriately preparing the data in their specific application areas, and for reporting sufficient information about the data preprocessing and model family to make benchmarks appropriately comparable.

4 Versioning

Because one of the goals of cleverhans is to provide a basis for reproducible benchmarks, it is important that the version numbers provide useful information. The library uses semantic versioning (http://semver.org/), meaning that version numbers take the form MAJOR.MINOR.PATCH.

The PATCH number increments whenever backwards-compatible bug fixes are made. For the purpose of this library, a bug fix is not considered backwards-compatible if it changes the results of a benchmark test. The MINOR number increments whenever new features are added in a backwards-compatible manner. The MAJOR number increments whenever an interface changes.
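These increment rules can be summarized in a few lines of plain Python. This is an illustrative sketch of the policy just described, written for this report; the function name and the change-type labels are our assumptions, not part of the library.

```python
def next_version(version, change):
    """Apply the semantic-versioning rules above (sketch).
    `version` is a 'MAJOR.MINOR.PATCH' string; `change` is one of:
      'bugfix'    - backwards-compatible fix      -> bump PATCH
      'feature'   - backwards-compatible addition -> bump MINOR
      'interface' - interface change              -> bump MAJOR"""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "bugfix":
        return f"{major}.{minor}.{patch + 1}"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    if change == "interface":
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown change type: {change}")
```

Note that under the policy in the next paragraph, a bug fix that alters benchmark results counts as an interface change and therefore bumps MAJOR, not PATCH.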
Any time a bug in CleverHans affects the accuracy of any performance number reported as a benchmark result, we consider fixing the bug to constitute an API change (to the interface mapping from the specification of a benchmark experiment to the reported performance) and increment the MAJOR version number when we make the next release. For this reason, when writing academic articles, it is important to compare CleverHans benchmark results that were produced with the same MAJOR version number. Release notes accompanying each revision indicate whether an increment to the MAJOR number invalidates earlier benchmark results or not. Release notes for each version are available at https://github.com/tensorflow/cleverhans/releases

5 Acknowledgments

The format of this report was in part inspired by [19]. Nicolas Papernot is supported by a Google PhD Fellowship in Security. Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

References

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

[2] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[3] Amir Beck and Marc Teboulle.
A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[4] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013.

[5] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016.

[6] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. arXiv preprint arXiv:1705.07263, 2017.

[7] Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. EAD: Elastic-net attacks to deep neural networks via adversarial examples. arXiv preprint arXiv:1709.04114, 2017.

[8] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. arXiv preprint arXiv:1708.03999, 2017.

[9] François Chollet. Keras. GitHub repository: https://github.com/fchollet/keras, 2015.

[10] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. arXiv preprint arXiv:1710.06081, 2017.

[11] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[12] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.

[13] Pei-Hsuan Lu, Pin-Yu Chen, Kang-Cheng Chen, and Chia-Mu Yu.
On the limitation of MagNet defense against L1-based adversarial examples. arXiv preprint arXiv:1805.00310, 2018.

[14] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

[15] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. arXiv preprint arXiv:1511.04599, 2015.

[16] Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille, Sangxia Huang, Yao Zhao, Yuzhe Zhao, Zhonglin Han, Junjiajia Long, Yerkebulan Berdibekov, Takuya Akiba, Seiya Tokui, and Motoki Abe. Adversarial attacks and defences competition. arXiv preprint arXiv:1804.00097, 2018.

[17] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.

[18] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

[19] Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688, May 2016.

[20] Sara Sabour, Yanshuai Cao, Fartash Faghri, and David J Fleet. Adversarial manipulation of deep representations. arXiv preprint arXiv:1511.05122, 2015.

[21] Yash Sharma and Pin-Yu Chen. Bypassing feature squeezing by increasing adversary strength. arXiv preprint arXiv:1803.09868, 2018.

[22] Yash Sharma and Pin-Yu Chen.
Attacking the Madry defense model with L1-based adversarial examples. arXiv preprint arXiv:1710.10733, 2017.

[23] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.

[24] Jonathan Uesato, Brendan O'Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666, 2018.
