Tactics of Adversarial Attack on Deep Reinforcement Learning Agents



Yen-Chen Lin¹, Zhang-Wei Hong¹, Yuan-Hong Liao¹, Meng-Li Shih¹, Ming-Yu Liu², Min Sun¹
¹National Tsing Hua University, Taiwan
²NVIDIA, Santa Clara, California, USA
{yenchenlin@gapp, williamd4112@gapp, s102061137@m102, shihsml@gapp, sunmin@ee}.nthu.edu.tw, mingyul@nvidia.com

Abstract

We introduce two tactics, namely the strategically-timed attack and the enchanting attack, to attack reinforcement learning agents trained by deep reinforcement learning algorithms using adversarial examples. In the strategically-timed attack, the adversary aims at minimizing the agent's reward by attacking the agent at only a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims at luring the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: while the generative model predicts the future states, the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to lure the agent to take the preferred sequence of actions. We apply the proposed tactics to agents trained by state-of-the-art deep reinforcement learning algorithms, including DQN and A3C. In 5 Atari games, our strategically-timed attack reduces as much reward as the uniform attack (i.e., attacking at every time step) does by attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a more than 70% success rate. Example videos are available at http://yenchenlin.me/adversarial_attack_RL/.
1 Introduction

Deep neural networks (DNNs), which can extract hierarchical distributed representations from signals, are established as the de facto tool for pattern recognition, particularly for supervised learning. We, as a generation, have witnessed a trend of fast adoption of DNNs in various commercial systems performing image recognition [Krizhevsky et al., 2012], speech recognition [Hannun et al., 2014], and natural language processing [Sutskever et al., 2014] tasks. Recently, DNNs have also started to play a central role in reinforcement learning (RL), a field of machine learning research where the goal is to train an agent to interact with the environment for maximizing its reward. The community has realized that DNNs are ideal function approximators for classical RL algorithms, because DNNs can extract reliable patterns from signals for constructing a more informed action-determination process. For example, [Mnih et al., 2015] use a DNN to model the action-value function in the Q-learning algorithm, and [Mnih et al., 2016] use a DNN to directly model the policy. Reinforcement learning research powered by DNNs is generally referred to as deep reinforcement learning (deep RL).

However, a constant question lingers while we enjoy using DNNs for function approximation in RL. Specifically, since DNNs are known to be vulnerable to adversarial example attacks [Szegedy et al., 2014], and a deep RL agent inherits the pattern recognition power of a DNN, does it also inherit the DNN's vulnerability to adversarial examples? We believe the answer is yes and provide empirical evidence in this paper.

Adversarial attacks on deep RL agents differ from adversarial attacks on classification systems in several ways. Firstly, an RL agent interacts with the environment through a sequence of actions, where each action changes the state of the environment. What the agent receives is a sequence of correlated observations.
For an episode of L steps, an adversary can decide at each time step whether to craft an adversarial example to attack the agent (i.e., there are 2^L choices). Secondly, an adversary attacking a deep RL agent has different goals, such as reducing an agent's final reward or malevolently luring an agent to dangerous states, whereas an adversary attacking a classification system aims only at lowering classification accuracy.

In this paper, we focus on studying adversarial attacks specifically on deep RL agents. We argue this is important: as we consider deep RL agents for controlling machines, we need to understand their vulnerability, because it would limit their use in mission-critical tasks such as autonomous driving. Based on [Kurakin et al., 2016], which showed that adversarial examples also exist in the real world, an adversary could add maliciously-placed paint to the surface of a traffic stop sign to confuse an autonomous car. How could we fully trust deep RL agents if their vulnerability to adversarial attacks is not fully understood and addressed?

In a contemporary work, [Huang et al., 2017] propose an adversarial attack tactic where the adversary attacks a deep RL agent at every time step in an episode. We refer to such a tactic as the uniform attack and argue it is preliminary. First, the uniform attack ignores the fact that the observations are correlated. Moreover, the spirit of adversarial attack is to apply a minimal perturbation to the observation to avoid detection. If the adversary perturbs the observation at every time instance, it is more likely to be detected. A more sophisticated strategy would be to attack at selective time steps. For example, as shown in Fig. 1, attacking the deep RL agent has no consequence when the ball is far away from the paddle. However, when the ball is close to the paddle, attacking the deep RL agent could cause it to drop the ball.
Therefore, adversarial attacks at different time instances are not equally effective. Based on this observation, we propose the strategically-timed attack, which takes into account the number of times an adversarial example is crafted and used. It intends to reduce the reward with as few adversarial examples as possible. An adversarial example is only used when the attack is expected to be effective. Our experimental results show that an adversary exercising the strategically-timed attack tactic can reduce the reward of state-of-the-art deep RL agents as much as an adversary exercising the uniform attack tactic, while attacking four times less often.

In addition, we propose the enchanting attack for maliciously luring a deep RL agent to a certain state. While the strategically-timed attack aims at reducing the reward of a deep RL agent, the enchanting attack aims at misguiding the agent to a specified state. The enchanting attack could, for instance, be used to mislead a self-driving car controlled by a deep RL agent into hitting a certain obstacle. We implement the enchanting attack using a planning algorithm and a deep generative model. To the best of our knowledge, this is the first planning-based adversarial attack on a deep RL agent. Our experimental results show that the enchanting attack has a more than 70% success rate in attacking state-of-the-art deep RL agents.

We apply our adversarial attacks to agents trained by state-of-the-art deep RL algorithms, including A3C [Mnih et al., 2016] and DQN [Mnih et al., 2015], on 5 Atari games. We provide examples to evaluate the effectiveness of our attacks. We also compare the robustness of the agents trained by the A3C and DQN algorithms to these adversarial attacks. The contributions of the paper are summarized below:

• We study adversarial example attacks on deep RL agents trained by state-of-the-art deep RL algorithms including A3C and DQN.
• We propose the strategically-timed attack, aiming at attacking a deep RL agent at critical moments.
• We propose the enchanting attack (the first planning-based adversarial attack), aiming at maliciously luring an agent to a certain state.
• We conduct extensive experiments to evaluate the vulnerability of deep RL agents to the two attacks.

2 Related Work

Following [Szegedy et al., 2014], several adversarial example generation methods were proposed for attacking DNNs. Most of these methods generate an adversarial example by seeking a minimal perturbation of an image that can confuse the classifier (e.g., [Goodfellow et al., 2015; Kurakin et al., 2016]). [Moosavi-Dezfooli et al., 2016] first estimated linear decision boundaries between classes of a DNN in the image space and iteratively shifted an image toward the closest of these boundaries to craft an adversarial example.

While the existence of adversarial examples for DNNs has been demonstrated several times on various supervised learning tasks, the existence of adversarial examples for deep RL agents has remained largely unexplored. In a contemporary paper, [Huang et al., 2017] proposed the uniform attack, which attacks a deep RL agent with adversarial examples at every time step in an episode to reduce the agent's reward. Our work differs from [Huang et al., 2017] in several aspects: 1) we introduce the strategically-timed attack, which can reach the same effect as the uniform attack by attacking the agent four times less often on average; 2) we also introduce the enchanting attack tactic, which is the first planning-based adversarial attack, misguiding the agent toward a target state.

In terms of defending DNNs from adversarial attacks, several approaches were recently proposed. [Goodfellow et al., 2015] augmented the training data with adversarial examples to improve DNNs' robustness to adversarial examples. [Zheng et al., 2016] proposed incorporating a stability term into the objective function, encouraging DNNs to generate similar outputs for various perturbed versions of an image. Defensive distillation was proposed in [Papernot et al., 2016b] for training a network to defend against both the L-BFGS attack in [Szegedy et al., 2014] and the fast gradient sign attack in [Goodfellow et al., 2015].

Interestingly, as anti-adversarial-attack approaches were proposed, stronger adversarial attack approaches also emerged. [Carlini and Wagner, 2016] recently introduced a way to construct adversarial examples that is immune to various anti-adversarial-attack methods, including defensive distillation. A study in [Rozsa et al., 2016] showed that more accurate models tend to be more robust to adversarial examples, while adversarial examples that can fool a more accurate model can also fool a less accurate model. As the study of adversarial attacks on deep RL agents is still in its infancy, we are unaware of earlier works on defending deep RL agents against adversarial attacks.

3 Adversarial Attacks

In this section, we first review the adversarial example attack on DNN-based classification systems. We then generalize the attack to deep RL agents and introduce our strategically-timed and enchanting attacks.

3.1 Preliminaries

Let x be an image and f be a DNN. An adversarial example for the DNN can be crafted by solving the following optimization problem:

    min_δ D_I(x, x + δ)   subject to   f(x) ≠ f(x + δ),   (1)

where D_I is an image-similarity metric. In words, it looks for a minimal perturbation δ of an image that can change the class assignment of the DNN to the image.

Figure 1: Illustration of the strategically-timed attack on Pong. We use a function c to compute the preference of the agent in taking the most preferred action over the least preferred action at the current state s_t. A large preference value implies an immediate reward. In the bottom panel, we plot c(s_t). Our proposed strategically-timed attack launches an attack on the deep RL agent when the preference is greater than or equal to a threshold, c(s_t) ≥ β (red dashed line). When a small perturbation is added to the observation at s_84 (where c(s_84) ≥ β), the agent changes its action from up to down and eventually misses the ball. But when the perturbation is added to the observation at s_25 (where c(s_25) < β), there is no impact on the reward.

An RL agent learns to interact with the environment through the reward signal. At each time step, it performs an action based on its observation of the environment in order to maximize the accumulated future rewards. The action determination is through a policy π, which maps an observation to an action. Let the current time step be t; the goal of an RL algorithm is then to learn a policy that maximizes the accumulated future rewards R_t = r_t + r_{t+1} + ... + r_L, where L is the length of an episode. In a deep RL algorithm, the policy π is modeled through a DNN. An adversary can attack an agent trained by a deep RL algorithm by perturbing the observations (through crafting adversarial examples) to make the agent take non-preferred actions that reduce the accumulated future rewards.

3.2 Adversarial Attacks on RL

In a recent paper, [Huang et al., 2017] propose the uniform attack tactic, where the adversary attacks a deep RL agent at every time step by perturbing each image the agent observes. The perturbation of an image is computed using the fast gradient sign method [Goodfellow et al., 2015]. The uniform attack tactic can be regarded as a direct extension of the adversarial attack on a DNN-based classification system, since the adversarial example at each time step is computed independently of the adversarial examples at other time steps.
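As an illustration, one fast-gradient-sign perturbation of a single observation can be sketched as follows. This is a minimal NumPy sketch against a hypothetical linear softmax "policy"; the weights, observation, and ε budget are placeholders, not the setup of [Huang et al., 2017]:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def fgsm_perturbation(W, x, eps):
    """One FGSM step against a linear softmax 'policy' pi(a|x) = softmax(W @ x).

    The perturbation ascends the cross-entropy loss of the currently
    preferred action: delta = eps * sign(d loss / d x).
    """
    p = softmax(W @ x)
    a = int(np.argmax(p))            # action the agent would take
    onehot = np.zeros_like(p)
    onehot[a] = 1.0
    # for a linear softmax model, d(-log p_a)/dx = (p - onehot_a) @ W
    grad_x = (p - onehot) @ W
    return eps * np.sign(grad_x)

# toy example: 2 actions, 4-dimensional observation
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
x = rng.normal(size=4)
delta = fgsm_perturbation(W, x, eps=0.5)
print(np.abs(delta).max())  # every component has magnitude eps = 0.5
```

Under the uniform attack, this single step is simply repeated, independently, at every time step of the episode.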
The uniform attack does not consider several unique aspects of the RL problem. For example, during learning, an RL agent is never told which actions to take but instead discovers which actions yield the most reward. This is in contrast to the classification problem, where each image has a ground-truth class. Moreover, an adversarial attack on a DNN is considered a success if it makes the DNN output any wrong class. But the success of an adversarial attack on an RL agent is measured by the amount of reward that the adversary takes away from the RL agent. Instead of perturbing the image to make the agent take any non-optimal action, we would like to find a perturbation that makes the agent take the action that reduces the most reward. Also, because the reward signal in many RL problems is sparse, an adversary need not attack the RL agent at every time step. Our strategically-timed attack tactic, described in Section 3.3, leverages these unique characteristics to attack deep RL agents.

Another unique characteristic of the RL problem is that each action taken by the agent influences its future observations. Therefore, an adversary could plan a sequence of adversarial examples to maliciously lure the agent toward a state that leads to a catastrophic outcome. Our enchanting attack tactic, described in Section 3.4, leverages this characteristic to attack RL agents.

3.3 Strategically-Timed Attack

In an episode, an RL agent observes a sequence of observations or states {s_1, ..., s_L}. Instead of attacking at every time step in an episode, the strategically-timed attack selects a subset of time steps at which to attack the agent. Let {δ_1, ..., δ_L} be a sequence of perturbations, and let R_1 be the expected return at the first time step.
We can formulate the above intuition as the following optimization problem:

    min_{b_1,...,b_L, δ_1,...,δ_L} R_1(s̄_1, ..., s̄_L)
    subject to   s̄_t = s_t + b_t δ_t,   b_t ∈ {0, 1},   for all t = 1, ..., L,
                 Σ_t b_t ≤ Γ,   (2)

where the binary variables b_1, ..., b_L denote when an adversarial example is applied: if b_t = 1, the perturbation δ_t is applied; otherwise, the state is not altered. The total number of attacks is limited by the constant Γ. In words, the adversary minimizes the expected accumulated reward by strategically attacking at fewer than Γ << L time steps.

The optimization problem in (2) is a mixed-integer programming problem, which is difficult to solve. Moreover, in an RL problem, the observation at time step t depends on all the previous observations, which makes the development of a solver for (2) even more challenging, since the problem size grows exponentially with L. In order to study adversarial attacks on deep RL agents, we bypass these limitations and propose a heuristic algorithm that computes {b_1, ..., b_L} (solving the when-to-attack problem) and {δ_1, ..., δ_L} (solving the how-to-attack problem) separately. In the following, we first discuss our solution to the when-to-attack problem; we then discuss our solution to the how-to-attack problem.

When to attack

We introduce a relative action preference function c for solving the when-to-attack problem. The function c computes the preference of the agent in taking the most preferred action over the least preferred action at the current state s_t (similar to [massoud Farahmand, 2011]). The degree of preference for an action depends on the DNN policy. A large c value implies that the agent strongly prefers one action over the others. In the case of Pong, when the ball is about to drop from the top of the screen (see s_84 in Fig. 1), a well-trained RL agent would strongly prefer an up action over a down action.
But when the ball is far away from the paddle (see s_25 in Fig. 1), the agent has no preference for any action, resulting in a small c value. We describe below how to design the relative action preference function c for attacking agents trained by the A3C and DQN algorithms.

For policy-gradient-based methods such as the A3C algorithm, if the action distribution of a well-trained agent is uniform at state s_t, it means that taking any action is equally good. But when an agent strongly prefers a specific action (i.e., the action has a relatively high probability), it means that it is critical to perform that action; otherwise the accumulated reward will be reduced. Based on this intuition, we define the c function as

    c(s_t) = max_{a_t} π(s_t, a_t) − min_{a_t} π(s_t, a_t),   (3)

where π is the policy network, which maps a state-action pair (s_t, a_t) to a probability representing the likelihood that action a_t is chosen. In our strategically-timed attack, the adversary attacks the deep RL agent at time step t when the relative action preference function c has a value greater than a threshold parameter β. In other words, b_t = 1 if and only if c(s_t) ≥ β. We note that the β parameter controls how often the adversary attacks the RL agent and is related to Γ.

For value-based methods such as DQN, the same intuition applies. We can convert the computed Q-values of actions into a probability distribution over actions using the softmax function with temperature constant T (similar to [Huang et al., 2017]):

    c(s_t) = max_{a_t} [ exp(Q(s_t, a_t)/T) / Σ_{a_k} exp(Q(s_t, a_k)/T) ] − min_{a_t} [ exp(Q(s_t, a_t)/T) / Σ_{a_k} exp(Q(s_t, a_k)/T) ].   (4)

How to attack

To craft an adversarial example at time step t, we search for a perturbation to be added to the observation that can change the preferred action of the agent from the originally (before applying the perturbation) most preferred one to the originally least preferred one.
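The preference score of Eqs. (3)-(4) and the resulting attack trigger can be sketched as follows. This is a minimal NumPy sketch; the action probabilities, Q-values, and threshold below are illustrative placeholders, not numbers from the experiments:

```python
import numpy as np

def preference_policy(action_probs):
    """Eq. (3): spread of the policy's action distribution at state s_t."""
    p = np.asarray(action_probs, dtype=float)
    return p.max() - p.min()

def preference_q(q_values, temperature=1.0):
    """Eq. (4): the same spread, after a softmax over Q-values with temperature T."""
    q = np.asarray(q_values, dtype=float) / temperature
    q = q - q.max()                      # numerical stability
    p = np.exp(q) / np.exp(q).sum()
    return p.max() - p.min()

def should_attack(c_value, beta):
    """When-to-attack rule: b_t = 1 iff c(s_t) >= beta."""
    return c_value >= beta

# a near-uniform distribution gives a small c, so no attack;
# a sharply peaked one gives a large c, so attack
print(should_attack(preference_policy([0.26, 0.25, 0.25, 0.24]), beta=0.5))  # False
print(should_attack(preference_q([5.0, -1.0, -2.0, -2.5]), beta=0.5))        # True
```

Larger β values make the gate fire less often, which is how the attack rate is traded off against reward reduction in the experiments.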
We use the attack method introduced in [Carlini and Wagner, 2016], where we treat the least preferred action as the misclassification target (see Sec. 4.1 for details). This approach allows us to leverage the output of a trained deep RL agent as a cue for crafting effective adversarial examples that reduce accumulated rewards.

3.4 Enchanting Attack

The goal of the enchanting attack is to lure the deep RL agent from the current state s_t at time step t to a specified target state s_g after H steps. The adversary needs to craft a series of adversarial examples s_{t+1} + δ_{t+1}, ..., s_{t+H} + δ_{t+H} for this attack. The enchanting attack is therefore more challenging than the strategically-timed attack.

We break this challenging task into two subtasks. In the first subtask, we assume that we have full control of the agent to take arbitrary actions at each step. Hence, the task is reduced to planning a sequence of actions for reaching the target state s_g from the current state s_t. In the second subtask, we craft an adversarial example s_t + δ_t to lure the agent into taking the first action of the planned action sequence, using the method introduced in [Carlini and Wagner, 2016]. After the agent observes the adversarial example and takes the first action planned by the adversary, the environment returns a new state s_{t+1}. We progressively craft s_{t+1} + δ_{t+1}, ..., s_{t+H} + δ_{t+H}, one at a time, using the same procedure (Fig. 2) to lure the agent from state s_{t+1} to the target state s_g. Next, we describe an on-line planning algorithm, which makes use of a next-frame prediction model, for generating the planned action sequence.

Future state prediction and evaluation

We train a video prediction model M to predict a future state given a sequence of actions, based on [Oh et al., 2015], which used a generative model to predict a video frame in the future:

    s^M_{t+H} = M(s_t, A_{t:t+H}),   (5)

where A_{t:t+H} = {a_t, ..., a_{t+H}} is the given sequence of H future actions beginning at step t, s_t is the current state, and s^M_{t+H} is the predicted future state. For more details about the video prediction model, please refer to the original paper.

The sequence of actions A_{t:t+H} = {a_t, ..., a_{t+H}} takes the agent to the state s^M_{t+H}. Since the goal of the enchanting attack is to reach the target state s_g, we can evaluate the success of the attack based on the distance between s_g and s^M_{t+H}, which is given by D(s_g, s^M_{t+H}). The distance D is realized using the L2-norm in our experiments; we note that other metrics could be applied as well. We also note that the state is given by the image observed by the agent.

Sampling-based action planning

We use a sampling-based cross-entropy method [Rubinstein and Kroese, 2013] to compute a sequence of actions to steer the RL agent toward our target state. Specifically, we sample N action sequences of length H, {A^n_{t:t+H}}_{n=1}^N, and rank each of them based on the distance between the final state obtained after performing the action sequence and the target state s_g. After that, we keep the best K action sequences and refit our categorical distributions to them; this sampling and refitting process is repeated for J iterations. In our experiments, the hyper-parameter values are N = 2000, K = 400, and J = 5.

Figure 2: Illustration of the enchanting attack on Ms. Pacman. The blue panel on the right shows the flow of the attack starting at s_t: (1) action-sequence planning, (2) crafting an adversarial example with a target action, (3) the agent takes an action, and (4) the environment generates the next state s_{t+1}. The green panel on the left depicts the video prediction model being trained from unlabeled video. The white panel in the middle depicts the adversary starting at s_t and utilizing the prediction model to plan the attack.

At the end of the last iteration, we take the sampled action sequence A*_{t:t+H} that results in a final state closest to our target state s_g as our plan. Then, we craft an adversarial example with the target action a*_t using the method introduced in [Carlini and Wagner, 2016]. Instead of directly crafting the next adversarial example with target action a*_{t+1}, we plan another enchanting attack starting at state s_{t+1}, in order to be robust to potential failure of the previous attack. We note that the state-transition model is different from the policy of the deep RL agent: we use the state-transition model to propose a sequence of actions that we want the deep RL agent to follow. We also note that both the state-transition model and the future-frame prediction model M are learned without assuming any information from the RL agent.

4 Experiments

We evaluated our tactics of adversarial attack on deep RL agents on 5 different Atari 2600 games (i.e., MsPacman, Pong, Seaquest, Qbert, and ChopperCommand) using OpenAI Gym [Brockman et al., 2016]. These games represent a balanced collection: deep RL agents can achieve above-human-level performance when playing Pong and below-human-level performance when playing MsPacman. We discuss our experimental setup and results in detail. Our implementation will be released.

4.1 Experimental Setup

For each game, the deep RL agents were trained using the state-of-the-art deep RL algorithms, including the A3C and DQN algorithms. For the agents trained by the A3C algorithm, we used the same pre-processing steps and neural network architecture as in [Mnih et al., 2016].
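That kind of Atari pre-processing can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: simple channel averaging and nearest-neighbour resizing stand in for the exact pipeline of [Mnih et al., 2016], which also includes details such as frame skipping:

```python
import numpy as np
from collections import deque

def preprocess(frame, out_size=84):
    """Grayscale a raw RGB frame, resize it to 84x84 (nearest neighbour
    for simplicity), and rescale pixel values to [0, 1]."""
    gray = frame.mean(axis=2)                        # (H, W, 3) -> (H, W)
    h, w = gray.shape
    rows = np.arange(out_size) * h // out_size       # nearest-neighbour row indices
    cols = np.arange(out_size) * w // out_size       # nearest-neighbour column indices
    return gray[np.ix_(rows, cols)] / 255.0

class FrameStack:
    """Keeps the last 4 processed frames; their concatenation is the network input."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def push(self, frame):
        self.frames.append(preprocess(frame))
        while len(self.frames) < self.frames.maxlen:  # pad at episode start
            self.frames.append(self.frames[-1])
        return np.stack(self.frames)                  # shape (4, 84, 84)

stack = FrameStack()
obs = np.zeros((210, 160, 3), dtype=np.uint8)  # a raw Atari-sized frame
state = stack.push(obs)
print(state.shape)  # (4, 84, 84)
```

The stacked tensor is what the policy (A3C) or Q-network (DQN) consumes at each time step, and it is this input that the adversary perturbs.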
For the agents trained by the DQN algorithm, we used the same network architecture for the Q-function as in the original paper [Mnih et al., 2015]. The input to the neural network at time t was the concatenation of the last 4 images. Each of the images was resized to 84 × 84, and the pixel values were rescaled to [0, 1]. The output of the network was a distribution over possible actions for A3C and an estimate of the Q-values for DQN.

Although several existing methods can be used to craft adversarial examples (e.g., the fast gradient sign method [Goodfellow et al., 2015] and the Jacobian-based saliency map attack [Papernot et al., 2016a]), anti-adversarial-attack measures have also been discovered that limit their impact [Goodfellow et al., 2015; Papernot et al., 2016b]. We therefore adopted the adversarial example crafting method proposed by [Carlini and Wagner, 2016], which can break several existing anti-adversarial-attack methods. Specifically, it crafts an adversarial example by approximately optimizing (1), where the image-similarity metric is given by the L∞-norm. We stopped the optimizer early when D(s, s + δ) < ε, where ε is a small value set to 0.007. The value of the temperature T in Equation (4) was set to 1 in the experiments.

4.2 Strategically-Timed Attack

For each game, and for the agents trained by the DQN and A3C algorithms, we launched the strategically-timed attack using different β values. Each β value rendered a different attack rate, quantifying how often an adversary attacked the RL agent in an episode. We computed the rewards collected by the agents under different attack rates. The results are shown in Fig. 3, where the y-axis is the accumulated reward and the x-axis is the average portion of time steps in an episode at which the adversary attacks the agent (i.e., the attack rate). We show the lowest attack rate at which the reward drops to the reward obtained under the uniform attack.
From the figure, we found that on average the strategically-timed attack can reach the same effect as the uniform attack by attacking at 25% of the time steps in an episode.

Figure 3: Accumulated reward (y-axis) vs. portion of time steps at which the agent is attacked (x-axis) for the strategically-timed attack in 5 games: (a) Pong, (b) Seaquest, (c) MsPacman, (d) ChopperCommand, (e) Qbert. The blue and green curves correspond to the results of A3C and DQN, respectively. A larger reward means the deep RL agent is more robust to the strategically-timed attack.

Figure 4: Success rate (y-axis) vs. number of steps H into the future (x-axis) for the enchanting attack in 5 games: (a) Pong, (b) Seaquest, (c) MsPacman, (d) ChopperCommand, (e) Qbert. The blue and green curves correspond to the results of A3C and DQN, respectively. A lower rate means the deep RL agent is more robust to the enchanting attack.

We also found that an agent trained using the DQN algorithm was more vulnerable than an agent trained with the A3C algorithm in most games, except Pong. Since the A3C algorithm is known to perform better than the DQN algorithm on the Atari games, this result suggests that a stronger deep RL agent may be more robust to adversarial attacks. This finding, to some extent, echoes the finding in [Rozsa et al., 2016], which suggested that a more accurate DNN-based recognition system is more robust to adversarial attacks.

4.3 Enchanting Attack

The goal of the enchanting attack is to maliciously lure the agent toward a target state. In order to avoid the bias of defining target states manually, we synthesized target states randomly. Firstly, we let the agent apply its policy for t steps to reach an initial state s_t and saved this state in a snapshot. Secondly, we randomly sampled a sequence of actions of length H and considered the state reached by the agent after performing these actions as a synthesized target state s_g.
After recording the target state, we restored the snapshot, ran the enchanting attack on the agent, and computed the normalized Euclidean distance between the target state s_g and the final reached state s_{t+H}, where the normalization constant was given by the image resolution. We considered an attack successful if the final state had a normalized Euclidean distance to the target state within a tolerance value of 1.

To make sure the evaluation was not affected by different stages of the game, we set 10 initial time steps t equal to [0.0, 0.1, ..., 0.9] × L, where L was the average length of the game episode played by the RL agent over 10 runs. For each initial time step, we evaluated different horizons H = [1, 5, 10, 20, 40, 60, 80, 100, 120]. Then, for each H, we computed the success rate (i.e., the number of times the adversary misguided the agent to reach the target state divided by the number of trials). We expected that a larger H would correspond to a more difficult enchanting attack problem.

In Fig. 4, we show the success rate (y-axis) as a function of H in the 5 games. We found that the agents trained by both the A3C and DQN algorithms were enchanted. When H < 40, the success rate was more than 70% in several games (except Seaquest and ChopperCommand). The reason that the enchanting attack was less effective on Seaquest and ChopperCommand is that both games include multiple random enemies, so our trained video prediction models were less accurate.

5 Conclusion

We introduced two novel tactics of adversarial attack on deep RL agents: the strategically-timed attack and the enchanting attack. In five Atari games, we showed that the accumulated rewards collected by agents trained using the DQN and A3C algorithms were significantly reduced when they were attacked by the strategically-timed attack, even with attacks at just 25% of the time steps in an episode.
Our enchanting attack com- bining video prediction and planning can lure deep RL agent tow ard maliciously defined tar get states in 40 steps with more than 70% success rate in 3 out of 5 games. In the future, we plan to develop a more sophisticated strategically-timed attack method. W e also plan to improve video prediction accuracy of the generativ e model for improving the success rate of enchanting attack on more complicated games. An- other important direction of future work is de veloping de- fenses against adversarial attacks. Possible methods includ- ing augmenting training data with adv ersarial e xamples (as in [ Goodfellow et al. , 2015 ] , or training a subnetwork to detect adversarial input at test time and deal with it properly . Acknowledgements W e would lov e to thank anonymous revie wers, Chun-Y i Lee, Jan Kautz, Bryan Catanzaro, and W illiam Dally for their use- ful comments. W e also thank MOST 105-2815-C-007-083-E and MediaT ek for their support. References [ Brockman et al. , 2016 ] Greg Brockman, V icki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie T ang, and W ojciech Zaremba. Openai gym, 2016. [ Carlini and W agner , 2016 ] Nicholas Carlini and David W agner . T o wards ev aluating the rob ustness of neural networks. https://arxiv .org/abs/1608.04644 , 2016. [ Goodfellow et al. , 2015 ] Ian Goodfellow , Jonathon Shlens, and Christian Sze gedy . Explaining and harnessing adv er- sarial examples. In ICLR , 2015. [ Hannun et al. , 2014 ] A wni Hannun, Carl Case, Jared Casper , Bryan Catanzaro, Gre g Diamos, Erich Elsen, Ryan Prenger , Sanjee v Satheesh, Shubho Sengupta, Adam Coates, et al. Deep speech: Scaling up end-to-end speech recognition. https://arxiv .or g/abs/1412.5567 , 2014. [ Huang et al. , 2017 ] Sandy Huang, Nicolas Papernot, Ian Goodfellow , Y an Duan, and Pieter Abbeel. Adversarial attacks on neural network policies. https://arxiv .or g/abs/1702.02284 , 2017. [ Krizhevsk y et al. 
, 2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[Kurakin et al., 2016] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.
[massoud Farahmand, 2011] Amir massoud Farahmand. Action-gap phenomenon in reinforcement learning. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 172–180. Curran Associates, Inc., 2011.
[Mnih et al., 2015] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[Mnih et al., 2016] Volodymyr Mnih, Adria Puigdomenech Badia, and Mehdi Mirza. Asynchronous methods for deep reinforcement learning. In ICML, 2016.
[Moosavi-Dezfooli et al., 2016] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In CVPR, June 2016.
[Oh et al., 2015] Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L. Lewis, and Satinder Singh. Action-conditional video prediction using deep networks in Atari games. In NIPS, 2015.
[Papernot et al., 2016a] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Proceedings of the 1st IEEE European Symposium on Security and Privacy, 2016.
[Papernot et al., 2016b] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the 37th IEEE Symposium on Security and Privacy, 2016.
[Rozsa et al., 2016] Andras Rozsa, Manuel Günther, and Terrance E. Boult. Towards robust deep neural networks with BANG. CoRR, abs/1612.00138, 2016.
[Rubinstein and Kroese, 2013] Reuven Y. Rubinstein and Dirk P. Kroese. The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation and machine learning. Springer Science & Business Media, 2013.
[Sutskever et al., 2014] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
[Szegedy et al., 2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.
[Zheng et al., 2016] Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. Improving the robustness of deep neural networks via stability training. In CVPR, 2016.
