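[Paper Explained] Tactics of Adversarial Attack on Deep Reinforcement Learning Agents
This paper proposes two adversarial attack tactics against deep reinforcement learning agents: the strategically-timed attack, which perturbs observations at only a subset of time steps, and the enchanting attack, which plans perturbations to lure the agent toward a target state. Both are evaluated on five Atari games with A3C and DQN agents.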
We introduce two tactics to attack agents trained by deep reinforcement learning algorithms using adversarial examples, namely the strategically-timed attack and the enchanting attack. In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims at luring the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: while the generative model predicts the future states, the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to lure the agent to take the preferred sequence of actions. We apply the two tactics to the agents trained by the state-of-the-art deep reinforcement learning algorithm including DQN and A3C. In 5 Atari games, our strategically timed attack reduces as much reward as the uniform attack (i.e., attacking at every time step) does by attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a more than 70% success rate. Videos are available at http://yenchenlin.me/adversarial_attack_RL/
Motivation and Objectives
- Understand vulnerability of deep RL agents to adversarial perturbations.
- Develop tactics that minimize perturbations while reducing agent rewards.
- Demonstrate effectiveness of attacks on state-of-the-art Deep RL algorithms (A3C, DQN).
- Explore planning-based attacks to steer agents toward designated states.
Proposed Method
- Define strategically-timed attack using a relative action preference function to decide when to perturb.
- Craft perturbations with Carlini & Wagner method to flip the agent’s most preferred action to the least preferred one.
- Limit total attacks by a budget Γ and evaluate reward impact versus uniform attacks.
- Introduce enchanting attack combining a video-prediction model and a planning algorithm to lure the agent to a target state over H steps.
- Use a future-state predictor M to estimate s_{t+H}^M = M(s_t, A_{t:t+H}) and a sampling-based cross-entropy method to plan action sequences A_{t:t+H}.
- Evaluate on Atari games (MsPacman, Pong, Seaquest, Qbert, ChopperCommand) with A3C and DQN.
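The timing rule in the first bullets can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the relative action preference is the gap between the highest and lowest action probabilities, that Q-values are turned into probabilities with a softmax, and that an attack fires when the gap exceeds a threshold. The names `beta`, `budget`, `temperature`, and all function names are hypothetical. Crafting the perturbation itself (the Carlini & Wagner step) is out of scope here; the sketch only picks the time steps and the target action.

```python
import math


def softmax(values, temperature=1.0):
    """Turn Q-values into a probability-like preference over actions (assumed conversion)."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def action_preference_gap(probs):
    """Gap between the most and the least preferred action."""
    return max(probs) - min(probs)


def target_action(probs):
    """Index of the least preferred action, used as the attack target."""
    return min(range(len(probs)), key=probs.__getitem__)


def plan_attack_steps(policy_outputs, beta, budget):
    """Return the time steps where the gap exceeds beta, up to `budget` attacks."""
    steps = []
    for t, probs in enumerate(policy_outputs):
        if len(steps) < budget and action_preference_gap(probs) > beta:
            steps.append(t)
    return steps


if __name__ == "__main__":
    # Hypothetical per-step action probabilities from a trained policy.
    episode = [
        [0.25, 0.25, 0.25, 0.25],  # indifferent: not worth attacking
        [0.90, 0.05, 0.03, 0.02],  # strong preference: attack
        [0.40, 0.30, 0.20, 0.10],
        [0.97, 0.01, 0.01, 0.01],  # strong preference: attack
        [0.95, 0.02, 0.02, 0.01],  # skipped, budget already spent
    ]
    print(plan_attack_steps(episode, beta=0.8, budget=2))  # prints [1, 3]
```

The intuition behind the threshold is that a large gap marks a step where the agent strongly prefers one action, so forcing the least preferred action there is likely to cost the most reward per perturbation.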
Experimental Results
Research Questions
- RQ1: Can deep RL agents trained with DQN and A3C be effectively attacked using minimally perturbed observations without triggering easy detection?
- RQ2: How effective are strategically-timed attacks compared to uniform attacks in reducing accumulated rewards?
- RQ3: Can a planning-based enchanting attack reliably steer an agent to a designated target state, and under what conditions?
- RQ4: What defense considerations emerge for robustness against these two adversarial tactics?
Key Findings
- Strategically-timed attacks can match the reward reduction of uniform attacks while perturbing observations at roughly 25% of time steps on average.
- DQN agents tend to be more vulnerable to strategically-timed attacks than A3C in most games examined.
- Enchanting attacks achieve more than 70% success in luring agents toward target states for several settings and games.
- The enchanting attack is less effective in environments with high stochasticity (e.g., multiple random enemies) due to prediction model inaccuracies.
- The study demonstrates two novel attack vectors against state-of-the-art Deep RL agents and discusses potential defenses.
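The enchanting-attack findings depend on how well the planner's predictor matches the real environment. The sketch below illustrates sampling-based cross-entropy planning over discrete action sequences under strong simplifying assumptions: the learned predictor M is replaced by a deterministic 1-D toy model (`toy_predictor`), the distance to the target is an absolute difference, and every name and hyperparameter (`cem_plan`, `n_samples`, `n_elite`, `smoothing`) is hypothetical. It plans a single action sequence and does not craft any adversarial examples.

```python
import random

MOVES = (-1, 0, 1)  # hypothetical action set: left, stay, right


def toy_predictor(state, actions):
    """Stand-in for the predictor M: a deterministic 1-D walk."""
    return state + sum(MOVES[a] for a in actions)


def cem_plan(state, target, horizon, predictor, n_samples=64, n_elite=8,
             n_iters=10, smoothing=0.1, seed=0):
    """Search for an action sequence whose predicted end state is near `target`."""
    rng = random.Random(seed)
    n_actions = len(MOVES)
    # One categorical distribution per time step, initially uniform.
    probs = [[1.0 / n_actions] * n_actions for _ in range(horizon)]
    best_seq, best_cost = None, float("inf")

    def cost_of(seq):
        return abs(predictor(state, seq) - target)

    for _ in range(n_iters):
        # Sample candidate sequences from the current distributions.
        samples = [
            [rng.choices(range(n_actions), weights=probs[t])[0]
             for t in range(horizon)]
            for _ in range(n_samples)
        ]
        # Keep the sequences predicted to end closest to the target.
        elites = sorted(samples, key=cost_of)[:n_elite]
        if cost_of(elites[0]) < best_cost:
            best_seq, best_cost = elites[0], cost_of(elites[0])
        # Refit each per-step distribution to the elite action frequencies.
        for t in range(horizon):
            counts = [smoothing] * n_actions  # keeps every weight positive
            for seq in elites:
                counts[seq[t]] += 1.0
            total = sum(counts)
            probs[t] = [c / total for c in counts]
    return best_seq, best_cost


if __name__ == "__main__":
    plan, cost = cem_plan(state=0, target=3, horizon=5, predictor=toy_predictor)
    print(plan, cost)
```

Because the plan is only as good as `predictor`, any mismatch between predicted and actual future states translates directly into a plan that misses the target, which is consistent with the weaker results reported for highly stochastic games.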
This explainer was generated by AI and reviewed by a human editor.