QUICK REVIEW

[Paper Review] Tactics of Adversarial Attack on Deep Reinforcement Learning Agents

Yen-Chen Lin, Zhang-Wei Hong | arXiv (Cornell University) | Mar 8, 2017

Adversarial Robustness in Machine Learning | Cited by 96

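ONE-SENTENCE SUMMARY

This paper proposes two adversarial attack tactics against deep reinforcement learning agents: the strategically-timed attack, which perturbs observations at only a subset of time steps, and the enchanting attack, which plans a sequence of perturbations to lure the agent to a target state. Both are evaluated on five Atari games with agents trained by A3C and DQN.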

ABSTRACT

We introduce two tactics to attack agents trained by deep reinforcement learning algorithms using adversarial examples, namely the strategically-timed attack and the enchanting attack. In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims at luring the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: while the generative model predicts the future states, the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to lure the agent to take the preferred sequence of actions. We apply the two tactics to the agents trained by the state-of-the-art deep reinforcement learning algorithm including DQN and A3C. In 5 Atari games, our strategically timed attack reduces as much reward as the uniform attack (i.e., attacking at every time step) does by attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a more than 70% success rate. Videos are available at http://yenchenlin.me/adversarial_attack_RL/

RESEARCH MOTIVATION AND OBJECTIVES

Understand the vulnerability of deep RL agents to adversarial perturbations.
Develop tactics that reduce the agent's reward while perturbing as few time steps as possible.
Demonstrate the effectiveness of the attacks on agents trained by state-of-the-art deep RL algorithms (A3C, DQN).
Explore planning-based attacks that steer agents toward designated target states.

PROPOSED METHOD

Define the strategically-timed attack using a relative action preference function that decides when to perturb.
Craft perturbations with the Carlini & Wagner method to flip the agent’s most preferred action to its least preferred one.
Limit the total number of attacked time steps with a budget Γ and compare the reward impact against uniform attacks.
Introduce the enchanting attack, which combines a video-prediction model and a planning algorithm to lure the agent to a target state over H steps.
Use a future-state predictor M to estimate s_{t+H}^M = M(s_t, A_{t:t+H}) and a sampling-based cross-entropy method to plan the action sequence A_{t:t+H}.
Evaluate on five Atari games (MsPacman, Pong, Seaquest, Qbert, ChopperCommand) with A3C and DQN.
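
The timing rule in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the relative action preference is the gap between the highest and lowest action probabilities produced by the policy, and that the adversary attacks whenever this gap exceeds a threshold. The threshold name `beta`, the function names, and the toy probabilities are hypothetical; `budget` plays the role of Γ.

```python
import numpy as np


def action_preference_gap(action_probs):
    """Assumed preference measure: gap between the most and least preferred actions."""
    probs = np.asarray(action_probs, dtype=float)
    return float(probs.max() - probs.min())


def select_attack_steps(policy_outputs, beta, budget):
    """Pick (time step, target action) pairs until the attack budget is spent.

    policy_outputs: one action-probability vector per time step.
    beta: hypothetical threshold on the preference gap.
    budget: maximum number of attacked time steps (the role of Γ).
    """
    attacks = []
    for t, probs in enumerate(policy_outputs):
        if len(attacks) >= budget:
            break  # budget exhausted, leave the remaining steps untouched
        if action_preference_gap(probs) > beta:
            # The adversarial example should push the agent toward
            # its least preferred action at this step.
            target_action = int(np.argmin(probs))
            attacks.append((t, target_action))
    return attacks


# Toy episode with four time steps and four actions (made-up numbers).
demo_policy_outputs = [
    [0.25, 0.25, 0.25, 0.25],  # no clear preference, skip
    [0.90, 0.05, 0.03, 0.02],  # strong preference, attack
    [0.40, 0.30, 0.20, 0.10],  # weak preference, skip
    [0.97, 0.01, 0.01, 0.01],  # strong preference, attack
]
print(select_attack_steps(demo_policy_outputs, beta=0.5, budget=2))
# prints [(1, 3), (3, 1)]
```

Crafting the perturbation itself (for example with the Carlini & Wagner method) is outside this sketch; only the decision of when to attack and which action to target is shown.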

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can deep RL agents trained with DQN and A3C be effectively attacked using minimally perturbed observations without triggering easy detection?
RQ2: How effective are strategically-timed attacks compared to uniform attacks in reducing accumulated rewards?
RQ3: Can a planning-based enchanting attack reliably steer an agent to a designated target state, and under what conditions?
RQ4: What defense considerations emerge for robustness against these two adversarial tactics?

KEY FINDINGS

Strategically-timed attacks can match the reward reduction of uniform attacks while perturbing observations at roughly 25% of time steps on average.
DQN agents tend to be more vulnerable to strategically-timed attacks than A3C in most games examined.
Enchanting attacks achieve more than 70% success in luring agents toward target states for several settings and games.
The enchanting attack is less effective in environments with high stochasticity (e.g., multiple random enemies) due to prediction model inaccuracies.
The study demonstrates two novel attack tactics against agents trained by state-of-the-art deep RL algorithms and discusses potential defenses.
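
To make the planning step of the enchanting attack concrete, and to show where prediction error enters, here is a minimal sketch of a sampling-based cross-entropy method over discrete action sequences. It assumes a predictor `predict(state, actions)` standing in for M and scores each candidate sequence by the Euclidean distance between the predicted final state and the target. The distance measure, all function names, the hyperparameters, and the toy one-dimensional dynamics are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def cem_plan(predict, state, target, n_actions, horizon,
             n_samples=200, n_elite=20, n_iters=5, seed=0):
    """Plan an action sequence whose predicted end state is close to `target`."""
    rng = np.random.default_rng(seed)
    # One categorical distribution over actions per planning step, initially uniform.
    probs = np.full((horizon, n_actions), 1.0 / n_actions)
    for _ in range(n_iters):
        # Sample candidate sequences, shape (n_samples, horizon).
        seqs = np.stack(
            [rng.choice(n_actions, size=n_samples, p=probs[h]) for h in range(horizon)],
            axis=1,
        )
        # Score each candidate through the (possibly inaccurate) predictor.
        dists = np.array([np.linalg.norm(predict(state, seq) - target) for seq in seqs])
        # Keep the candidates whose predicted end state is closest to the target.
        elite = seqs[np.argsort(dists)[:n_elite]]
        # Refit each step's distribution to the elite set, with light smoothing.
        for h in range(horizon):
            counts = np.bincount(elite[:, h], minlength=n_actions) + 1e-3
            probs[h] = counts / counts.sum()
    # Return the most likely action at each step.
    return probs.argmax(axis=1)


def toy_predict(state, actions):
    """Hypothetical stand-in for M: action 0 moves -1, action 1 stays, action 2 moves +1."""
    return state + np.sum(np.asarray(actions) - 1)


plan = cem_plan(toy_predict, state=np.array([0.0]), target=np.array([3.0]),
                n_actions=3, horizon=3)
print(plan)
```

Because every candidate is scored through `predict`, an inaccurate predictor ranks sequences incorrectly, which is consistent with the weaker results reported for highly stochastic games.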


This review was generated by AI and reviewed by human editors.