QUICK REVIEW

[Paper Review] Tactics of Adversarial Attack on Deep Reinforcement Learning Agents

Yen-Chen Lin, Zhang-Wei Hong|arXiv (Cornell University)|Mar 8, 2017

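Adversarial Robustness in Machine Learning | Cited by 96

One-Sentence Summary

This paper introduces two adversarial attack tactics against deep reinforcement learning agents: the strategically-timed attack, which perturbs observations at only a small subset of time steps, and the enchanting attack, which plans a sequence of perturbations to lure the agent toward a designated target state. Both are evaluated on five Atari games with agents trained by A3C and DQN.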

ABSTRACT

We introduce two tactics to attack agents trained by deep reinforcement learning algorithms using adversarial examples, namely the strategically-timed attack and the enchanting attack. In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims at luring the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: while the generative model predicts the future states, the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to lure the agent to take the preferred sequence of actions. We apply the two tactics to the agents trained by the state-of-the-art deep reinforcement learning algorithm including DQN and A3C. In 5 Atari games, our strategically timed attack reduces as much reward as the uniform attack (i.e., attacking at every time step) does by attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a more than 70% success rate. Videos are available at http://yenchenlin.me/adversarial_attack_RL/

Motivation and Objectives

Understand vulnerability of deep RL agents to adversarial perturbations.
Develop tactics that minimize perturbations while reducing agent rewards.
Demonstrate effectiveness of attacks on state-of-the-art Deep RL algorithms (A3C, DQN).
Explore planning-based attacks to steer agents toward designated states.

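Proposed Method

Define the strategically-timed attack by using a relative action preference function to decide at which time steps to perturb the observation.
Use the Carlini & Wagner method to craft perturbations that flip the agent's most preferred action to its least preferred one.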
Limit the total number of attacked time steps to a budget Γ, and compare the effect on reward against a uniform attack.
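Introduce the enchanting attack, which combines a video prediction model with a planning algorithm to lure the agent toward a target state over H steps.
Use a future-state predictor M to estimate s_{t+H}^M = M(s_t, A_{t:t+H}), and plan the action sequence A_{t:t+H} with the sampling-based cross-entropy method.
Evaluate on Atari games (MsPacman, Pong, Seaquest, Qbert, ChopperCommand) with A3C and DQN.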
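
The timing rule in the strategically-timed attack can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes the relative action preference is the gap between the highest and lowest action probability (with a softmax over Q-values for value-based agents such as DQN), and that an attack is launched when this gap exceeds a threshold. The names `beta`, `budget`, `preference_gap`, and `select_attack_steps` are hypothetical, and the simple first-come budget loop is an assumption made for illustration.

```python
import numpy as np

def softmax(q_values, temperature=1.0):
    """Turn Q-values into a distribution (assumed conversion for DQN-style agents)."""
    z = np.asarray(q_values, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def preference_gap(action_probs):
    """Relative action preference: highest minus lowest action probability."""
    p = np.asarray(action_probs, dtype=float)
    return float(p.max() - p.min())

def select_attack_steps(prob_sequence, beta, budget):
    """Pick steps whose preference gap exceeds beta, stopping once `budget` steps are chosen."""
    attacked = []
    for t, probs in enumerate(prob_sequence):
        if len(attacked) >= budget:
            break
        if preference_gap(probs) > beta:
            attacked.append(t)
    return attacked

def target_action(action_probs):
    """Least-preferred action, used as the target when crafting the perturbation."""
    return int(np.argmin(action_probs))

# Toy episode with three actions per step (made-up probabilities).
episode = [
    [0.34, 0.33, 0.33],  # agent is indifferent: attacking here changes little
    [0.90, 0.07, 0.03],  # strong preference: a good moment to attack
    [0.50, 0.30, 0.20],
    [0.05, 0.15, 0.80],
]
steps = select_attack_steps(episode, beta=0.5, budget=2)
```

In this toy episode only steps 1 and 3 have a gap above 0.5, so those are the steps selected. The intuition is that perturbing an observation matters most when the agent strongly prefers one action over another.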

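Experimental Results

Research Questions

RQ1: Can deep RL agents trained with DQN and A3C be attacked effectively with minimal perturbation, without triggering easy detection?
RQ2: How effective is the strategically-timed attack at reducing cumulative reward compared with a uniform attack?
RQ3: Can the planning-based enchanting attack reliably lure the agent to a designated target state, and under what conditions?
RQ4: What are the defensive considerations for robustness against these two adversarial tactics?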

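Key Findings

The strategically-timed attack can achieve the same reward reduction as a uniform attack while perturbing observations at only about 25% of time steps on average.
DQN agents tend to be more vulnerable than A3C agents to the strategically-timed attack in most games.
The enchanting attack lures the agent to the target state with a success rate above 70% in several settings and games.
The enchanting attack is less effective in environments where the prediction model is highly uncertain (for example, with multiple randomly moving enemies).
The study demonstrates two novel attack vectors against state-of-the-art deep RL agents and discusses potential defenses.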
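
The planning step of the enchanting attack, and why its success depends on the quality of the predictor M, can be illustrated with a small sampling-based cross-entropy planner. This is a sketch under stated assumptions rather than the paper's implementation: `cem_plan`, `toy_model`, the sample and elite counts, and the Euclidean distance to the target are all hypothetical choices, and the toy one-dimensional model stands in for the learned video prediction model.

```python
import numpy as np

def cem_plan(model, state, target, horizon, num_actions,
             iters=5, samples=500, elites=50, seed=0):
    """Search for an action sequence whose predicted final state is close to `target`."""
    rng = np.random.default_rng(seed)
    # One categorical distribution over actions per planning step, initially uniform.
    probs = np.full((horizon, num_actions), 1.0 / num_actions)
    best_seq, best_cost = None, np.inf
    for _ in range(iters):
        # Sample candidate sequences: shape (samples, horizon).
        seqs = np.stack(
            [rng.choice(num_actions, size=samples, p=probs[h]) for h in range(horizon)],
            axis=1,
        )
        # Cost: distance between the model's predicted final state and the target.
        costs = np.array([np.linalg.norm(model(state, seq) - target) for seq in seqs])
        order = np.argsort(costs)
        if costs[order[0]] < best_cost:
            best_cost = float(costs[order[0]])
            best_seq = seqs[order[0]].copy()
        elite = seqs[order[:elites]]
        # Refit each step's distribution to the elite action frequencies (lightly smoothed).
        for h in range(horizon):
            counts = np.bincount(elite[:, h], minlength=num_actions) + 1e-3
            probs[h] = counts / counts.sum()
    return best_seq.tolist(), best_cost

def toy_model(state, actions):
    """Toy stand-in for M: actions 0, 1, 2 move a 1-D position by -1, 0, +1."""
    moves = np.array([-1.0, 0.0, 1.0])
    return np.asarray(state, dtype=float) + moves[np.asarray(actions)].sum()

plan, cost = cem_plan(toy_model, np.array([0.0]), np.array([3.0]),
                      horizon=3, num_actions=3)
```

With this deterministic toy model, reaching position 3 in three steps requires moving right every time, and the planner should recover that sequence. The planner only ever sees the model's predictions, so if M mispredicts the future, the chosen sequence targets the wrong state, which is consistent with the lower success reported in hard-to-predict environments.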

This review was generated by AI and checked by a human editor.