QUICK REVIEW

[Paper Review] Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight

Yen-Chen Lin, Ming-Yu Liu|arXiv (Cornell University)|Oct 2, 2017

Adversarial Robustness in Machine Learning | References: 41 | Citations: 34

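Summary in Brief

This paper proposes a defense mechanism that detects adversarial attacks on deep reinforcement learning policies. An action-conditioned frame prediction model produces a predicted frame, and the action distribution the policy outputs for the observed frame is compared with the one it outputs for the predicted frame. When an adversarial input is detected, the agent switches to the predicted frame, which lets it maintain performance under attack and outperform baselines on Atari 2600 games.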

ABSTRACT

Deep reinforcement learning has shown promising results in learning control policies for complex sequential decision-making tasks. However, these neural network-based policies are known to be vulnerable to adversarial examples. This vulnerability poses a potentially serious threat to safety-critical systems such as autonomous vehicles. In this paper, we propose a defense mechanism to defend reinforcement learning agents from adversarial attacks by leveraging an action-conditioned frame prediction module. Our core idea is that the adversarial examples targeting at a neural network-based policy are not effective for the frame prediction model. By comparing the action distribution produced by a policy from processing the current observed frame to the action distribution produced by the same policy from processing the predicted frame from the action-conditioned frame prediction module, we can detect the presence of adversarial examples. Beyond detecting the presence of adversarial examples, our method allows the agent to continue performing the task using the predicted frame when the agent is under attack. We evaluate the performance of our algorithm using five games in Atari 2600. Our results demonstrate that the proposed defense mechanism achieves favorable performance against baseline algorithms in detecting adversarial examples and in earning rewards when the agents are under attack.

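Motivation and Objectives

Address the vulnerability of DNN-based reinforcement learning policies to adversarial examples in safety-critical applications such as autonomous driving.
Develop a defense mechanism that detects adversarial inputs in sequential decision-making tasks by using temporal coherence and action-conditioned frame prediction.
Allow the agent to keep performing its task when the observed input is contaminated, by suggesting actions based on the predicted frame.
Build a model-agnostic defense that needs no adversarial examples at training time and applies to a wide range of DNN-based policies.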

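Proposed Method

Train an action-conditioned frame prediction model (the visual foresight module) that predicts the current frame from past frames and actions.
Feed the predicted frame into the same policy and compare the resulting action distribution with the one obtained from the observed frame.
Flag an adversarial attack when the action distributions from the observed frame and the predicted frame diverge significantly.
When detection fires, switch from the observed frame to the predicted frame so that the agent can keep acting.
Exploit temporal coherence across multiple frames and actions to make detection more robust to adversarial perturbations of a single frame.
Use the mean squared error (MSE) of frame prediction as a proxy for model accuracy, and show that it correlates with detection performance.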
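
The detect-and-switch steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the L1 distance between action distributions and the threshold value are assumptions made here (the summary only says the distributions "diverge significantly"), and `toy_policy` and `toy_predictor` are hypothetical stand-ins for a trained policy network and a trained frame prediction model.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def detect_and_act(policy, predictor, past_frames, past_actions,
                   observed_frame, threshold):
    """Return (action, is_attack, distance) for one time step.

    policy(frame) -> action probabilities (1-D array summing to 1)
    predictor(past_frames, past_actions) -> predicted current frame
    Both callables are hypothetical stand-ins for trained networks.
    """
    predicted_frame = predictor(past_frames, past_actions)
    p_observed = policy(observed_frame)
    p_predicted = policy(predicted_frame)

    # Assumption: L1 distance between the two action distributions.
    distance = float(np.abs(p_observed - p_predicted).sum())
    is_attack = distance > threshold

    # Under attack, act on the predicted frame instead of the observed one.
    chosen = p_predicted if is_attack else p_observed
    return int(np.argmax(chosen)), is_attack, distance


# Toy stand-ins so the sketch runs end to end (not the paper's models).
def toy_policy(frame):
    return softmax(5.0 * np.asarray(frame, dtype=float))


def toy_predictor(past_frames, past_actions):
    # Pretend the scene is static: predict the most recent past frame.
    return np.asarray(past_frames[-1], dtype=float)


clean = np.array([1.0, 0.0, 0.0, 0.0])
perturbed = np.array([0.0, 1.0, 0.0, 0.0])  # exaggerated "attack" for illustration
history, actions = [clean], [0]

print(detect_and_act(toy_policy, toy_predictor, history, actions, clean, threshold=0.5))
print(detect_and_act(toy_policy, toy_predictor, history, actions, perturbed, threshold=0.5))
```

In this toy setup the clean frame yields identical distributions, so no attack is flagged, while the perturbed frame shifts the policy's preferred action and the agent falls back to the action suggested by the predicted frame. In practice the threshold would need to be tuned per game, since a less accurate frame predictor produces larger distances even on clean inputs.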

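Experimental Results

Research Questions

RQ1: Are temporal coherence and action-conditioned frame prediction effective for detecting adversarial examples against DRL policies?
RQ2: How does the accuracy of the frame prediction model affect adversarial example detection performance?
RQ3: Can the agent maintain task performance under sustained adversarial attack by relying on predicted frames?
RQ4: In sequential decision-making settings, how does the method compare with existing adversarial detection methods designed for image classification?
RQ5: Does the defense remain effective against an adaptive adversary who knows the detection mechanism?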

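Key Findings

The proposed defense achieved higher mean average precision (mAP) in detecting adversarial examples than strong baseline detectors from the image classification literature.
Frame prediction accuracy and detection performance were strongly correlated: the lower the MSE of the frame prediction model, the higher the mAP.
On Atari 2600 games, switching to predicted frames maintained high reward even when the agent was under attack for most of the time.
Even when past frames may be contaminated, the method remains effective as long as the attack does not target the frame prediction model, owing to that model's robustness.
The defense is model-agnostic: it needs no adversarial examples at training time and applies broadly to DNN-based policies.
Because the use of temporal information is orthogonal to existing defenses such as adversarial training and defensive distillation, the method can be combined with them and may yield complementary gains.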
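
For readers unfamiliar with the mAP metric cited in the findings above, the following sketch shows one common way to compute average precision from per-frame detection scores, with mAP taken as the mean across games. The exact evaluation protocol is not given in this review, so this formulation is an assumption, and the scores and labels below are made-up illustrative values rather than results from the paper.

```python
import numpy as np


def average_precision(scores, labels):
    """Average precision for a ranked detector.

    scores: higher means "more likely adversarial"
    labels: 1 for adversarial frames, 0 for clean frames
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    if labels.sum() == 0:
        raise ValueError("at least one positive label is required")

    # Rank frames from highest to lowest detection score.
    order = np.argsort(-scores, kind="stable")
    ranked = labels[order]

    # Precision at each rank, averaged over the ranks that hold a positive.
    true_positives = np.cumsum(ranked)
    precision_at_k = true_positives / np.arange(1, len(ranked) + 1)
    return float((precision_at_k * ranked).sum() / ranked.sum())


def mean_average_precision(per_game):
    """Mean of per-game average precision; per_game is a list of (scores, labels)."""
    return float(np.mean([average_precision(s, y) for s, y in per_game]))


# Made-up example: two hypothetical games with four frames each.
game_a = ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])  # positives ranked first
game_b = ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # one clean frame ranked too high
print(average_precision(*game_a))
print(average_precision(*game_b))
print(mean_average_precision([game_a, game_b]))
```

Here the detection score could be the distance between the two action distributions, so a detector that ranks every adversarial frame above every clean frame reaches an average precision of 1.0.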

This review was generated by AI and checked by a human editor.