QUICK REVIEW

[Paper Review] TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems

Wenbo Guo, Lun Wang | arXiv (Cornell University) | Aug 2, 2019

Adversarial Robustness in Machine Learning | References: 47 | Citations: 134

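One-Sentence Summary

TABOR reformulates trojan backdoor inspection as a non-convex optimization problem and adds new regularization terms plus a fidelity-focused metric for restored triggers, improving on Neural Cleanse in both detection and trigger restoration under varying trigger conditions.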

ABSTRACT

A trojan backdoor is a hidden pattern typically implanted in a deep neural network. It could be activated and thus forces that infected model behaving abnormally only when an input data sample with a particular trigger present is fed to that model. As such, given a deep neural network model and clean input samples, it is very challenging to inspect and determine the existence of a trojan backdoor. Recently, researchers design and develop several pioneering solutions to address this acute problem. They demonstrate the proposed techniques have a great potential in trojan detection. However, we show that none of these existing techniques completely address the problem. On the one hand, they mostly work under an unrealistic assumption (e.g. assuming availability of the contaminated training database). On the other hand, the proposed techniques cannot accurately detect the existence of trojan backdoors, nor restore high-fidelity trojan backdoor images, especially when the triggers pertaining to the trojan vary in size, shape and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes a trojan detection task as a non-convex optimization problem, and the detection of a trojan backdoor as the task of resolving the optimization through an objective function. Different from the existing technique also modeling trojan detection as an optimization problem, TABOR designs a new objective function--under the guidance of explainable AI techniques as well as heuristics--that could guide optimization to identify a trojan backdoor in a more effective fashion. In addition, TABOR defines a new metric to measure the quality of a trojan backdoor identified. Using an anomaly detection method, we show the new metric could better facilitate TABOR to identify intentionally injected triggers in an infected model and filter out false alarms......

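Research Motivation and Objectives

Motivate robust trojan backdoor detection that needs no access to the training data or model internals.
Develop an optimization-based detection framework with regularization that reduces false alarms.
Propose a trigger restoration metric and method for recovering trojan triggers accurately.
Evaluate TABOR against existing methods across diverse models and backdoor settings.
Show that TABOR is robust to changes in the trojan insertion technique and in model complexity.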

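Proposed Method

Formulate trojan detection as a non-convex optimization problem over a mask M and a trigger Delta.
Introduce four regularization terms that penalize overly large or scattered triggers and suppress blocking triggers, overlaying triggers, and irrelevant features.
Design the R1 and R2 regularizers to shrink the adversarial subspace and encourage compact, contiguous triggers.
Add the R3 regularizer so that the input is still classified correctly once the trigger is removed, which keeps the trigger from occluding important image features.
Incorporate the R4 regularizer, inspired by explainable AI, which uses feature-importance insights to refine trigger fidelity.
Solve the optimization with a dedicated procedure guided by observations about false alarms and overlaying triggers.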
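
To make the roles of the mask M, the trigger Delta, and the penalties R1 to R4 concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the blending rule x' = x * (1 - M) + Delta * M, the specific form of each penalty, the weights, the function names, and the random linear "model" are all hypothetical choices made for this example.

```python
import numpy as np

def apply_trigger(x, mask, delta):
    """Blend a trigger into an image: x' = x * (1 - M) + Delta * M (assumed form)."""
    return x * (1.0 - mask) + delta * mask

def cross_entropy(logits, label):
    """Softmax cross-entropy of a 1-D logit vector against an integer label."""
    shifted = logits - logits.max()              # stabilise the exponentials
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def total_variation(mask):
    """Sum of squared differences between neighbouring mask entries."""
    return float((np.diff(mask, axis=0) ** 2).sum() + (np.diff(mask, axis=1) ** 2).sum())

def tabor_like_objective(model, x, y_true, y_target, mask, delta,
                         weights=(1e-3, 1e-3, 1.0, 1.0)):
    """Attack loss plus four illustrative penalties (all forms are assumptions)."""
    attack = cross_entropy(model(apply_trigger(x, mask, delta)), y_target)
    r1 = float(np.abs(mask).sum())                        # overly large mask
    r2 = total_variation(mask)                            # scattered mask
    r3 = cross_entropy(model(x * (1.0 - mask)), y_true)   # trigger region removed
    r4 = cross_entropy(model(delta * mask), y_target)     # trigger shown alone
    w1, w2, w3, w4 = weights
    return attack + w1 * r1 + w2 * r2 + w3 * r3 + w4 * r4

# Hypothetical stand-in for a classifier: a fixed random linear map over pixels.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))
toy_model = lambda img: W @ img.ravel()

x = rng.uniform(size=(4, 4))
delta = np.ones((4, 4))
block = np.zeros((4, 4)); block[:2, :] = 1.0        # contiguous 8-pixel mask
checker = np.indices((4, 4)).sum(axis=0) % 2 * 1.0  # scattered 8-pixel mask
```

Both example masks cover eight pixels, so the size penalty treats them alike, while the smoothness penalty is larger for the checkerboard than for the contiguous block. That is the behavior a penalty on scattered triggers is meant to have. In practice the objective would be minimized over M and Delta with a gradient-based optimizer, which this sketch leaves out.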

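Experimental Results

Research Questions

RQ1: Can TABOR reliably detect the presence of a trojan backdoor without access to the training data or model internals?
RQ2: Can TABOR accurately restore high-fidelity trojan triggers across a range of trigger shapes, sizes, and positions?
RQ3: How does TABOR perform relative to Neural Cleanse under diverse trojan configurations and model complexities?
RQ4: Does the regularization-guided objective reduce false alarms and improve trigger fidelity on both infected and clean models?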

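Key Findings

TABOR improves both detection performance and trigger restoration fidelity over the state-of-the-art Neural Cleanse.
The regularization terms shrink the adversarial subspace and suppress false alarms caused by scattered or overly large triggers.
The blocking-trigger regularizer rules out triggers that occlude important image content.
The overlaying-trigger regularizer helps extract a higher-fidelity representation of the intended trojan trigger.
The explainable-AI-inspired regularizer prunes irrelevant features to refine the restored trigger.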
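
The abstract mentions an anomaly detection step that uses the trigger quality metric to separate injected triggers from false alarms, but this review does not say which detector is used. As a hedged illustration, the sketch below assumes a median-absolute-deviation (MAD) outlier test over one score per candidate target label, with lower scores meaning "more trigger-like". The function names, the threshold of 2.0, and the score values are assumptions made for this example.

```python
import numpy as np

def mad_anomaly_index(scores):
    """Distance of each score from the median, in robust standard-deviation units."""
    scores = np.asarray(scores, dtype=float)
    median = np.median(scores)
    mad = np.median(np.abs(scores - median))
    if mad == 0.0:
        # All deviations collapse to zero; report no outliers rather than divide by zero.
        return np.zeros_like(scores)
    # 1.4826 rescales the MAD so it matches the standard deviation for normal data.
    return np.abs(scores - median) / (1.4826 * mad)

def flag_suspicious_labels(scores, threshold=2.0):
    """Return indices of labels whose score is an outlier on the low side."""
    scores = np.asarray(scores, dtype=float)
    index = mad_anomaly_index(scores)
    median = np.median(scores)
    return [i for i in range(len(scores))
            if index[i] > threshold and scores[i] < median]

# Hypothetical per-label scores: label 5 has an unusually low score.
infected_scores = [10.0, 11.0, 9.0, 10.5, 9.5, 2.0]
clean_scores = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2]
```

With these made-up numbers, `flag_suspicious_labels(infected_scores)` singles out label 5, and `flag_suspicious_labels(clean_scores)` returns an empty list. A median-based statistic is used because one extreme score barely shifts the median, whereas it would inflate a mean and standard deviation.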

This review was generated by AI and checked by a human editor.