QUICK REVIEW

[Paper Review] Smoothed Geometry for Robust Attribution

Zifan Wang, Haofan Wang|arXiv (Cornell University)|Jun 1, 2020

Adversarial Robustness in Machine Learning | Cited by 12

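Summary in brief

This paper aims to make gradient-based feature attributions more robust to adversarial perturbations, which can produce divergent explanations for nearby inputs. It identifies Lipschitz continuity conditions on the model's gradient that lead to robust attributions, and proposes an inexpensive regularization method that promotes these conditions together with a stochastic smoothing technique that requires no re-training. Experiments on a range of image models show that both mitigations consistently improve attribution robustness.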

ABSTRACT

Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs. This lack of robustness is especially problematic in high-stakes applications where adversarially-manipulated explanations could impair safety and trustworthiness. Building on a geometric understanding of these attacks presented in recent work, we identify Lipschitz continuity conditions on models' gradient that lead to robust gradient-based attributions, and observe that smoothness may also be related to the ability of an attack to transfer across multiple attribution methods. To mitigate these attacks in practice, we propose an inexpensive regularization method that promotes these conditions in DNNs, as well as a stochastic smoothing technique that does not require re-training. Our experiments on a range of image models demonstrate that both of these mitigations consistently improve attribution robustness, and confirm the role that smooth geometry plays in these attacks on real, large-scale models.

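Motivation and objectives

Reduce the vulnerability of gradient-based feature attributions to adversarial perturbations that produce divergent explanations for nearby inputs.
Identify Lipschitz continuity conditions on the model's gradient that lead to robust gradient-based attributions.
Develop practical mitigations: an inexpensive regularizer that promotes smooth model geometry during training, and a stochastic smoothing technique that requires no re-training.
Investigate how model smoothness relates to the transferability of attribution attacks across attribution methods.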
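
The Lipschitz condition in the second objective can be illustrated with a generic formulation. The symbols f, g, L and ε, and the choice of the Euclidean norm, are assumptions made for illustration; the paper's exact definitions may differ.

```latex
% Assumed notation: f is the model output for the class of interest,
% and g(x) = \nabla_x f(x) is the saliency-map attribution.
\[
\|\nabla_x f(x) - \nabla_x f(x')\|_2 \le L\,\|x - x'\|_2
\quad \text{for all } x' \text{ with } \|x - x'\|_2 \le \epsilon
\]
\[
\Longrightarrow\quad \|g(x') - g(x)\|_2 \le L\,\epsilon .
\]
```

Under this reading, a smaller L directly limits how far an attacker with perturbation budget ε can move the attribution map.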

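Proposed method

Propose an inexpensive regularization method that promotes Lipschitz continuity of the gradient in deep neural networks.
Introduce a stochastic smoothing technique that applies input noise at inference time to smooth the model's output and gradient behavior, and that requires no re-training.
Use a geometric analysis of adversarial attacks to link model smoothness to attribution robustness.
Apply the methods to standard image classification models without architectural changes; the regularizer is applied during training, while the smoothing technique needs no re-training.
Measure robustness by evaluating how stable the attributions remain under small input perturbations.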
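
The smoothing and stability-measurement steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the toy tanh model, the Gaussian noise with `sigma=0.1`, the sample count, and the top-k overlap metric are all assumptions chosen to keep the example self-contained.

```python
import numpy as np


def model(x, w):
    """Toy scalar output: a one-layer tanh network (hypothetical stand-in for a DNN)."""
    return np.tanh(x @ w).sum()


def saliency(x, w):
    """Analytic input gradient of `model`, used here as the saliency-map attribution."""
    return w @ (1.0 - np.tanh(x @ w) ** 2)


def smoothed_saliency(x, w, sigma=0.1, n_samples=64, rng=None):
    """Average the saliency map over Gaussian-perturbed copies of x (assumed noise model)."""
    # A fixed default seed means the same noise draws are reused on every call.
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.normal(0.0, sigma, size=(n_samples, x.size))
    grads = np.stack([saliency(x + eps, w) for eps in noise])
    return grads.mean(axis=0)


def topk_overlap(a, b, k):
    """Fraction of the k largest-magnitude features shared by two attribution maps."""
    top_a = set(np.argsort(-np.abs(a))[:k])
    top_b = set(np.argsort(-np.abs(b))[:k])
    return len(top_a & top_b) / k


# Compare attribution stability for an input and a slightly perturbed copy.
rng = np.random.default_rng(0)
w = rng.normal(size=(20, 8))
x = rng.normal(size=20)
x_near = x + 0.01 * rng.normal(size=20)
plain = topk_overlap(saliency(x, w), saliency(x_near, w), k=5)
smooth = topk_overlap(smoothed_saliency(x, w), smoothed_saliency(x_near, w), k=5)
print(f"top-5 overlap, plain: {plain:.2f}, smoothed: {smooth:.2f}")
```

A random perturbation is used here only for brevity; an attribution attack would instead search for the perturbation that changes the attribution map the most.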

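Experimental results

Research questions

RQ1: Which geometric properties of deep neural networks lead to robust gradient-based attributions?
RQ2: How does Lipschitz continuity of the model's gradient affect the stability of feature attributions under adversarial perturbations?
RQ3: How is smooth model geometry related to the transferability of adversarial attacks across attribution methods?
RQ4: To what extent do regularization and stochastic smoothing improve attribution robustness in practice?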

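Key findings

The proposed regularization consistently improved attribution robustness across a range of image models, reducing the divergence of explanations under small input perturbations.
Stochastic smoothing also consistently improved robustness, without re-training or architectural changes.
Model smoothness appeared to be related to how well an attack transfers across attribution methods.
The experiments confirmed the role that smooth geometry, expressed as Lipschitz continuity of the gradient, plays in these attacks on real, large-scale models.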
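
As a rough way to probe the gradient smoothness these findings refer to, one can estimate a local Lipschitz constant of the gradient by sampling nearby points. The sketch below is not the paper's analysis; the function name `grad_lipschitz_estimate`, the sampling radius, and the quadratic test function are hypothetical choices for illustration.

```python
import numpy as np


def grad_lipschitz_estimate(grad_fn, x, radius=0.05, n_samples=200, seed=0):
    """Empirical lower bound on the local Lipschitz constant of grad_fn around x."""
    rng = np.random.default_rng(seed)
    g0 = grad_fn(x)
    best = 0.0
    for _ in range(n_samples):
        # Random direction, rescaled to a length between 0.1*radius and radius.
        delta = rng.normal(size=x.shape)
        delta *= radius * rng.uniform(0.1, 1.0) / np.linalg.norm(delta)
        ratio = np.linalg.norm(grad_fn(x + delta) - g0) / np.linalg.norm(delta)
        best = max(best, ratio)
    return best


# Toy check: f(x) = 0.5 * x^T A x has gradient A x, whose exact Lipschitz
# constant is the largest eigenvalue of A (4.0 here). Sampling can only
# approach that value from below.
A = np.diag([1.0, 4.0])
est = grad_lipschitz_estimate(lambda v: A @ v, np.zeros(2))
print(f"estimated gradient Lipschitz constant: {est:.3f}")
```

Because the estimate is a maximum over random samples, it is a lower bound; a small value suggests locally smooth geometry but does not certify it.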

This review was generated by AI and checked by a human editor.