QUICK REVIEW

[Paper Review] Understanding and Unifying Fourteen Attribution Methods with Taylor Interactions

Huiqi Deng, Na Zou | arXiv (Cornell University) | 2023-03-02

Multi-Criteria Decision Making · Citations: 18

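One-Line Summary

This paper unifies fourteen input attribution methods under the Taylor interaction framework, shows that each attribution is a weighted allocation of independent effects and interaction effects, and proposes principles for evaluating faithfulness.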

ABSTRACT

Various attribution methods have been developed to explain deep neural networks (DNNs) by inferring the attribution/importance/contribution score of each input variable to the final output. However, existing attribution methods are often built upon different heuristics. There remains a lack of a unified theoretical understanding of why these methods are effective and how they are related. To this end, for the first time, we formulate core mechanisms of fourteen attribution methods, which were designed on different heuristics, into the same mathematical system, i.e., the system of Taylor interactions. Specifically, we prove that attribution scores estimated by fourteen attribution methods can all be reformulated as the weighted sum of two types of effects, i.e., independent effects of each individual input variable and interaction effects between input variables. The essential difference among the fourteen attribution methods mainly lies in the weights of allocating different effects. Based on the above findings, we propose three principles for a fair allocation of effects to evaluate the faithfulness of the fourteen attribution methods.

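Research Motivation and Goals

Motivates the need for a unified theoretical understanding of attribution methods for DNNs.
Formalizes a system based on Taylor interactions that decomposes the network output into independent effects and interaction effects.
Shows that fourteen existing attribution methods can be reformulated within this unified framework.
Proposes principles of fair allocation for evaluating how faithfully the methods allocate effects to input variables.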
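As a toy illustration of this decomposition (a sketch of our own, not code from the paper), the block below expands a small polynomial around a baseline b, which is exact for polynomials, and groups each Taylor term by the set of variables it involves: terms in a single variable i add up to the independent effect ψ(i), and terms in a set S of two or more variables add up to the interaction effect J(S). The polynomial POLY, the input x, the baseline b, and the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import comb, prod

# Hypothetical toy model, stored as {exponent tuple: coefficient}:
# f(x) = 2*x0 + x0*x1 + x1**2 + 3*x0*x1*x2
POLY = {(1, 0, 0): 2.0, (1, 1, 0): 1.0, (0, 2, 0): 1.0, (1, 1, 1): 3.0}


def evaluate(poly, x):
    """Evaluate the polynomial at point x."""
    return sum(c * prod(xi ** k for xi, k in zip(x, exps)) for exps, c in poly.items())


def taylor_effects(poly, x, b):
    """Expand poly exactly around baseline b and group Taylor terms by the variables they involve."""
    d = [xi - bi for xi, bi in zip(x, b)]
    effects = defaultdict(float)  # frozenset of variable indices -> sum of Taylor terms
    for exps, c in poly.items():
        # (b_i + d_i)^k = sum_m C(k, m) * b_i^(k - m) * d_i^m; kappa is the degree vector in d.
        for kappa in product(*(range(k + 1) for k in exps)):
            term = c * prod(comb(k, m) * bi ** (k - m) * di ** m
                            for k, m, bi, di in zip(exps, kappa, b, d))
            support = frozenset(i for i, m in enumerate(kappa) if m > 0)
            effects[support] += term
    return effects


x, b = (1.0, 2.0, -1.0), (0.5, 0.0, 1.0)
effects = taylor_effects(POLY, x, b)
f_b = effects[frozenset()]                                     # constant term f(b)
psi = {i: effects.get(frozenset({i}), 0.0) for i in range(3)}  # independent effects psi(i)
J = {S: v for S, v in effects.items() if len(S) >= 2}          # interaction effects J(S)
print(f_b + sum(psi.values()) + sum(J.values()), evaluate(POLY, x))
```

The empty set collects the constant term f(b), so the three groups add back up to f(x).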

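Proposed Method

Formalizes the DNN output as a Taylor expansion around a baseline b: f(x) = f(b) + Σ_κ φ(κ) + Σ_κ I(κ), where each term is indexed by its degree vector κ, φ(κ) are the independent-effect terms (Taylor terms involving a single input variable), and I(κ) are the interaction-effect terms (Taylor terms involving two or more input variables).
Defines the generic independent effect ψ(i) as the sum of all independent-effect terms φ(κ) involving only variable i, and the generic interaction effect J(S) as the sum of all interaction-effect terms I(κ) involving exactly the variables in S, so that f(x) = f(b) + Σ_i ψ(i) + Σ_{|S|≥2} J(S).
Proves that the Harsanyi dividend H(S) equals the generic interaction effect J(S).
Shows that every attribution a_i can be written as a weighted sum of independent effects plus a weighted sum of interaction effects: a_i = Σ_j w_{i,j} ψ(j) + Σ_S w_{i,S} J(S).
Provides a unified mapping of the fourteen attribution methods onto this framework, with an explicit expression for each method's allocation weights.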
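The weighted-sum form of an attribution can be checked numerically on a toy model. The sketch below assumes a masked model v(S) that keeps x_i for i in S and sets the other variables to the baseline, computes Harsanyi dividends H(S) by Möbius inversion, and plugs them into a_i = Σ_S w_{i,S} H(S). A singleton dividend H({i}) is exactly the change from moving x_i alone, i.e. the independent effect of i, so this single sum covers both parts of the formula. With Shapley-style weights (each singleton kept whole, each interaction split evenly among the |S| variables in S) the result is compared against the brute-force Shapley formula. The model f, the input X, the baseline B, and all helper names are hypothetical.

```python
from itertools import combinations
from math import factorial


def f(z):
    # Hypothetical toy model: f(z) = 2*z0 + z0*z1 + z1**2 + 3*z0*z1*z2
    return 2 * z[0] + z[0] * z[1] + z[1] ** 2 + 3 * z[0] * z[1] * z[2]


X = (1.0, 2.0, -1.0)  # hypothetical input
B = (0.0, 0.0, 0.0)   # hypothetical baseline
N = len(X)


def v(S):
    """Masked model: keep x_i for i in S, set every other variable to its baseline."""
    return f([X[i] if i in S else B[i] for i in range(N)])


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def harsanyi_dividends(v, n):
    """Moebius inversion: H(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    return {S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
            for S in subsets(range(n))}


def allocate(H, n, weight):
    """a_i = sum over non-empty S of weight(i, S) * H(S); singletons carry independent effects."""
    return [sum(weight(i, S) * h for S, h in H.items() if S) for i in range(n)]


def shapley_weight(i, S):
    """Keep each singleton dividend whole; split each interaction evenly among its members."""
    return 1.0 / len(S) if i in S else 0.0


def shapley_bruteforce(v, n):
    """Reference Shapley values from the marginal-contribution formula."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for T in subsets(others):
            w = factorial(len(T)) * factorial(n - len(T) - 1) / factorial(n)
            total += w * (v(T | {i}) - v(T))
        values.append(total)
    return values


H = harsanyi_dividends(v, N)
print(allocate(H, N, shapley_weight))
print(shapley_bruteforce(v, N))
```

Other attribution methods would correspond to other weight functions passed to allocate; the even split is what makes this particular instance the Shapley value.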

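Results

Research Questions

RQ1: Can the fourteen attribution methods be theoretically unified under a single Taylor interaction framework?
RQ2: How do independent effects and interaction effects contribute to input attributions, and how do different methods allocate these effects?
RQ3: What principles ensure that independent effects and interaction effects are faithfully allocated to input variables?
RQ4: How can existing attribution methods be expressed as reallocations of Taylor independent effects and interaction effects?
RQ5: What is the relationship between the Taylor framework and game-theoretic measures such as the Harsanyi dividend?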

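Key Findings

All fourteen attribution methods can be reformulated as allocations of Taylor independent effects and Taylor interaction effects.
The generic interaction effect J(S) is equivalent to the Harsanyi dividend H(S).
Three faithfulness principles are proposed for evaluating whether independent and interaction effects are fairly allocated to input variables.
Several classic methods (e.g., Shapley value, Integrated Gradients, DeepLIFT Rescale) satisfy the faithfulness principles.
A comprehensive mapping table shows how each method fits into the unified paradigm by specifying its allocation weights for the independent and interaction components.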
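The idea that a method is characterized by its allocation weights can be illustrated for Integrated Gradients on a polynomial. A short calculation of our own (for monomials expanded at a zero baseline, not quoted from the paper's table) shows that IG gives variable i the share κ_i/|κ| of each Taylor term with degree vector κ, where |κ| is the total degree. The sketch checks this by comparing a numerical IG (midpoint Riemann sum along the straight path) with that closed form. The toy model f, its gradient, MONOMIALS, and the helper names are hypothetical.

```python
from math import prod


def f(z):
    # Hypothetical toy model: an asymmetric interaction z0^2*z1 plus a symmetric one z1*z2.
    return z[0] ** 2 * z[1] + z[1] * z[2]


def grad_f(z):
    """Analytic gradient of the toy model."""
    return [2 * z[0] * z[1], z[0] ** 2 + z[2], z[1]]


# The same polynomial as {exponent tuple: coefficient}, expanded at a zero baseline.
MONOMIALS = {(2, 1, 0): 1.0, (0, 1, 1): 1.0}


def integrated_gradients(grad, x, b, steps=1000):
    """Midpoint Riemann-sum approximation of IG along the straight path from b to x."""
    totals = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        z = [bi + alpha * (xi - bi) for xi, bi in zip(x, b)]
        for i, g in enumerate(grad(z)):
            totals[i] += g / steps
    return [(xi - bi) * t for xi, bi, t in zip(x, b, totals)]


def degree_proportional(monomials, x):
    """Give variable i the share k_i / sum(k) of each monomial c * prod(x_i^k_i).

    Assumes a zero baseline and no constant monomial (sum(k) > 0).
    """
    a = [0.0] * len(x)
    for exps, c in monomials.items():
        value = c * prod(xi ** k for xi, k in zip(x, exps))
        degree = sum(exps)
        for i, k in enumerate(exps):
            a[i] += value * k / degree
    return a


x, b = (1.0, 2.0, 3.0), (0.0, 0.0, 0.0)
print(integrated_gradients(grad_f, x, b))  # numerical IG
print(degree_proportional(MONOMIALS, x))   # closed-form degree-proportional shares
```

Here IG gives x0 two thirds of the x0²·x1 term and x1 one third, whereas an even Shapley-style split would give each half; the symmetric x1·x2 term is split evenly either way. The two printed lists agree up to the Riemann-sum error.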


This review was generated by AI and reviewed by a human editor.