QUICK REVIEW

[Paper Review] Explaining by Removing: A Unified Framework for Model Explanation

Ian Covert, Scott Lundberg|arXiv (Cornell University)|2020. 11. 21.

Explainable Artificial Intelligence (XAI) · References: 95 · Citations: 123

One-Line Summary

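This paper introduces removal-based explanations as a unified framework for model interpretation, unifies 26 methods through three design choices, and connects them to psychology, game theory, and information theory.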

ABSTRACT

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We describe a new unified class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches: SHAP, LIME, Meaningful Perturbations, and permutation tests. This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature. To anchor removal-based explanations in cognitive psychology, we show that feature removal is a simple application of subtractive counterfactual reasoning. Ideas from cooperative game theory shed light on the relationships and trade-offs among different methods, and we derive conditions under which all removal-based explanations have information-theoretic interpretations. Through this analysis, we develop a unified framework that helps practitioners better understand model explanation tools, and that offers a strong theoretical foundation upon which future explainability research can build.
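The permutation tests named in the abstract give a concrete feel for "simulating feature removal." Below is a minimal sketch, not the authors' code, assuming a squared-error loss, a hypothetical toy linear model, and hypothetical helper names (`mse`, `permutation_importance`): shuffling one column samples that feature from its empirical marginal distribution, and the resulting increase in dataset loss serves as the feature's importance.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], float]


def mse(model: Model, X: Sequence[Row], y: Sequence[float]) -> float:
    """Mean squared error of the model over a dataset."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(X)


def permutation_importance(model: Model, X: Sequence[Row], y: Sequence[float],
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Average increase in dataset loss when one feature's column is shuffled.

    Shuffling column j breaks its link to the target while keeping its
    empirical marginal distribution, which simulates removing feature j.
    """
    rng = random.Random(seed)
    base = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += mse(model, X_perm, y) - base
        scores.append(total / n_repeats)
    return scores


def toy_model(row: Row) -> float:  # hypothetical model that ignores feature 1
    return 3.0 * row[0] + 0.0 * row[1]


X = [[float(i), float((7 * i) % 5)] for i in range(10)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
print(scores)  # feature 1 scores exactly 0.0; feature 0 scores > 0
```

Because the toy model ignores feature 1, shuffling that column leaves the loss unchanged, so its score is exactly 0.0, while feature 0 receives a positive score.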

Motivation and Goals

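Motivates the need to relate and compare the many existing model explanation methods.
Introduces removal-based explanations as a general framework for interpreting ML models.
Characterizes methods through three independent design choices: how features are removed, which model behavior is explained, and how influence is summarized.
Uses insights from psychology, game theory, and information theory to unify and analyze the connections among existing methods.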
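To make the three design choices concrete, here is a minimal sketch in which each choice is a swappable function. All names (`default_remover`, `marginal_remover`, `prediction_game`, `leave_one_out`, `toy_model`) are hypothetical illustrations rather than the paper's or any library's API, and only one or two options per dimension are shown.

```python
from typing import Callable, FrozenSet, List, Sequence

Row = List[float]
Model = Callable[[Row], float]
Remover = Callable[[Model, Row, FrozenSet[int]], float]
Game = Callable[[FrozenSet[int]], float]


# Choice 1: how features are removed.
def default_remover(baseline: Row) -> Remover:
    """Replace each removed feature with a fixed default (baseline) value."""
    def remove(model: Model, x: Row, keep: FrozenSet[int]) -> float:
        z = [x[i] if i in keep else baseline[i] for i in range(len(x))]
        return model(z)
    return remove


def marginal_remover(background: Sequence[Row]) -> Remover:
    """Average the prediction over background rows for the removed features,
    i.e. sample them jointly from their empirical marginal distribution."""
    def remove(model: Model, x: Row, keep: FrozenSet[int]) -> float:
        preds = [model([x[i] if i in keep else b[i] for i in range(len(x))])
                 for b in background]
        return sum(preds) / len(preds)
    return remove


# Choice 2: which model behavior is explained (here: one input's prediction).
def prediction_game(model: Model, x: Row, remover: Remover) -> Game:
    """Set function u(S): the prediction when only the features in S are kept."""
    return lambda keep: remover(model, x, keep)


# Choice 3: how each feature's influence is summarized (here: leave-one-out).
def leave_one_out(game: Game, n: int) -> List[float]:
    """Drop in u when feature i alone is removed from the full feature set."""
    full = frozenset(range(n))
    return [game(full) - game(full - {i}) for i in range(n)]


def toy_model(z: Row) -> float:  # hypothetical model with an interaction term
    return 2.0 * z[0] + z[0] * z[1]


x = [1.0, 3.0]
u_default = prediction_game(toy_model, x, default_remover([0.0, 0.0]))
u_marginal = prediction_game(toy_model, x, marginal_remover([[0.0, 0.0], [2.0, 1.0]]))
print(leave_one_out(u_default, 2))   # [5.0, 3.0]
print(leave_one_out(u_marginal, 2))  # [0.0, 2.5]
```

With the model, input, and summary held fixed, switching only the removal choice moves feature 0's attribution from 5.0 to 0.0, which illustrates why removal is treated as its own design dimension.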

Proposed Method

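Defines removal-based explanations as functions that quantify the influence of removing groups of features from the model.
Characterizes each method by three choices: how features are removed, which model behavior is explained, and how each feature's influence is summarized.
Surveys 26 existing methods and shows how each fits into the three-dimensional framework.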
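Shows that removing features by marginalizing them out with their conditional distribution yields information-theoretic interpretations of the explanations.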
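Connects removal-based explanations to cooperative game theory and discusses Shapley value-based attribution as a unifying theme.
Provides an empirical exploration of new approaches created by combining the choices of existing methods within the framework.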
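The information-theoretic claim can be checked with a short derivation. This is a sketch under stated assumptions, with notation chosen here rather than taken from the paper: the model is the Bayes classifier $f^*(x) = p(y \mid x)$, features outside $S$ are marginalized out with their conditional distribution, and the explained behavior $w(S)$ is the negative expected cross-entropy loss over the data distribution.

```latex
\begin{aligned}
F(x_S)_y &= \mathbb{E}\left[ f^*(X)_y \mid X_S = x_S \right]
          = \mathbb{E}\left[ p(y \mid X) \mid X_S = x_S \right]
          = p(y \mid x_S), \\
w(S) &= -\,\mathbb{E}\left[ -\log F(X_S)_Y \right]
      = -\,H(Y \mid X_S), \\
w(S) - w(\varnothing) &= H(Y) - H(Y \mid X_S) = I(Y; X_S).
\end{aligned}
```

Under these assumptions each coalition's value is the information that $X_S$ carries about the label, so summaries of $w$ such as Shapley values divide the total $I(Y; X)$ among the features.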

Experimental Results

Research Questions

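RQ1: How can diverse model explanation methods be unified within a single removal-based framework?
RQ2: What fundamental design choices differentiate removal-based explanations?
RQ3: When do removal-based explanations admit information-theoretic interpretations?
RQ4: How are existing methods connected through insights from cognitive psychology and cooperative game theory?
RQ5: What new methods emerge from mixing choices within the framework?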

Key Findings

The framework unifies 26 removal-based explanation methods, including SHAP, LIME, Meaningful Perturbations, and permutation tests.
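Removal by marginalizing over the conditional distribution of the removed features provides an information-theoretic foundation for removal-based explanations.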
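There are deep connections to cooperative game theory, and Shapley values often provide a principled summary of feature influence.
The explanations connect to subtractive counterfactual reasoning from cognitive psychology, Mill's method of difference, and related ideas.
Experiments that combine the framework's choices produce more than 60 new explanation methods and reveal relationships among methods.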
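The Shapley value summary in the findings can be computed exactly when the number of features is small. The sketch below brute-forces the standard Shapley formula over all coalitions; `shapley_values`, the toy model, and the zero baseline are hypothetical, and practical tools such as SHAP rely on approximations because exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List

Game = Callable[[FrozenSet[int]], float]


def shapley_values(game: Game, n: int) -> List[float]:
    """Exact Shapley values of an n-player game by enumerating coalitions.

    phi_i = sum over S not containing i of
            |S|! (n - |S| - 1)! / n! * (game(S | {i}) - game(S))
    Cost grows as 2^(n-1) per player, so this is only for small n.
    """
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (game(s | {i}) - game(s))
        phi.append(total)
    return phi


def toy_model(z: List[float]) -> float:  # hypothetical model with an interaction
    return 2.0 * z[0] + z[0] * z[1]


x, baseline = [1.0, 3.0], [0.0, 0.0]


def game(keep: FrozenSet[int]) -> float:
    """Prediction with removed features set to the (hypothetical) zero baseline."""
    return toy_model([x[i] if i in keep else baseline[i] for i in range(len(x))])


print(shapley_values(game, 2))  # [3.5, 1.5]
```

The two attributions sum to 5.0, the gap between the full prediction and the all-removed prediction; this efficiency property is one reason Shapley values are a natural summary for removal-based games.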

This review was generated by AI and reviewed by a human editor.