QUICK REVIEW

[Paper Review] On the (In)fidelity and Sensitivity for Explanations

Chih-Kuan Yeh, Cheng-Yu Hsieh | arXiv (Cornell University) | January 27, 2019

Explainable Artificial Intelligence (XAI) | References: 48 | Citations: 64

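One-line summary

The paper formalizes an infidelity objective for saliency explanations of black-box models, derives the optimal explanation under different perturbations, and verifies experimentally that smoothing-based approaches can reduce both sensitivity and infidelity.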

ABSTRACT

We consider objective evaluation measures of saliency explanations for complex black-box machine learning models. We propose simple robust variants of two notions that have been considered in recent literature: (in)fidelity, and sensitivity. We analyze optimal explanations with respect to both these measures, and while the optimal explanation for sensitivity is a vacuous constant explanation, the optimal explanation for infidelity is a novel combination of two popular explanation methods. By varying the perturbation distribution that defines infidelity, we obtain novel explanations by optimizing infidelity, which we show to out-perform existing explanations in both quantitative and qualitative measurements. Another salient question given these measures is how to modify any given explanation to have better values with respect to these measures. We propose a simple modification based on lowering sensitivity, and moreover show that when done appropriately, we could simultaneously improve both sensitivity as well as fidelity.

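Motivation and goals

Motivate the objective evaluation of saliency explanations for black-box models.
Define and analyze a robust infidelity measure that quantifies how well an explanation captures the change in the predictor under significant input perturbations.
Clarify the relationship between infidelity and existing explanations, and derive new perturbation-based explanations.
Propose smoothing-based modifications that reduce both sensitivity and infidelity, and validate them empirically.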

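Proposed method

Define the infidelity of an explanation as the expected squared difference between the perturbation-weighted explanation and the actual change in the function output under a random perturbation I.
Characterize the optimal explanation Φ* that minimizes infidelity as an integral over the perturbation distribution μ_I that involves Integrated Gradients (IG).
Show that many existing explanations (IG, DeepLIFT, LRP) are infidelity-optimal under specific perturbations, and derive new explanations from other perturbations (e.g., noisy baseline, square removal).
Present kernel smoothing (Φ_k) to obtain smoother explanations, relate it to Smooth-Grad, and give conditions under which infidelity improves after smoothing.
Introduce a robust max-sensitivity measure that is amenable to Monte Carlo estimation, relate it to fidelity through smoothing, and suggest adversarial training as a complement where needed.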
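For reference, the quantities named in this list can be written out as below. This is a notational sketch rather than a full statement of the paper's results: f is the model, x the input, Φ the explanation, I a random perturbation drawn from μ_I, r a neighborhood radius, and k a smoothing kernel. The conditions under which each expression holds (for example, invertibility of the matrix in Φ*) are left to the paper.

```latex
\begin{align*}
\mathrm{INFD}(\Phi, f, x)
  &= \mathbb{E}_{I \sim \mu_I}\!\left[\big(I^{\top}\Phi(f,x) - (f(x) - f(x - I))\big)^{2}\right] \\
\Phi^{*}(f,x)
  &= \left(\int I I^{\top}\, d\mu_I\right)^{-1}
     \left(\int I I^{\top}\, \mathrm{IG}(f,x,I)\, d\mu_I\right),
  \qquad
  \mathrm{IG}(f,x,I) = \int_{0}^{1} \nabla f\big(x + (t-1)I\big)\, dt \\
\mathrm{SENS}_{\max}(\Phi, f, x, r)
  &= \max_{\lVert y - x \rVert \le r} \lVert \Phi(f,y) - \Phi(f,x) \rVert \\
\Phi_{k}(f,x)
  &= \int_{z} \Phi(f,z)\, k(x,z)\, dz
\end{align*}
```

The link between the first two lines is that I^T IG(f, x, I) equals f(x) - f(x - I) by the fundamental theorem of calculus, so minimizing infidelity is a least-squares problem in Φ.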

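Experimental results

Research questions

RQ1: What objective measure can quantify how faithfully an explanation matches the black-box predictor?
RQ2: How do different perturbation designs affect the optimal explanation under the infidelity objective? Can new explanations be designed from these perturbations?
RQ3: Can simple smoothing or training strategies lower sensitivity and infidelity together, rather than trading one for the other?
RQ4: How well do existing explanations perform under the infidelity framework, and do the improvements from smoothing correlate with human evaluation?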

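Key results

The infidelity-minimizing explanation can be expressed as a smoothed, Integrated Gradients-style combination that uses a kernel induced by the perturbation.
Many existing explanations (IG, DeepLIFT, LRP) appear as special cases of the infidelity-optimal explanation under specific perturbations, and new perturbations give rise to new explanations.
Smoothing-based adjustments (e.g., Smooth-Grad) reduce both sensitivity and infidelity in most cases and can improve qualitative visualizations.
More robust perturbation choices (e.g., noisy baseline, square removal) lower infidelity and yield more faithful visualizations, which human evaluation also confirms.
Adversarial training can help lower both sensitivity and infidelity, which suggests a model-level strategy for obtaining more faithful explanations.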
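The two measures behind these results can be estimated by sampling, and a toy example makes the mechanics concrete. The sketch below is illustrative only and is not the authors' code: the model `f`, the Gaussian perturbation distribution, the Gaussian smoothing noise, and every function name are assumptions made for this example. It estimates infidelity and max-sensitivity by Monte Carlo, and computes the explanation that minimizes the sampled infidelity via least squares.

```python
import numpy as np


def f(x):
    # Hypothetical toy "black-box" model: a smooth nonlinear scalar function.
    return np.sin(x[..., 0]) + x[..., 1] ** 2


def grad_f(x):
    # Analytic gradient of the toy model, used as the base explanation.
    return np.stack([np.cos(x[..., 0]), 2.0 * x[..., 1]], axis=-1)


def smooth_grad(x, noise):
    # Smooth-Grad style explanation: average the gradient over noisy copies of x.
    # `noise` has shape (n, d) and is fixed so the explanation is deterministic.
    return grad_f(x + noise).mean(axis=0)


def infidelity(phi, x, perturbations):
    # Monte Carlo estimate of E_I[(I^T phi - (f(x) - f(x - I)))^2].
    predicted = perturbations @ phi
    actual = f(x) - f(x - perturbations)
    return float(np.mean((predicted - actual) ** 2))


def max_sensitivity(explain, x, radius, n, rng):
    # Sampled lower bound on max_{||y - x|| <= radius} ||explain(y) - explain(x)||.
    base = explain(x)
    directions = rng.standard_normal((n, x.size))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / x.size)  # uniform in the ball
    ys = x + radii * directions
    return max(float(np.linalg.norm(explain(y) - base)) for y in ys)


def optimal_explanation(x, perturbations):
    # Least-squares minimizer of the sampled infidelity objective.
    target = f(x) - f(x - perturbations)
    phi, *_ = np.linalg.lstsq(perturbations, target, rcond=None)
    return phi


rng = np.random.default_rng(0)
x = np.array([0.5, -1.0])
perturbations = 0.5 * rng.standard_normal((2000, 2))  # assumed mu_I: isotropic Gaussian
noise = 0.3 * rng.standard_normal((200, 2))           # assumed smoothing noise

explanations = {
    "gradient": grad_f(x),
    "smooth_grad": smooth_grad(x, noise),
    "least_squares": optimal_explanation(x, perturbations),
}
for name, phi in explanations.items():
    print(f"infidelity ({name}):", infidelity(phi, x, perturbations))

print("max-sensitivity (gradient):", max_sensitivity(grad_f, x, 0.1, 500, rng))
print(
    "max-sensitivity (smooth_grad):",
    max_sensitivity(lambda y: smooth_grad(y, noise), x, 0.1, 500, rng),
)
```

By construction, the least-squares explanation has the lowest infidelity on the sampled perturbations. A constant explanation would score zero max-sensitivity, which is the sense in which the abstract calls the sensitivity-optimal explanation vacuous. Whether smoothing lowers both measures at once depends on the model and the noise scale, so the printed numbers for this toy setup should not be read as a reproduction of the paper's experiments.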

This review was generated by AI and checked by a human editor.