QUICK REVIEW

[Paper Review] The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

Satyapriya Krishna, Tessa Han | arXiv (Cornell University) | 2022-02-03

Explainable Artificial Intelligence (XAI) | Citations: 42

One-Line Summary

This paper formalizes disagreement between post hoc explanations, measures how often it occurs across datasets and models, and shows that practitioners lack principled methods for resolving such disagreements.

ABSTRACT

As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of whether and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we formalize and study the disagreement problem in explainable machine learning. More specifically, we define the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and how practitioners resolve these disagreements. We first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and six different predictive models, to measure the extent of disagreement between the explanations generated by various popular explanation methods. In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements. Our results indicate that (1) state-of-the-art explanation methods often disagree in terms of the explanations they output, and (2) machine learning practitioners often employ ad hoc heuristics when resolving such disagreements. These findings suggest that practitioners may be relying on misleading explanations when making consequential decisions. They also underscore the importance of developing principled frameworks for effectively evaluating and comparing explanations output by various explanation techniques.

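Motivation and Goals

Define what practitioners consider to be disagreement between local explanations produced by different methods.
Develop a quantitative framework for measuring disagreement between two explanations of the same prediction.
Empirically quantify disagreement across real-world datasets, models, and explanation methods.
Investigate, through a user study, how practitioners resolve disagreements in practice.
Draw implications for evaluation metrics and practitioner education.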

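Proposed Method

Formalize six metrics that characterize explanation disagreement, focusing on whether the top-k features overlap, appear in the same order, and agree in sign (direction).
Apply and evaluate six post hoc explanation methods (LIME, KernelSHAP, Vanilla Gradient, Gradient*Input, Integrated Gradients, SmoothGrad) on four real-world datasets spanning tabular, text, and image modalities.
For tabular data, use four model families (logistic regression, feedforward neural network, random forest, and gradient-boosted trees); use an LSTM for text and ResNet-18 for images.
Compare explanations with the six disagreement metrics across the four model families, and study how disagreement varies with k and with model complexity.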
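A minimal Python sketch of four of the six metrics (feature, rank, sign, and signed rank agreement) follows. It assumes, as we understand the paper, that a feature's importance is the absolute value of its attribution and that each score is normalized by k; the function names and the two attribution vectors are hypothetical illustrations, not the paper's code or data.

```python
import numpy as np


def top_k(attr, k):
    """Indices of the k features with the largest |attribution|, most important first."""
    attr = np.asarray(attr, dtype=float)
    # Stable sort on negative magnitude: descending |attr|, ties broken by feature index.
    return np.argsort(-np.abs(attr), kind="stable")[:k]


def feature_agreement(a, b, k):
    """Share of top-k features common to both explanations (order and sign ignored)."""
    return len(set(top_k(a, k)) & set(top_k(b, k))) / k


def rank_agreement(a, b, k):
    """Share of top-k positions occupied by the same feature in both explanations."""
    return int(np.sum(top_k(a, k) == top_k(b, k))) / k


def sign_agreement(a, b, k):
    """Share of shared top-k features whose attributions also have the same sign."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    shared = set(top_k(a, k)) & set(top_k(b, k))
    return sum(1 for i in shared if np.sign(a[i]) == np.sign(b[i])) / k


def signed_rank_agreement(a, b, k):
    """Share of top-k positions holding the same feature with the same attribution sign."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    ta, tb = top_k(a, k), top_k(b, k)
    return sum(1 for fa, fb in zip(ta, tb)
               if fa == fb and np.sign(a[fa]) == np.sign(b[fb])) / k


if __name__ == "__main__":
    # Hypothetical attributions from two explainers for the same prediction (4 features).
    explainer_a = [0.9, -0.5, 0.1, 0.3]
    explainer_b = [0.8, 0.2, -0.4, -0.6]
    for name, metric in [("feature", feature_agreement), ("rank", rank_agreement),
                         ("sign", sign_agreement), ("signed rank", signed_rank_agreement)]:
        print(f"{name} agreement @ k=3: {metric(explainer_a, explainer_b, 3):.3f}")
```

With k = 3 the two vectors share features 0 and 3, so feature agreement is 2/3; but feature 3 flips sign and only position 0 lines up, so rank, sign, and signed rank agreement all fall to 1/3. Overlap alone can therefore hide disagreement in order and sign.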

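Experimental Results

Research Questions

RQ1: How often do state-of-the-art post hoc explanation methods disagree when explaining the same prediction?
RQ2: Which aspects do practitioners treat as disagreement (top-k features, their order, their signs, relative feature importance)?
RQ3: Can disagreement between explanations be formalized and quantified within a general framework?
RQ4: How do practitioners resolve disagreements in practice, and what strategies do they report?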
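RQ3 asks whether disagreement can be formalized in a general framework. Beyond top-k overlap, order, and sign, the paper (as we read it) also compares two explanations over a user-chosen feature set F with two ranking-based metrics. The definitions below are a sketch in our own notation: E_a and E_b are the two attribution vectors, rank(E, f) is feature f's position when features in F are sorted by absolute attribution, and n = |F|. The closed form for the Spearman coefficient assumes no tied ranks.

```latex
\mathrm{RankCorr}(E_a, E_b, F) = \rho_S\big(\mathrm{rank}(E_a, F),\ \mathrm{rank}(E_b, F)\big),
\qquad
\rho_S = 1 - \frac{6 \sum_{f \in F} d_f^2}{n\,(n^2 - 1)},
\quad d_f = \mathrm{rank}(E_a, f) - \mathrm{rank}(E_b, f)

\mathrm{PRA}(E_a, E_b, F) = \binom{n}{2}^{-1}
\sum_{\substack{i, j \in F \\ i < j}}
\mathbb{1}\Big[\big(\mathrm{rank}(E_a, i) < \mathrm{rank}(E_a, j)\big)
\Leftrightarrow
\big(\mathrm{rank}(E_b, i) < \mathrm{rank}(E_b, j)\big)\Big]
```

For example, with n = 3 and rankings (1, 2, 3) versus (1, 3, 2), the squared rank differences sum to 2, so the rank correlation is 1 - 12/24 = 0.5, while two of the three feature pairs keep their relative order, giving a pairwise rank agreement of 2/3.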

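Key Findings

84% of the interviewed data scientists reported encountering disagreement between explanations in their workflows.
86% of the online study participants either relied on ad hoc heuristics or did not know how to resolve the disagreement.
Grad-SmoothGrad and Grad*Input-IntGrad tend to agree, whereas Grad-IntGrad, Grad-Grad*Input, SmoothGrad-Grad*Input, and SmoothGrad-IntGrad tend to disagree, indicating a split among gradient-based methods.
Disagreement tends to persist across model classes and data modalities, and it is stronger on datasets with more features (e.g., German Credit) and on more complex models.
As k grows, disagreement increases and both rank agreement and sign agreement decrease, highlighting how sensitive agreement is to feature order and sign.
Practitioners stressed that although feature importance values from LIME and SHAP are not directly comparable, they expect consistent insights about which features rank highest and in what order.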
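The gradient-method split reported above can be made concrete with a small sketch. The code below implements Vanilla Gradient, Gradient*Input, Integrated Gradients, and SmoothGrad with central-difference gradients on a hypothetical linear logit; the function names, toy weights, zero baseline, step count, and noise scale are all our assumptions, not the paper's experimental setup. On a linear model the pairing is exact by construction: SmoothGrad averages a constant gradient, so it matches Vanilla Gradient, and Integrated Gradients with a zero baseline reduces to Gradient*Input. Treat it as intuition for why the methods may cluster, not as the paper's explanation of its results.

```python
import numpy as np


def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at x (no autodiff needed)."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


def vanilla_gradient(f, x):
    """Attribution = df/dx evaluated at the input."""
    return numerical_grad(f, x)


def gradient_times_input(f, x):
    """Attribution = elementwise gradient * input."""
    return numerical_grad(f, x) * np.asarray(x, dtype=float)


def integrated_gradients(f, x, baseline=None, steps=50):
    """(x - baseline) * average gradient along the straight path (midpoint Riemann sum)."""
    x = np.asarray(x, dtype=float)
    baseline = np.zeros_like(x) if baseline is None else np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean([numerical_grad(f, baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad


def smoothgrad(f, x, sigma=0.1, n_samples=50, seed=0):
    """Average gradient over Gaussian-perturbed copies of the input."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noisy = [x + rng.normal(0.0, sigma, size=x.shape) for _ in range(n_samples)]
    return np.mean([numerical_grad(f, z) for z in noisy], axis=0)


def ranking(attr):
    """Feature indices ordered by decreasing |attribution|."""
    return [int(i) for i in np.argsort(-np.abs(attr), kind="stable")]


if __name__ == "__main__":
    # Hypothetical linear logit f(x) = w.x + b standing in for a trained model.
    w, b = np.array([2.0, -1.0, 0.5]), 0.3

    def logit(z):
        return float(w @ z + b)

    x = np.array([0.1, 3.0, -4.0])
    for name, method in [("Vanilla Gradient", vanilla_gradient),
                         ("SmoothGrad", smoothgrad),
                         ("Gradient*Input", gradient_times_input),
                         ("Integrated Gradients", integrated_gradients)]:
        attr = method(logit, x)
        print(f"{name:>20}: {np.round(attr, 3)} -> ranking {ranking(attr)}")
```

On this toy model the rankings come out as [0, 1, 2] for Vanilla Gradient and SmoothGrad and [1, 2, 0] for Gradient*Input and Integrated Gradients, because multiplying by the input lets the large |x| of features 1 and 2 dominate. On nonlinear models the pairs are no longer identical, so this only suggests why they might cluster.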

This review was generated by AI and reviewed by a human editor.