QUICK REVIEW

[Paper Review] An Evaluation of the Human-Interpretability of Explanation

Isaac Lage, Emily Chen | arXiv (Cornell University) | 2019-01-31

Explainable Artificial Intelligence (XAI) | References: 59 | Citations: 121

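One-Line Summary

This study empirically examines how different types of explanation complexity in decision sets affect human interpretability across tasks and domains, and identifies cognitive chunks as a key factor in usability.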

ABSTRACT

Recent years have seen a boom in interest in machine learning systems that can provide a human-understandable rationale for their predictions or decisions. However, exactly what kinds of explanation are truly human-interpretable remains poorly understood. This work advances our understanding of what makes explanations interpretable under three specific tasks that users may perform with machine learning systems: simulation of the response, verification of a suggested response, and determining whether the correctness of a suggested response changes under a change to the inputs. Through carefully controlled human-subject experiments, we identify regularizers that can be used to optimize for the interpretability of machine learning systems. Our results show that the type of complexity matters: cognitive chunks (newly defined concepts) affect performance more than variable repetitions, and these trends are consistent across tasks and domains. This suggests that there may exist some common design principles for explanation systems.

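Motivation and Objectives

Investigate what makes explanations human-interpretable in common ML tasks.
Quantify how explanation properties (size, cognitive chunks, repetition) affect usability.
Compare interpretability across two domains (recipe recommendations and clinical decisions) and three tasks (simulation, verification, counterfactual).
Identify regularizers that improve the interpretability of decision-set explanations.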

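Proposed Method

Construct controlled, hand-built decision-set explanations that resemble machine learning output.
Manipulate three dimensions of explanation variation (size, cognitive chunks, repeated terms).
Evaluate across two domains (recipe and clinical) and three tasks (simulation, verification, counterfactual).
Measure performance with three metrics: accuracy, response time, and subjective satisfaction.
Recruit 150 participants per experiment on MTurk and apply inclusion criteria based on practice questions.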
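
The three tasks named above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a decision set is an ordered list of rules, that a rule fires when all of its conditions are observed, and that the first firing rule gives the prediction. The `Rule` class, the function names, and the toy features are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical representation: a rule fires when all its conditions are observed.
@dataclass(frozen=True)
class Rule:
    conditions: frozenset  # feature names that must all be present
    output: str            # recommendation produced when the rule fires


def simulate(rules, observations):
    """Simulation task: return the output of the first rule that fires."""
    for rule in rules:
        if rule.conditions <= observations:
            return rule.output
    return None  # no rule applies


def verify(rules, observations, suggested):
    """Verification task: does the decision set agree with the suggestion?"""
    return simulate(rules, observations) == suggested


def counterfactual(rules, observations, suggested, removed, added):
    """Counterfactual task: does the correctness of the suggestion change
    when one observation is swapped for another?"""
    changed = (observations - {removed}) | {added}
    before = verify(rules, observations, suggested)
    after = verify(rules, changed, suggested)
    return before != after


# Made-up toy decision set, for illustration only.
RULES = [
    Rule(frozenset({"weekend", "raining"}), "soup"),
    Rule(frozenset({"weekend"}), "salad"),
]
```

For example, with observations `{"weekend", "raining"}` the sketch simulates `"soup"`, and swapping `"raining"` for another feature changes whether the suggestion `"soup"` is correct.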

Figure 1 : Example of a decision set explanation.

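Experimental Results

Research Questions

RQ1: Which properties of decision-set explanations most affect human usability across tasks and domains?
RQ2: Do cognitive chunks, line/term length, and repetition affect response time, accuracy, and satisfaction differently?
RQ3: Are there domain- and task-general design principles for interpretable explanations?

Key Findings

Greater explanation complexity generally increases response time across tasks and domains.
Cognitive chunks (newly defined concepts) affect performance more than simple repetition of terms.
Explicitly defined cognitive chunks tend to increase response time more than implicitly embedded chunks, suggesting a scanning/processing cost.
The effect of explanation size (number of lines and output terms) on response time varies by domain and is more pronounced in the recipe domain.
Repeated terms had a smaller and less consistent effect on response time than introducing new cognitive chunks.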
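
One way to read these findings is as guidance for an interpretability regularizer: count each kind of complexity and penalize cognitive chunks more heavily than repetition. The sketch below is only an illustration of that idea. The input format, the function names, and especially the weights are assumptions and do not come from the paper.

```python
from collections import Counter


def complexity_counts(lines, chunk_definitions):
    """Count the complexity types discussed above.

    lines: list of (condition_terms, output_terms) pairs (hypothetical format).
    chunk_definitions: dict mapping a newly defined concept to its terms.
    """
    term_counts = Counter(t for cond, _ in lines for t in cond)
    return {
        "lines": len(lines),
        "output_terms": sum(len(out) for _, out in lines),
        "chunks": len(chunk_definitions),
        # A term used k times in conditions contributes k - 1 repetitions.
        "repeats": sum(c - 1 for c in term_counts.values()),
    }


def interpretability_penalty(counts, weights):
    """Weighted sum of complexity counts, usable as a regularization term."""
    return sum(weights[name] * counts[name] for name in weights)


# Illustrative weights only (NOT from the paper): chunks cost more than repeats.
WEIGHTS = {"lines": 1.0, "output_terms": 1.0, "chunks": 2.0, "repeats": 0.5}
```

The relative ordering of the weights reflects the qualitative finding only; actual values would have to be fitted to response-time data for the task and domain at hand.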

Figure 2 : Screenshot of our interface for the verification task in the recipe domain. The bottom left box shows the observations we give participants about the alien, and a meal recommendation. They must then say whether the machine learning system agrees with the recommendation based on the explan

This review was generated by AI and checked by a human editor.