QUICK REVIEW

[Paper Review] Large Language Models Cannot Explain Themselves

Advait Sarkar|arXiv (Cornell University)|2024. 05. 07.

Topic Modeling | Citations: 6

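One-Line Summary

This paper argues that language models cannot provide mechanistic explanations of their own output, introduces the term "exoplanations" to distinguish such text from true explanations, and proposes design guardrails and co-audit strategies to promote critical thinking.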

ABSTRACT

Large language models can be prompted to produce text. They can also be prompted to produce "explanations" of their output. But these are not really explanations, because they do not accurately reflect the mechanical process underlying the prediction. The illusion that they reflect the reasoning process can result in significant harms. These "explanations" can be valuable, but for promoting critical thinking rather than for understanding the model. I propose a recontextualisation of these "explanations", using the term "exoplanations" to draw attention to their exogenous nature. I discuss some implications for design and technology, such as the inclusion of appropriate guardrails and responses when models are prompted to generate explanations.

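Research Motivation and Goals

Motivates the distinction between mechanistic explanations of LLM output and exoplanations.
Highlights the societal harms that exoplanations can cause and the need to recontextualise AI explainability.
Proposes design implications, including guardrails and co-audit tools, to improve decision support and critical thinking.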

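Proposed Method

Defines mechanistic explanations and exoplanations as distinct concepts, and explains why E-type outputs (the generated "explanations") cannot reflect the underlying mechanism.
Argues that exoplanations are generated by the same prediction process as the output O and have no grounding in the model's internals.
Discusses the societal and safety harms of exoplanations and the risk of decisions based on misinformation.
Proposes practical design interventions, such as disclaimers, guardrails, and co-audit approaches, to mitigate these risks.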
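
The disclaimer-style guardrail mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's design: the regex patterns, the disclaimer wording, and the names `asks_for_explanation`, `guard`, and `GuardedResponse` are all hypothetical, and a real system would need a far more robust way to detect explanation-seeking prompts.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for spotting explanation-seeking prompts.
# These are illustrative assumptions, not taken from the paper.
EXPLANATION_PATTERNS = [
    r"\bwhy did you\b",
    r"\bexplain (your|the) (answer|reasoning|output|response)\b",
    r"\bhow did you (arrive|decide|come up)\b",
    r"\bwhat was your reasoning\b",
]

# Hypothetical disclaimer wording.
DISCLAIMER = (
    "Note: the text below is produced by the same text-generation process as "
    "any other output. It is not a record of how the earlier answer was "
    "produced. Use it as a starting point for checking the answer yourself."
)


@dataclass
class GuardedResponse:
    text: str              # what is shown to the user
    is_exoplanation: bool  # True if the prompt asked the model to explain itself


def asks_for_explanation(prompt: str) -> bool:
    """Return True if the prompt matches any explanation-seeking pattern."""
    lowered = prompt.lower()
    return any(re.search(pattern, lowered) for pattern in EXPLANATION_PATTERNS)


def guard(prompt: str, model_output: str) -> GuardedResponse:
    """Prepend the disclaimer when the prompt asks for a self-explanation."""
    if asks_for_explanation(prompt):
        return GuardedResponse(f"{DISCLAIMER}\n\n{model_output}", True)
    return GuardedResponse(model_output, False)
```

In this sketch the model output is passed through unchanged for ordinary prompts, and only self-explanation requests are labelled, so the exoplanation stays available as an aid to critical thinking while being marked as something other than a mechanistic account.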

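Experimental Results

Research Questions

RQ1: In the context of language models, what is the difference between mechanistic explanations and exoplanations?
RQ2: Why can exoplanations mislead users, and what societal risks do they pose?
RQ3: Which design strategies can mitigate the harms of exoplanations while preserving their support for critical thinking?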

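Key Findings

Exoplanations are not a grounded reflection of the model's generation process and can misrepresent the true reasons behind a prediction.
Exoplanations can lead to false confidence, reduced critical thinking, and erosion of trust in AI systems.
Guardrails, disclaimers, and co-audit tools help users evaluate outputs without over-relying on exoplanations.
When presented in context, exoplanations can still be useful for prompting user reflection and supporting critical thinking.
The paper makes the case for a socially constructed view of explainability that focuses on decision support rather than mechanistic fidelity.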


This review was generated by AI and reviewed by a human editor.