QUICK REVIEW

[Paper Review] Large Language Models Cannot Explain Themselves

Advait Sarkar | arXiv (Cornell University) | May 7, 2024

Topic Modeling | Cited by 6

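ONE-SENTENCE SUMMARY

The paper argues that language models cannot provide mechanistic explanations of their own outputs and introduces the term "exoplanations" to distinguish the explanations they generate from genuine ones; it also proposes design guardrails and co-audit strategies to promote critical thinking.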

ABSTRACT

Large language models can be prompted to produce text. They can also be prompted to produce "explanations" of their output. But these are not really explanations, because they do not accurately reflect the mechanical process underlying the prediction. The illusion that they reflect the reasoning process can result in significant harms. These "explanations" can be valuable, but for promoting critical thinking rather than for understanding the model. I propose a recontextualisation of these "explanations", using the term "exoplanations" to draw attention to their exogenous nature. I discuss some implications for design and technology, such as the inclusion of appropriate guardrails and responses when models are prompted to generate explanations.

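RESEARCH MOTIVATION AND GOALS

Explain why it is necessary to distinguish mechanistic explanations of a language model's output from exoplanations.
Highlight the social harms caused by exoplanations and the need to recontextualise explanation within AI explainability.
Set out design implications, including guardrails and co-audit tools, to improve decision support and critical thinking.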

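PROPOSED APPROACH

Distinguish mechanistic explanations from exoplanations, and explain why E-type outputs (the explanations a model produces when prompted) cannot reflect the underlying mechanism.
Argue that exoplanations are generated by the same prediction process as the original output O, and therefore lack any grounding in the model's internals.
Discuss the social and safety harms of exoplanations, including the faulty decisions they can lead to.
Propose practical design interventions, such as disclaimers, guardrails, and co-audit methods, to reduce these risks.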
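The paper proposes guardrails and responses for when a model is prompted to explain itself, but this summary gives no implementation. As an illustration only, the sketch below assumes a simple keyword heuristic to detect prompts that ask the model to explain its own output, and prepends a notice framing the reply as an exoplanation rather than a mechanistic account. The pattern list, the notice wording, and the function names `asks_for_self_explanation` and `apply_guardrail` are all hypothetical, not taken from the paper.

```python
import re

# Hypothetical patterns for prompts asking the model to explain its own output.
# A keyword heuristic is an assumption; a real system might use a classifier.
_SELF_EXPLANATION_PATTERNS = [
    r"\bwhy did you\b",
    r"\bexplain (your|the) (reasoning|answer|output)\b",
    r"\bhow did you (arrive|come up|get)\b",
]

# Hypothetical disclaimer wording (not quoted from the paper).
EXOPLANATION_NOTICE = (
    "Note: the following is a generated justification (an 'exoplanation'), "
    "not a record of how the model produced its earlier output. "
    "Treat it as a prompt for your own checking."
)


def asks_for_self_explanation(prompt: str) -> bool:
    """Return True if the prompt appears to request an explanation of the model's own output."""
    text = prompt.lower()
    return any(re.search(pattern, text) for pattern in _SELF_EXPLANATION_PATTERNS)


def apply_guardrail(prompt: str, model_reply: str) -> str:
    """Prepend the exoplanation notice when the prompt requests a self-explanation."""
    if asks_for_self_explanation(prompt):
        return f"{EXOPLANATION_NOTICE}\n\n{model_reply}"
    return model_reply


if __name__ == "__main__":
    # Flagged: the user asks the model why it chose an answer.
    print(apply_guardrail("Why did you choose option B?", "Option B has lower cost."))
    # Not flagged: an ordinary task prompt passes through unchanged.
    print(apply_guardrail("Summarise this paragraph.", "A short summary."))
```

The heuristic only changes how the reply is framed; it does not make the explanation any more faithful, which is consistent with the paper's point that such text supports critical thinking rather than understanding of the model.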

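RESULTS

RESEARCH QUESTIONS

RQ1: In the context of language models, what distinguishes mechanistic explanations from exoplanations?
RQ2: Why do exoplanations mislead users, and what social risks do they pose?
RQ3: Which design strategies can mitigate the harms of exoplanations while preserving their usefulness for supporting critical thinking?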

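KEY FINDINGS

Exoplanations are not a grounded reflection of the process that produced the model's output, and they can distort users' understanding of the true causes of a prediction.
Exoplanations can lead to misplaced confidence, weaker critical thinking, and erosion of trust in AI systems.
Guardrails, disclaimers, and co-audit tools can help users evaluate outputs without over-relying on exoplanations.
When appropriately contextualised, exoplanations may still be useful for prompting users to reflect and for supporting critical thinking.
The paper advocates a socially constructed view of explainability, centred on decision support rather than mechanistic fidelity.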

This review was generated by AI and reviewed by human editors.