QUICK REVIEW

[Paper Review] Rethinking Interpretability in the Era of Large Language Models

Chandan Singh, Jeevana Priya Inala | arXiv (Cornell University) | Jan 30, 2024

Natural Language Processing Techniques | Cited by 41

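One-Sentence Summary

This paper argues that large language models (LLMs) can redefine interpretability by providing natural-language explanations and interactive analysis, and it outlines the opportunities, challenges, and research priorities for LLM-based explanation and dataset explanation.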

ABSTRACT

Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks. Simultaneously, large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks, offering a chance to rethink opportunities in interpretable machine learning. Notably, the capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human. However, these new capabilities raise new challenges, such as hallucinated explanations and immense computational costs. In this position paper, we start by reviewing existing methods to evaluate the emerging field of LLM interpretation (both interpreting LLMs and using LLMs for explanation). We contend that, despite their limitations, LLMs hold the opportunity to redefine interpretability with a more ambitious scope across many applications, including in auditing LLMs themselves. We highlight two emerging research priorities for LLM interpretation: using LLMs to directly analyze new datasets and to generate interactive explanations.

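Research Motivation and Goals

Reconsider interpretability in the context of large language models (LLMs) and their ability to explain.
Assess how LLMs can explain model behavior and data, going beyond traditional post-hoc methods.
Identify opportunities for interactive natural-language explanations and data-grounded reasoning.
Highlight challenges such as hallucination, computational cost, and restricted access to LLMs.
Advocate two emerging priorities: using LLMs to analyze new datasets and to generate interactive explanations.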

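Proposed Method

Survey and categorize existing LLM interpretation methods (local explanations versus global/mechanistic explanations).
Review methods for explaining LLM outputs, including post-hoc natural-language explanations, step-by-step reasoning prompts, and retrieval-augmented generation (RAG).
Discuss mechanistic and dataset explanation techniques, including probing, neuron/circuit analysis, and analysis of training-data influence.
Propose evaluation considerations for explanations, balancing human studies, automated metrics, and bias concerns.
Outline two practical focus areas: auditing LLMs and using LLMs to explain datasets.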
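
To make the idea of probing concrete, here is a minimal sketch under loud assumptions: the "activations" are synthetic vectors in which a binary label leaks into one coordinate, standing in for hidden states that would normally be extracted from an LLM. The function names (`make_synthetic_activations`, `train_linear_probe`, `probe_accuracy`) are hypothetical and are not taken from the paper. The probe itself is plain logistic regression trained with stochastic gradient descent.

```python
import math
import random


def make_synthetic_activations(n, dim, signal_dim, seed=0):
    """Hypothetical stand-in for LLM hidden states.

    Each vector is Gaussian noise, except that the binary label
    shifts coordinate `signal_dim` by +3 or -3.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[signal_dim] += 3.0 if label == 1 else -3.0
        data.append((vec, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(data, epochs=50, lr=0.1):
    """Fit a logistic-regression probe with per-example gradient steps."""
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for vec, label in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)
            err = p - label  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, vec)]
            b -= lr * err
    return w, b


def probe_accuracy(w, b, data):
    """Fraction of examples whose label the probe recovers."""
    correct = 0
    for vec, label in data:
        logit = sum(wi * xi for wi, xi in zip(w, vec)) + b
        correct += int((1 if logit > 0 else 0) == label)
    return correct / len(data)


train = make_synthetic_activations(200, dim=4, signal_dim=2, seed=0)
held_out = make_synthetic_activations(100, dim=4, signal_dim=2, seed=1)
w, b = train_linear_probe(train)
acc = probe_accuracy(w, b, held_out)
print(f"held-out probe accuracy: {acc:.2f}")
```

High held-out accuracy only shows that the label is linearly decodable from the vectors. It does not show that a model actually uses that information, which is one reason probing results need careful interpretation.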

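Experimental Results

Research Questions

RQ1: How can LLMs be used to explain model predictions and data patterns effectively and reliably?
RQ2: Where do LLMs open up opportunities for interactive and dataset-grounded explanations beyond what traditional interpretability methods offer?
RQ3: Which challenges (such as hallucination, cost, and accessibility) must be addressed to achieve robust LLM-based explanations?
RQ4: What are effective strategies for evaluating LLM explanations in real-world settings?
RQ5: What are the priority research directions for using LLMs to explain datasets and model behavior?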

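Key Findings

LLMs can provide interactive, natural-language explanations that cover complex patterns and data relationships.
Local explanations can draw on token attribution, attention analysis, and post-hoc natural-language explanations, along with techniques such as chain-of-thought prompting, to improve faithfulness.
Global/mechanistic explanations can probe representations, analyze attention heads, and study the influence of training data, although scaling these methods to large models is challenging.
Dataset explanation with LLMs helps analyze tabular and text data, including through GAMs, classifier predictions, and prompt-based chains for understanding patterns in the data.
Evaluation of explanations should consider real-world outcomes and whether explanations complement human performance, rather than relying solely on user judgments or self-reported usefulness.
Future priorities include improving explanation reliability, advancing interactive explanations, and using LLMs for knowledge discovery from datasets.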
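
As an illustration of what a token-level local explanation can look like, the sketch below uses occlusion (leave-one-out) attribution, one simple attribution technique chosen here for clarity and not claimed to be the paper's method. The scorer `toy_sentiment_score` is a hypothetical stand-in for a real model, and its word weights are made up for the example.

```python
def toy_sentiment_score(tokens):
    """Hypothetical stand-in for a model: sums hand-picked word weights."""
    weights = {"great": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}
    return sum(weights.get(t.lower(), 0.0) for t in tokens)


def occlusion_attribution(tokens, score_fn):
    """Attribute the score to each token by deleting it and re-scoring.

    The attribution of token i is score(full input) minus
    score(input with token i removed).
    """
    base = score_fn(tokens)
    attributions = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + tokens[i + 1:]
        attributions.append((tokens[i], base - score_fn(occluded)))
    return attributions


tokens = "The plot was great but the ending was bad".split()
attrs = occlusion_attribution(tokens, toy_sentiment_score)
for token, value in attrs:
    print(f"{token:>8s}  {value:+.1f}")
```

With this toy scorer, "great" receives +2.0, "bad" receives -1.0, and every other token receives 0.0. With a real LLM each occlusion costs one forward pass, which hints at the computational cost the review lists among the challenges.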


This review was generated by AI and reviewed by human editors.