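[Paper Digest] The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations
This paper proposes a fine-grained taxonomy of LLM hallucination, introduces a publicly available dataset (HILT), defines the Hallucination Vulnerability Index (HVI) for ranking models, and proposes two mitigation strategies.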
The recent advancements in Large Language Models (LLMs) have garnered widespread acclaim for their remarkable emerging capabilities. However, the issue of hallucination has parallelly emerged as a by-product, posing significant concerns. While some recent endeavors have been made to identify and mitigate different types of hallucination, there has been a limited emphasis on the nuanced categorization of hallucination and associated mitigation methods. To address this gap, we offer a fine-grained discourse on profiling hallucination based on its degree, orientation, and category, along with offering strategies for alleviation. As such, we define two overarching orientations of hallucination: (i) factual mirage (FM) and (ii) silver lining (SL). To provide a more comprehensive understanding, both orientations are further sub-categorized into intrinsic and extrinsic, with three degrees of severity - (i) mild, (ii) moderate, and (iii) alarming. We also meticulously categorize hallucination into six types: (i) acronym ambiguity, (ii) numeric nuisance, (iii) generated golem, (iv) virtual voice, (v) geographic erratum, and (vi) time wrap. Furthermore, we curate HallucInation eLiciTation (HILT), a publicly available dataset comprising of 75,000 samples generated using 15 contemporary LLMs along with human annotations for the aforementioned categories. Finally, to establish a method for quantifying and to offer a comparative spectrum that allows us to evaluate and rank LLMs based on their vulnerability to producing hallucinations, we propose Hallucination Vulnerability Index (HVI). We firmly believe that HVI holds significant value as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making. In conclusion, we propose two solution strategies for mitigating hallucinations.
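Research Motivation and Goals
- Provide a fine-grained classification of LLM hallucination by orientation, category, and degree.
- Create a publicly available dataset (HILT) of 75,000 human-annotated samples covering 15 LLMs.
- Introduce the Hallucination Vulnerability Index (HVI) to rank LLMs by their susceptibility to hallucination.
- Propose two mitigation strategies (automated and human-in-the-loop) and assess their potential impact.
- Discuss the implications for policy and for future research on hallucination-aware NLP.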
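Proposed Method
- Define hallucination along two orientations (Factual Mirage and Silver Lining), each with intrinsic/extrinsic sub-categories and three degrees of severity (mild, moderate, alarming).
- Divide hallucination into six categories (Acronym Ambiguity, Numeric Nuisance, Generated Golem, Virtual Voice, Geographic Erratum, Time Wrap) and give examples of each.
- Build HILT by generating 75,000 samples with 15 LLMs (5,000 per model) from NYTimes tweets and Politifact prompts, with human annotations of orientation and category aggregated via MACE.
- Define and compute the Hallucination Vulnerability Index (HVI) to rank LLMs, including damping factors and normalization to a 0-100 scale.
- Present two mitigation strategies: (a) high-entropy word detection and replacement (ENTROPY BB, black-box) and (b) sentence-level factuality checking via textual entailment (FACTUALITY GB, gray-box).
- Discuss the use of external resources (Google Search API) and an entailment model (RoBERTa Large) for factuality checking and human-in-the-loop review.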
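
The high-entropy word detection step behind ENTROPY BB can be illustrated with a minimal Python sketch. It is not the paper's implementation: the per-position probability distributions, the function names, and the 1.0-bit threshold are all assumptions made for illustration. In practice the distributions would come from a language model, and a flagged word would then be handed to a replacement step that is not shown here.

```python
import math
from typing import Dict, List, Tuple


def shannon_entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy in bits of a discrete distribution (renormalized if needed)."""
    total = sum(dist.values())
    if total <= 0:
        raise ValueError("distribution must have positive total mass")
    entropy = 0.0
    for p in dist.values():
        if p > 0:
            q = p / total
            entropy -= q * math.log2(q)
    return entropy


def flag_high_entropy_words(
    words: List[str],
    distributions: List[Dict[str, float]],
    threshold: float = 1.0,  # assumed cutoff in bits, not a value from the paper
) -> List[Tuple[int, str, float]]:
    """Return (position, word, entropy) for words whose entropy exceeds the threshold."""
    if len(words) != len(distributions):
        raise ValueError("need exactly one distribution per word")
    flagged = []
    for position, (word, dist) in enumerate(zip(words, distributions)):
        h = shannon_entropy(dist)
        if h > threshold:
            flagged.append((position, word, h))
    return flagged


# Hypothetical example: the model is confident about every word except the year.
example_words = ["Paris", "is", "in", "1889"]
example_distributions = [
    {"Paris": 0.97, "Lyon": 0.03},
    {"is": 1.0},
    {"in": 0.9, "near": 0.1},
    {"1889": 0.25, "1887": 0.25, "1890": 0.25, "1901": 0.25},
]

for position, word, h in flag_high_entropy_words(example_words, example_distributions):
    print(f"position {position}: {word!r} has entropy {h:.2f} bits")
```

In this toy input only the year is flagged, because its four candidates are equally likely (2 bits of entropy), which fits the intuition that numbers and dates are among the spans where a model is most uncertain.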

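Experimental Results
Research Questions
- RQ1: What distinct orientations and categories of hallucination appear in LLM outputs?
- RQ2: How can susceptibility to hallucination be quantified and compared across a diverse set of models?
- RQ3: What dataset and annotation scheme enable robust cross-model analysis of hallucination types?
- RQ4: Which mitigation strategies reduce hallucination, and how effective are the black-box and gray-box methods respectively?
- RQ5: How can HVI inform policy and risk assessment for foundation models?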
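Key Findings
- HILT consists of 75,000 snippets from 15 LLMs, with 2,500 FM and 2,500 SL samples per model, for a total of 129K annotated sentences across the categories.
- HVI provides a 0-100 scale for comparing hallucination susceptibility, showing GPT-3 (90), StableLM (82), GPT-2 (70), Vicuna (62), MPT (59), LLaMA (57), and GPT-3.5 (53), with the other models scoring lower.
- Large models without RLHF show a higher tendency to hallucinate across orientations, while RLHF-tuned models tend, in some cases, to be less vulnerable.
- Two mitigation baselines are proposed: ENTROPY BB (black-box replacement based on word entropy) and FACTUALITY GB (gray-box factuality checking with external search and entailment).
- In the gray-box method, the entailment-based factuality check flags about 26% of sentences as possibly needing rewriting.
- HVI makes it possible to track how specific categories (such as Time Wrap, Geographic Erratum, and Virtual Voice) evolve with model scale and the use of RLHF.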
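
As a quick consistency check on the figures above, the following Python sketch recomputes the dataset size from the per-model counts and ranks the HVI scores quoted in this digest. Only the seven scores listed here are included; the remaining models are left out because their scores are not given. The variable and function names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Counts quoted in this digest.
NUM_MODELS = 15
FM_PER_MODEL = 2_500  # Factual Mirage samples per model
SL_PER_MODEL = 2_500  # Silver Lining samples per model

# HVI scores quoted in this digest (0-100 scale, higher means more vulnerable).
hvi_scores: Dict[str, int] = {
    "GPT-3": 90,
    "StableLM": 82,
    "GPT-2": 70,
    "Vicuna": 62,
    "MPT": 59,
    "LLaMA": 57,
    "GPT-3.5": 53,
}


def total_samples(num_models: int, fm_per_model: int, sl_per_model: int) -> int:
    """Total dataset size implied by the per-model FM and SL counts."""
    return num_models * (fm_per_model + sl_per_model)


def rank_by_hvi(scores: Dict[str, int]) -> List[Tuple[str, int]]:
    """Sort models from most to least vulnerable according to HVI."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(total_samples(NUM_MODELS, FM_PER_MODEL, SL_PER_MODEL))  # 75000
for rank, (model, score) in enumerate(rank_by_hvi(hvi_scores), start=1):
    print(f"{rank}. {model}: {score}")
```

The recomputed total matches the 75,000 samples reported for HILT, and the ranking puts GPT-3 first and GPT-3.5 last among the seven models listed.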

This digest was generated by AI and reviewed by a human editor.