QUICK REVIEW

[Paper Review] Zero-Resource Hallucination Prevention for Large Language Models

Junyu Luo, Cao Xiao|arXiv (Cornell University)|Sep 6, 2023
Topic Modeling | Cited by 10
One-Sentence Summary

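The paper proposes Self-Familiarity, a zero-resource pre-detection method that prevents hallucination by estimating how familiar the model is with the concepts in an input instruction, through concept extraction, concept guessing, and aggregation. It outperforms baselines on four large language models and offers interpretable, proactive error prevention without requiring external knowledge.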

ABSTRACT

The prevalent use of large language models (LLMs) in various domains has drawn attention to the issue of "hallucination," which refers to instances where LLMs generate factually inaccurate or ungrounded information. Existing techniques for hallucination detection in language assistants rely on intricate fuzzy, specific free-language-based chain of thought (CoT) techniques or parameter-based methods that suffer from interpretability issues. Additionally, the methods that identify hallucinations post-generation cannot prevent their occurrence and suffer from inconsistent performance due to the influence of the instruction format and model style. In this paper, we introduce a novel pre-detection self-evaluation technique, referred to as SELF-FAMILIARITY, which focuses on evaluating the model's familiarity with the concepts present in the input instruction and withholding the generation of a response in case of unfamiliar concepts. This approach emulates the human ability to refrain from responding to unfamiliar topics, thus reducing hallucinations. We validate SELF-FAMILIARITY across four different large language models, demonstrating consistently superior performance compared to existing techniques. Our findings propose a significant shift towards preemptive strategies for hallucination mitigation in LLM assistants, promising improvements in reliability, applicability, and interpretability.

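Research Motivation and Goals

  • Achieve robust, proactive hallucination suppression in open-ended LLM applications without relying on external knowledge or post-hoc detection.
  • Introduce a zero-resource pre-detection framework (Self-Familiarity) that withholds a response when the instruction contains unfamiliar concepts.
  • Develop a three-stage pipeline (concept extraction, concept guessing, aggregation) to assess how familiar the model is with an instruction.
  • Create Concept-7, a multi-domain dataset for classifying hallucination-prone instructions, and use it to validate the method.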
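The framework in the list above can be pictured as a gate placed in front of generation. Below is a minimal Python sketch under stated assumptions: `self_familiarity_gate`, the callable type aliases, and the `threshold` parameter are hypothetical names; the toy dictionary stands in for the NER extractor and the LLM-based concept scorer; and `min` is a placeholder aggregator, not the paper's weighting scheme.

```python
from typing import Callable, List

# Hypothetical interfaces: the paper does not publish this API.
ConceptExtractor = Callable[[str], List[str]]  # stage 1: instruction -> concepts
ConceptScorer = Callable[[str], float]         # stage 2: concept -> familiarity
Aggregator = Callable[[List[float]], float]    # stage 3: scores -> one score


def self_familiarity_gate(
    instruction: str,
    extract: ConceptExtractor,
    score: ConceptScorer,
    aggregate: Aggregator,
    threshold: float,
) -> bool:
    """Return True if the model should answer, False if it should withhold."""
    concepts = extract(instruction)
    if not concepts:
        # Assumption: with no detectable concepts there is nothing to check.
        return True
    concept_scores = [score(c) for c in concepts]
    return aggregate(concept_scores) >= threshold


if __name__ == "__main__":
    # Toy stand-ins (hypothetical) for the NER extractor and the LLM-based scorer.
    toy_scores = {"Eiffel Tower": 0.92, "Zorblat Theorem": 0.05}

    def extract(text: str) -> List[str]:
        return [c for c in toy_scores if c in text]

    def score(concept: str) -> float:
        return toy_scores[concept]

    # min() is a conservative placeholder, not the paper's aggregation scheme.
    for question in ("How tall is the Eiffel Tower?", "Prove the Zorblat Theorem."):
        answer = self_familiarity_gate(question, extract, score, min, threshold=0.5)
        print(question, "->", "answer" if answer else "withhold")
```

Passing the three stages in as callables keeps the gate independent of any particular model; replacing `min` with a weighted aggregator changes how strongly a single unfamiliar concept vetoes the answer.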

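Proposed Method

  • Extract concepts from the instruction with Named Entity Recognition (NER).
  • Group adjacent concepts into extended concepts, and filter out common terms to reduce noise.
  • For each concept, have the model generate an explanation from a standard prompt, then mask the concept term in that explanation.
  • Use constrained beam search to infer the original concept from the masked explanation, yielding a familiarity score for each concept.
  • Aggregate the concept-level scores into an instruction-level familiarity score using a frequency-based weighting scheme with geometric decay.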
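The deterministic parts of these steps (grouping, filtering, masking, aggregation) can be sketched in Python as follows. This is an illustration, not the authors' code: the helper names, the rule that spans separated only by whitespace count as adjacent, the common-word set, the `[MASK]` token, and the exact aggregation formula (concepts ranked rarest first, weights `gamma ** i`, normalized weighted average) are all assumptions. The model-dependent steps, generating the explanation and scoring the guess with constrained beam search, are not shown; concept scores are taken as given.

```python
import re
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, e.g. from an NER tagger


def group_adjacent(text: str, spans: List[Span]) -> List[str]:
    """Merge spans separated only by whitespace into extended concepts (assumed rule)."""
    merged: List[List[int]] = []
    for start, end in sorted(spans):
        if merged and text[merged[-1][1]:start].strip() == "":
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous concept
        else:
            merged.append([start, end])
    return [text[s:e] for s, e in merged]


def filter_common(concepts: List[str], common_words: Set[str]) -> List[str]:
    """Drop everyday terms; `common_words` is a hypothetical frequency-based list."""
    return [c for c in concepts if c.lower() not in common_words]


def mask_concept(explanation: str, concept: str, mask: str = "[MASK]") -> str:
    """Hide every case-insensitive mention of the concept in its explanation."""
    return re.sub(re.escape(concept), mask, explanation, flags=re.IGNORECASE)


def aggregate(scores: Dict[str, float], freq: Dict[str, int], gamma: float = 0.5) -> float:
    """Weighted average with geometric decay over concepts ranked rarest first (assumed form)."""
    if not scores:
        raise ValueError("no concept scores to aggregate")
    # Concepts missing from `freq` get frequency 0, i.e. are treated as rarest.
    ranked = sorted(scores, key=lambda c: freq.get(c, 0))
    weights = [gamma ** i for i in range(len(ranked))]
    weighted = sum(w * scores[c] for w, c in zip(weights, ranked))
    return weighted / sum(weights)


if __name__ == "__main__":
    text = "Who proposed the Riemann hypothesis in number theory?"
    spans = [(text.index(w), text.index(w) + len(w)) for w in ("Riemann", "hypothesis", "number")]
    concepts = filter_common(group_adjacent(text, spans), {"number", "theory"})
    print(concepts)

    explanation = "The Riemann hypothesis concerns the zeros of the zeta function."
    print(mask_concept(explanation, "Riemann hypothesis"))

    # Toy concept scores and corpus frequencies, for illustration only.
    print(aggregate({"Riemann hypothesis": 0.8, "zeta function": 0.4},
                    {"Riemann hypothesis": 120, "zeta function": 40}))
```

Under this assumed weighting, the rarest concept receives the largest weight, so a single obscure entity can pull the instruction-level score down.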
Figure 1: A hallucination example. Red color indicates the incorrect information.

Experimental Results

Research Questions

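  • RQ1: Can a zero-resource pre-detection method based on concept-level assessment reduce hallucination risk in open-ended LLMs?
  • RQ2: How can concept extraction, explanation-based guessing, and robust aggregation be combined to produce a reliable instruction-level familiarity signal?
  • RQ3: Does Self-Familiarity generalize across model architectures and instruction styles without relying on external knowledge?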

Key Findings

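  • Self-Familiarity consistently outperforms baseline methods on the Concept-7 dataset across four large language models.
  • The method shows a high Pearson correlation with gold-standard explanations, indicating agreement with human understanding of concept familiarity.
  • Ablations show that removing grouping, filtering, or ranking degrades performance, confirming each component's contribution.
  • Human annotation corroborates the GPT-4-based evaluation results, confirming the method's robustness and interpretability.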
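For readers unfamiliar with the correlation check mentioned in the findings, the Pearson coefficient between two score lists can be computed as below. The sketch assumes per-item familiarity scores from the method and from a reference source; the function name `pearson` is hypothetical, and the sample numbers are made up for illustration and are not results from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance and standard deviations share the 1/n factor, so it cancels.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up numbers for illustration only; not results from the paper.
    model_scores = [0.91, 0.15, 0.62, 0.40, 0.85]
    reference_scores = [0.95, 0.10, 0.70, 0.35, 0.80]
    print(round(pearson(model_scores, reference_scores), 3))
```

A value near 1 means the two score lists rise and fall together; Pearson measures linear agreement only, so it says nothing about whether the scores share the same scale.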
Figure 2: Example procedure of Self-Familiarity.


This review was generated by AI and reviewed by human editors.