QUICK REVIEW

[Paper Review] Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

Niels Mündler, Jingxuan He|arXiv (Cornell University)|May 25, 2023

Cited by 46

One-Sentence Summary

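This paper analyzes self-contradictory hallucinations in instruction-tuned language models. It proposes a prompting-based pipeline that triggers, detects, and iteratively mitigates contradictions, achieving strong detection and a large reduction in contradictions without any external retrieval.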

ABSTRACT

Large language models (large LMs) are susceptible to producing text that contains hallucinated content. An important instance of this problem is self-contradiction, where the LM generates two contradictory sentences within the same context. In this work, we present a comprehensive investigation into self-contradiction for various instruction-tuned LMs, covering evaluation, detection, and mitigation. Our primary evaluation task is open-domain text generation, but we also demonstrate the applicability of our approach to shorter question answering. Our analysis reveals the prevalence of self-contradictions, e.g., in 17.7% of all sentences produced by ChatGPT. We then propose a novel prompting-based framework designed to effectively detect and mitigate self-contradictions. Our detector achieves high accuracy, e.g., around 80% F1 score when prompting ChatGPT. The mitigation algorithm iteratively refines the generated text to remove contradictory information while preserving text fluency and informativeness. Importantly, our entire framework is applicable to black-box LMs and does not require retrieval of external knowledge. Rather, our method complements retrieval-based methods, as a large portion of self-contradictions (e.g., 35.2% for ChatGPT) cannot be verified using online text. Our approach is practically effective and has been released as a push-button tool to benefit the public at https://chatprotect.ai/.

Motivation and Goals

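Motivates the study by highlighting self-contradiction as a critical and verifiable form of non-factuality in LLM outputs.
Defines a trigger-detect-mitigate pipeline that relies only on prompting and the models' own reasoning, avoiding external knowledge retrieval.
Empirically evaluates the framework across several modern language models (GPT-4, ChatGPT, Llama2-70B-Chat, Vicuna-13B) and tasks (open-domain generation and question answering).
Quantifies detection accuracy and the effect of mitigation on text fluency and informativeness, and releases a usable tool for practitioners.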

Proposed Method

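Defines a self-contradiction as a pair of sentences, in the same context, that are logically inconsistent.
Triggers contradictions by having the generative LM (gLM) produce candidate sentences under a constrained context.
Detects contradictions with an analyzer LM (aLM), using prompting in a natural-language-inference-like setup.
Mitigates contradictions by having the analyzer LM iteratively revise the conflicting sentences while preserving fluency and informativeness.
Instantiates the pipeline with prompts for open-domain generation and question answering, so that it can be applied to black-box LMs.
Provides an open-source tool (chatprotect.ai) and a dataset for reproducibility.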
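The trigger-detect-mitigate loop described above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the authors' implementation: the `LM` callable type, the prompt wording, the helper names (`trigger`, `detect`, `revise`, `mitigate`), the use of the preceding sentences as the "context", and the convention that an empty revision deletes a sentence are all assumptions. In practice each `LM` would wrap a real chat-model API; here two stub functions stand in for the gLM and aLM so the example runs offline.

```python
from typing import Callable, List

# Hypothetical interface: an LM is any function that maps a prompt to a completion.
LM = Callable[[str], str]


def trigger(glm: LM, prefix: str, sentence: str) -> str:
    """Ask the generative LM (gLM) for an alternative sentence in the same context."""
    prompt = (
        f"Context: {prefix}\n"
        f"Write another sentence that could replace: {sentence}"
    )
    return glm(prompt).strip()


def detect(alm: LM, prefix: str, s1: str, s2: str) -> bool:
    """NLI-style yes/no query to the analyzer LM (aLM): do s1 and s2 contradict?"""
    prompt = (
        f"Context: {prefix}\n"
        f"Sentence A: {s1}\nSentence B: {s2}\n"
        "Are A and B contradictory? Answer Yes or No."
    )
    return alm(prompt).strip().lower().startswith("yes")


def revise(alm: LM, prefix: str, s1: str, s2: str) -> str:
    """Ask the aLM to rewrite s1 without the information that conflicts with s2.

    Assumption: an empty answer means the sentence should be deleted.
    """
    prompt = (
        f"Context: {prefix}\n"
        f"Sentence A: {s1}\nSentence B: {s2}\n"
        "Rewrite A so it no longer conflicts with B; keep it fluent and informative."
    )
    return alm(prompt).strip()


def mitigate(glm: LM, alm: LM, sentences: List[str], max_iters: int = 3) -> List[str]:
    """Repeat trigger -> detect -> revise over all sentences until a pass changes nothing."""
    text = list(sentences)
    for _ in range(max_iters):
        changed = False
        for i, sentence in enumerate(text):
            if not sentence:
                continue  # deleted in an earlier step
            prefix = " ".join(s for s in text[:i] if s)
            alternative = trigger(glm, prefix, sentence)
            if detect(alm, prefix, sentence, alternative):
                text[i] = revise(alm, prefix, sentence, alternative)
                changed = True
        if not changed:
            break  # no contradiction was triggered in this pass
    return [s for s in text if s]


# Offline stand-ins for real models (illustrative only).
def stub_glm(prompt: str) -> str:
    sentence = prompt.split("replace:")[-1].strip()
    # Pretend the gLM samples a different completion year on a second try.
    return sentence.replace("1889", "1887")


def stub_alm(prompt: str) -> str:
    if "contradictory?" in prompt:
        return "Yes" if "1887" in prompt and "1889" in prompt else "No"
    return "It was completed in the late 1880s."


if __name__ == "__main__":
    draft = ["The Eiffel Tower is in Paris.", "It was completed in 1889."]
    print(mitigate(stub_glm, stub_alm, draft))
```

With the stubs, the second sentence is flagged because the gLM's alternative gives a different year, and the aLM replaces it with a weaker claim that no longer conflicts. Capping `max_iters` bounds the number of model calls; the loop also stops early once a full pass triggers no contradiction.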

Experimental Results

Research Questions

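RQ1: How prevalent are self-contradictions in state-of-the-art instruction-tuned LLMs during open-domain generation?
RQ2: How accurately can the detector identify self-contradictions using only prompting and the models' own reasoning (no external retrieval)?
RQ3: To what extent can iterative mitigation reduce self-contradictions while preserving fluency and informativeness?
RQ4: Does the framework generalize to shorter question-answering tasks and to retrieval-augmented settings?
RQ5: How does performance vary across different generative LMs (open-source vs. proprietary)?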

Key Findings

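Self-contradictions are prevalent: in open-domain generation, for example, 17.7% of the sentences produced by ChatGPT are self-contradictory.
A substantial share of self-contradictions (35.2% for ChatGPT) cannot be verified using online text, which limits retrieval-based remedies.
With ChatGPT as the analyzer, detection F1 is around 80% across gLMs, and mitigation removes up to 89.5% of self-contradictions while preserving informativeness.
Mitigation preserves fluency across models, with only a moderate increase in perplexity (0.44–1.78 in Table 3).
The method works for both proprietary and open-source LMs, although open-source models show greater variability in detection and removal performance.
The framework also applies to retrieval-augmented question answering: significant self-contradictions are still detected when retrieval is used.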
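The findings above report detection as an F1 score, mitigation as the share of self-contradictions removed, and fluency as a perplexity increase. The sketch below gives the standard definitions of these quantities in Python. It assumes binary per-pair labels (contradiction or not) for detection and per-token natural-log probabilities for perplexity; the function names and toy inputs are hypothetical, and this review does not specify the paper's exact evaluation protocol (for example, which model scores perplexity).

```python
import math
from typing import Sequence


def f1_score(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall for the 'contradiction' class."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def removal_rate(before: int, after: int) -> float:
    """Fraction of self-contradictions that mitigation removed."""
    return 0.0 if before == 0 else (before - after) / before


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood per token (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


if __name__ == "__main__":
    # Toy labels only, not data from the paper.
    print(f1_score([True, True, False, True], [True, False, False, True]))
    print(removal_rate(before=10, after=2))
    print(perplexity([math.log(0.5)] * 4))
```

In the toy call, the detector has 2 true positives, 1 false positive, and no false negatives, so precision is 2/3, recall is 1, and F1 is 0.8. A perplexity increase such as the one in Table 3 would then be the perplexity of the revised text minus that of the original.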

This review was generated by AI and reviewed by human editors.