QUICK REVIEW

[Paper Review] Re-Search for The Truth: Multi-round Retrieval-augmented Large Language Models are Strong Fake News Detectors

Guanghua Li, Wensheng Lu|arXiv (Cornell University)|Mar 14, 2024

Misinformation and Its Impacts | Cited by 9

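One-Sentence Summary

STEEL is an end-to-end retrieval-augmented LLM framework that uses multi-round web evidence retrieval to verify claims and explain its verdicts, achieving robust fake news detection on three real-world datasets.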

ABSTRACT

The proliferation of fake news has had far-reaching implications for politics, the economy, and society at large. While fake news detection methods have been employed to mitigate this issue, they primarily depend on two essential elements: the quality and relevance of the evidence, and the effectiveness of the verdict prediction mechanism. Traditional methods, which often source information from static repositories like Wikipedia, are limited by outdated or incomplete data, particularly for emerging or rare claims. Large Language Models (LLMs), known for their remarkable reasoning and generative capabilities, introduce a new frontier for fake news detection. However, like traditional methods, LLM-based solutions also grapple with the limitations of stale and long-tail knowledge. Additionally, retrieval-enhanced LLMs frequently struggle with issues such as low-quality evidence retrieval and context length constraints. To address these challenges, we introduce a novel retrieval-augmented LLM framework, the first of its kind to automatically and strategically extract key evidence from web sources for claim verification. Employing a multi-round retrieval strategy, our framework ensures the acquisition of sufficient, relevant evidence, thereby enhancing performance. Comprehensive experiments across three real-world datasets validate the framework's superiority over existing methods. Importantly, our model not only delivers accurate verdicts but also offers human-readable explanations to improve result interpretability.

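Research Motivation and Objectives

Address the limitations of static knowledge sources and single-round retrieval in fake news detection.
Develop an automated framework that gathers evidence from the internet to verify claims.
Provide interpretable verdicts with explanations to make results more transparent.
Offer an out-of-the-box, open-source implementation that requires no extensive model training.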

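Proposed Method

Combine web-based evidence retrieval with semantic filtering and document- and chunk-level retrieval.
Use an LLM to reason over the collected evidence and output a true/false/NEI verdict together with a confidence score.
Implement a multi-round re-retrieval mechanism that generates updated queries when the evidence is insufficient.
Maintain an evidence pool of confirmed evidence to guide subsequent judgments and reduce redundancy.
Calibrate the LLM's confidence scores to correct for overconfidence.
Evaluate on real-world datasets against a broad range of baselines.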
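
The pipeline steps above can be illustrated with a minimal Python sketch of a multi-round retrieve, judge, and re-query loop. This is not STEEL's implementation. The search backend, LLM judge, and query rewriter are passed in as hypothetical callables (`search`, `judge`, `rewrite`). The evidence pool here simply accumulates and deduplicates retrieved text. The multiplicative `overconfidence_discount` is a crude stand-in for the paper's confidence calibration, and the semantic-filtering step is omitted. `k` (assumed to be documents, i.e. URLs, per query) and `l` (assumed here to be characters kept per document, `None` meaning full length) loosely mirror the retrieval-depth and evidence-length parameters from RQ3.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions for illustration, not STEEL's API):
SearchFn = Callable[[str, int], List[str]]               # (query, k) -> documents
JudgeFn = Callable[[str, List[str]], Tuple[str, float]]  # (claim, evidence) -> (label, confidence)
RewriteFn = Callable[[str, List[str]], str]              # (claim, evidence) -> new query


@dataclass
class Verdict:
    label: str         # "true", "false", or "NEI" (not enough information)
    confidence: float  # confidence after the overconfidence discount
    rounds: int        # number of retrieval rounds used
    evidence: List[str] = field(default_factory=list)


def truncate(doc: str, l: Optional[int]) -> str:
    """Keep the first l characters; l=None keeps the full document ("l=all")."""
    return doc if l is None else doc[:l]


def verify_claim(claim: str, search: SearchFn, judge: JudgeFn, rewrite: RewriteFn,
                 k: int = 3, l: Optional[int] = None, max_rounds: int = 3,
                 threshold: float = 0.7, overconfidence_discount: float = 0.9) -> Verdict:
    evidence_pool: List[str] = []  # evidence accumulated across rounds
    query = claim                  # the first round searches for the claim itself
    label, confidence = "NEI", 0.0
    for round_idx in range(1, max_rounds + 1):
        # Retrieve up to k documents and cut each one to the evidence length l.
        for doc in (truncate(d, l) for d in search(query, k)):
            if doc not in evidence_pool:  # skip duplicates to reduce redundancy
                evidence_pool.append(doc)
        label, raw_confidence = judge(claim, evidence_pool)
        # Crude overconfidence adjustment: shrink the LLM's self-reported score.
        confidence = raw_confidence * overconfidence_discount
        if label != "NEI" and confidence >= threshold:
            return Verdict(label, confidence, round_idx, list(evidence_pool))
        # Evidence is insufficient: generate an updated query for the next round.
        query = rewrite(claim, evidence_pool)
    return Verdict(label, confidence, max_rounds, list(evidence_pool))


# Toy stand-ins for a web search API and an LLM (hypothetical, for illustration only).
TOY_CORPUS = {
    "claim X": ["Doc A mentions X vaguely."],
    "claim X official statement": ["Official report: X is false."],
}


def toy_search(query: str, k: int) -> List[str]:
    return TOY_CORPUS.get(query, [])[:k]


def toy_judge(claim: str, evidence: List[str]) -> Tuple[str, float]:
    if any("X is false" in e for e in evidence):
        return "false", 0.9
    return "NEI", 0.3


def toy_rewrite(claim: str, evidence: List[str]) -> str:
    return claim + " official statement"


if __name__ == "__main__":
    verdict = verify_claim("claim X", toy_search, toy_judge, toy_rewrite)
    print(verdict.label, verdict.rounds)  # prints "false 2"
```

In this toy run the first round finds only vague evidence, so the judge returns NEI and the loop re-queries. The second round surfaces a decisive document and stops early. Because the pool accumulates, later judgments still see evidence from earlier rounds.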

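Experimental Results

Research Questions

RQ1: Can multi-round, internet-based retrieval outperform single-round retrieval methods in fake news detection?
RQ2: How does the re-retrieval mechanism affect evidence quality and verification accuracy?
RQ3: How do retrieval depth (k) and evidence length (l) affect performance across datasets?
RQ4: How does STEEL compare with state-of-the-art evidence-based and LLM-based baselines in accuracy and interpretability?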

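Key Findings

STEEL outperforms state-of-the-art baselines on three real-world datasets, with significant gains in F1-Ma and F1-Mi (both macro and micro F1 improve by more than 5 percentage points).
STEEL achieves strong fake news detection on LIAR, CHEF, and PolitiFact, with statistically significant improvements (marked with *) on several metrics.
The re-retrieval mechanism yields higher-quality evidence than direct search, keyword search, or paraphrasing strategies.
Optimal retrieval setting: three URLs with full-length evidence (l=all) gives the best performance.
Ablation studies show that removing either retrieval or re-retrieval degrades performance, confirming that both modules are essential.
The interpretability study shows that verdicts come with coherent, human-readable explanations and evidence attribution.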
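
F1-Ma and F1-Mi denote macro- and micro-averaged F1. The minimal Python sketch below, using made-up toy labels rather than any data from the paper, shows how the two averages differ. Macro-F1 averages per-class F1 scores, so small classes weigh as much as large ones. Micro-F1 pools true positives, false positives, and false negatives across classes, and for single-label classification it equals accuracy.

```python
from collections import Counter
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[Counter, Counter, Counter]:
    """Per-class true positives, false positives, and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # the true class t was missed
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    tp, fp, fn = confusion_counts(y_true, y_pred)
    return sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)


def micro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    tp, fp, fn = confusion_counts(y_true, y_pred)
    return f1(sum(tp[c] for c in labels), sum(fp[c] for c in labels), sum(fn[c] for c in labels))


if __name__ == "__main__":
    labels = ["true", "false", "NEI"]
    y_true = ["true", "false", "false", "NEI"]
    y_pred = ["true", "false", "true", "NEI"]
    print(round(macro_f1(y_true, y_pred, labels), 4), round(micro_f1(y_true, y_pred, labels), 4))
    # prints "0.7778 0.75"
```

For real evaluations, scikit-learn's `f1_score` with `average="macro"` or `average="micro"` computes the same two quantities.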

This review was generated by AI and checked by human editors.