QUICK REVIEW

[Paper Review] Re-Search for The Truth: Multi-round Retrieval-augmented Large Language Models are Strong Fake News Detectors

Guanghua Li, Wensheng Lu|arXiv (Cornell University)|Mar 14, 2024

Misinformation and Its Impacts | Cited by 9

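One-line summary

STEEL is an end-to-end retrieval-augmented LLM framework that verifies claims and explains its verdicts using multi-round retrieval of web evidence, achieving high-accuracy misinformation detection on three real-world datasets.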

ABSTRACT

The proliferation of fake news has had far-reaching implications on politics, the economy, and society at large. While fake news detection methods have been employed to mitigate this issue, they primarily depend on two essential elements: the quality and relevance of the evidence, and the effectiveness of the verdict prediction mechanism. Traditional methods, which often source information from static repositories like Wikipedia, are limited by outdated or incomplete data, particularly for emerging or rare claims. Large Language Models (LLMs), known for their remarkable reasoning and generative capabilities, introduce a new frontier for fake news detection. However, like traditional methods, LLM-based solutions also grapple with the limitations of stale and long-tail knowledge. Additionally, retrieval-enhanced LLMs frequently struggle with issues such as low-quality evidence retrieval and context length constraints. To address these challenges, we introduce a novel, retrieval-augmented LLMs framework--the first of its kind to automatically and strategically extract key evidence from web sources for claim verification. Employing a multi-round retrieval strategy, our framework ensures the acquisition of sufficient, relevant evidence, thereby enhancing performance. Comprehensive experiments across three real-world datasets validate the framework's superiority over existing methods. Importantly, our model not only delivers accurate verdicts but also offers human-readable explanations to improve result interpretability.

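Research motivation and objectives

Address the limitations of static knowledge sources and single-shot retrieval in fake news detection.
Develop an automated framework that gathers evidence from the internet to verify claims.
Provide interpretable verdicts with explanations to make results more transparent.
Enable a ready-to-use open-source implementation that requires no heavy model training.

Proposed method

Integrate web-based evidence retrieval with semantic filtering and document/text-block retrieval.
Reason over the collected evidence with an LLM and output true/false/NEI (not enough information) with a confidence score.
Implement a multi-round re-search mechanism that generates an updated query when the evidence is insufficient.
Incorporate an evidence pool of already-established evidence to guide later judgments and reduce redundancy.
Apply an overconfidence correction to calibrate the LLM's confidence scores.
Evaluate on real-world datasets and compare against a wide range of baselines.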
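
The retrieve, judge, and re-search steps listed above can be pictured as a simple control loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Verdict`, `re_search`, `toy_retrieve`, `toy_judge`, and `max_rounds` are hypothetical, the retriever and judge are stand-ins for a web search and an LLM call, and semantic filtering and confidence calibration are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    label: str            # "true", "false", or "NEI" (not enough information)
    confidence: float     # assumed to lie in [0, 1]
    next_query: str = ""  # updated query proposed when the label is "NEI"


def re_search(
    claim: str,
    retrieve: Callable[[str], List[str]],
    judge: Callable[[str, List[str]], Verdict],
    max_rounds: int = 3,  # hypothetical cap on retrieval rounds
) -> Tuple[Verdict, List[str], int]:
    """Retrieve evidence round by round until the judge stops answering NEI."""
    pool: List[str] = []  # evidence pool carried across rounds
    query = claim
    verdict = Verdict("NEI", 0.0)
    rounds_used = 0
    for _ in range(max_rounds):
        rounds_used += 1
        for passage in retrieve(query):
            if passage not in pool:  # skip evidence that is already pooled
                pool.append(passage)
        verdict = judge(claim, pool)
        if verdict.label != "NEI":
            break  # the judge considers the evidence sufficient
        # Otherwise search again with the judge's updated query.
        query = verdict.next_query or claim
    return verdict, pool, rounds_used


def toy_retrieve(query: str) -> List[str]:
    """Placeholder retriever backed by a tiny in-memory table."""
    table = {
        "claim-1": ["passage A"],
        "follow-up query": ["passage A", "passage B"],
    }
    return table.get(query, [])


def toy_judge(claim: str, pool: List[str]) -> Verdict:
    """Placeholder judge: decides only once two passages are pooled."""
    if len(pool) >= 2:
        return Verdict("false", 0.9)
    return Verdict("NEI", 0.3, next_query="follow-up query")


verdict, pool, rounds_used = re_search("claim-1", toy_retrieve, toy_judge)
print(verdict.label, verdict.confidence, rounds_used)
```

Passing the retriever and judge in as callables keeps the loop independent of any particular search API or model, so the placeholder functions can be swapped for real components without touching the control flow.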

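Experimental results

Research questions

RQ1: Can multi-round, internet-based retrieval improve fake news detection over single-shot methods?
RQ2: How does the re-search mechanism affect evidence quality and verification accuracy?
RQ3: How much do retrieval depth (k) and evidence length (l) affect performance across datasets?
RQ4: How does STEEL compare with state-of-the-art evidence-based and LLM-based baselines in accuracy and interpretability?

Key findings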

Method	LIAR F1-Ma	LIAR F1-Mi	LIAR F1-T	LIAR P-T	LIAR R-T	LIAR F1-F	LIAR P-F	LIAR R-F
DeClarE	0.573	0.571	0.531	0.550	0.546	0.619	0.587	0.597
HAN	0.588	0.591	0.563	0.545	0.532	0.606	0.618	0.611
EHIAN	0.591	0.593	0.559	0.543	0.548	0.630	0.603	0.617
MAC	0.603	0.601	0.562	0.558	0.567	0.625	0.623	0.621
GET	0.614	0.610	0.572	0.567	0.579	0.641	0.654	0.632
MUSER	0.645	0.642	0.647	0.640	0.654	0.643	0.650	0.636
ReRead	0.611	0.615	0.587	0.581	0.596	0.633	0.628	0.626
GPT-3.5-turbo	0.563	0.541	0.559	0.572	0.567	0.555	0.564	0.560
Vicuna-7B	0.528	0.535	0.521	0.543	0.552	0.519	0.538	0.526
WEBGLM-2B	0.601	0.597	0.558	0.563	0.571	0.622	0.604	0.618
ProgramFC	0.631	0.613	0.637	0.607	0.639	0.625	0.611	0.628
STEEL	0.714*	0.689*	0.685*	0.680*	0.691*	0.743*	0.725*	0.752*

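STEEL outperforms state-of-the-art baselines on three real-world datasets, with notable gains in F1-Ma and F1-Mi (reported as more than 5 points in macro and micro F1; in the LIAR table above, the margin over MUSER, the strongest baseline, is 0.069 in F1-Ma and 0.047 in F1-Mi).
STEEL achieves strong fake news detection performance across LIAR, CHEF, and PolitiFact, with statistically significant improvements on several metrics (marked with *).
The re-search mechanism yields higher-quality evidence than direct search, keyword search, and paraphrasing strategies.
An optimal retrieval configuration was identified: performance peaks with three URLs and full-length evidence (l=all).
Ablation experiments show that removing either retrieval or re-search lowers performance, confirming that both modules are essential.
The explainability study demonstrates coherent, human-readable explanations and attribution of verdicts to evidence.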
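
For readers less familiar with the table's metric columns, the sketch below shows how macro and micro F1 are conventionally computed for a two-class task. It assumes that F1-Ma and F1-Mi are macro and micro F1 and that the -T and -F suffixes denote the true and false classes; the table is consistent with that reading, since the STEEL row gives (0.685 + 0.743) / 2 = 0.714. The function names and the four-item example are hypothetical and are not data from the paper.

```python
from typing import Dict, Sequence, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def f1_report(
    gold: Sequence[str],
    pred: Sequence[str],
    labels: Sequence[str] = ("true", "false"),
) -> Dict[str, object]:
    """Per-class P/R/F1 plus macro and micro F1."""
    counts = {}
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        counts[lab] = (tp, fp, fn)
    per_class = {lab: prf(*c) for lab, c in counts.items()}
    # Macro F1: unweighted mean of the per-class F1 scores.
    macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(labels)
    # Micro F1: pool the counts over classes, then compute F1 once.
    tp_all = sum(c[0] for c in counts.values())
    fp_all = sum(c[1] for c in counts.values())
    fn_all = sum(c[2] for c in counts.values())
    micro_f1 = prf(tp_all, fp_all, fn_all)[2]
    return {"per_class": per_class, "macro_f1": macro_f1, "micro_f1": micro_f1}


# Hypothetical four-claim example, for illustration only.
gold = ["true", "true", "true", "false"]
pred = ["true", "true", "false", "false"]
report = f1_report(gold, pred)
print(report["macro_f1"], report["micro_f1"])
```

When every claim receives exactly one label, micro F1 equals plain accuracy, which is why F1-Mi can diverge from F1-Ma when the classes are imbalanced.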


This review was generated by AI and checked by a human editor.