QUICK REVIEW

[Paper Review] MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence Retrieval

Delvin Ce Zhang, Suhan Cui | arXiv (Cornell University) | Feb 10, 2026

Multimodal Machine Learning Applications | Citations: 0

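One-Sentence Summary

MEVER jointly performs multi-modal evidence retrieval, claim verification, and explanation generation using a two-layer multi-modal graph and token-level and evidence-level fusion, and it introduces AIChartClaim, a new scientific dataset.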

ABSTRACT

Verifying the truthfulness of claims usually requires joint multi-modal reasoning over both textual and visual evidence, such as analyzing both textual caption and chart image for claim verification. In addition, to make the reasoning process transparent, a textual explanation is necessary to justify the verification result. However, most claim verification works mainly focus on the reasoning over textual evidence only or ignore the explainability, resulting in inaccurate and unconvincing verification. To address this problem, we propose a novel model that jointly achieves evidence retrieval, multi-modal claim verification, and explanation generation. For evidence retrieval, we construct a two-layer multi-modal graph for claims and evidence, where we design image-to-text and text-to-image reasoning for multi-modal retrieval. For claim verification, we propose token- and evidence-level fusion to integrate claim and evidence embeddings for multi-modal verification. For explanation generation, we introduce multi-modal Fusion-in-Decoder for explainability. Finally, since almost all the datasets are in general domain, we create a scientific dataset, AIChartClaim, in AI domain to complement claim verification community. Experiments show the strength of our model.

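Motivation and Objectives

Motivate robust verification of claims across multi-modal data (textual captions and chart images).
Develop a two-layer multi-modal graph for retrieving evidence that spans text and images.
Design token-level and evidence-level fusion to combine multi-modal evidence for verification.
Enable multi-modal explanation generation that references both the claim and the evidence, using a Fusion-in-Decoder module and a consistency regularizer.
Create AIChartClaim, a chart-claim dataset in the scientific AI domain, to advance research on multi-modal scientific verification.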

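Proposed Method

Construct a two-layer multi-modal graph (an image layer and a text layer) over the evidence items, enabling image-to-text and text-to-image reasoning for retrieval.
Use a nested multi-modal encoder with image/text cross-attention to produce unified claim and evidence embeddings for verification.
Implement token-level multi-modal fusion between the claim and the retrieved evidence so that information is exchanged at the token level.
Apply hierarchical evidence-level fusion to obtain an integrated multi-modal evidence embedding, then classify it with an MLP classifier.
Introduce a multi-modal Fusion-in-Decoder that generates explanations referencing both the claim and the evidence, and add a consistency regularizer to align the explanation with the verification decision.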
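
The token-level and evidence-level fusion steps listed above can be illustrated with a small sketch. This is not the authors' implementation: the review gives no equations, so the scaled dot-product cross-attention, the residual addition, the mean pooling, and the similarity-based evidence weights below are all assumptions chosen for illustration, and every function name is hypothetical. The final MLP classifier is omitted; in a full model the pooled vector would be passed to it.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(tokens):
    # Average a list of token vectors into one vector.
    n = len(tokens)
    return [sum(t[d] for t in tokens) / n for d in range(len(tokens[0]))]

def cross_attend(queries, keys_values):
    # Each query token becomes a softmax-weighted average of the other sequence's tokens.
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                    for d in range(len(q))])
    return out

def fuse_claim_with_evidence(claim_tokens, evidence_list):
    # Assumed two-stage fusion: token level per evidence item, then evidence level.
    claim_vec = mean_pool(claim_tokens)
    fused = []
    for ev_tokens in evidence_list:
        # Token level: claim tokens attend over this evidence item's tokens (residual add).
        attended = cross_attend(claim_tokens, ev_tokens)
        mixed = [[c + a for c, a in zip(ct, at)]
                 for ct, at in zip(claim_tokens, attended)]
        fused.append(mean_pool(mixed))
    # Evidence level: weight each fused vector by its similarity to the claim.
    weights = softmax([dot(claim_vec, f) for f in fused])
    pooled = [sum(w * f[d] for w, f in zip(weights, fused))
              for d in range(len(claim_vec))]
    return pooled, weights

# Toy 2-dimensional embeddings (made up for illustration).
claim = [[1.0, 0.0], [0.0, 1.0]]
evidence = [
    [[1.0, 0.0], [0.9, 0.1]],      # evidence aligned with the claim
    [[-1.0, 0.0], [-0.8, -0.2]],   # evidence pointing the opposite way
]
pooled, weights = fuse_claim_with_evidence(claim, evidence)
```

In this toy input the first evidence item receives the larger evidence-level weight because its fused vector is more similar to the claim vector. A trained model would learn projection matrices for the queries, keys, and values rather than using raw embeddings.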

Figure 1: Illustration of multi-modal and explainable claim verification, taken from AIChartClaim dataset.

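Experimental Results

Research Questions

RQ1: How can textual and visual evidence be integrated to achieve joint multi-modal evidence retrieval?
RQ2: How can claims and evidence be reasoned over in a multi-modal setting to verify truthfulness?
RQ3: How can explanations be generated that are faithful to the multi-modal verification result?
RQ4: What effect does including images have on retrieval, verification accuracy, and explainability for scientific chart-based claims?
RQ5: Can a dedicated AI-domain dataset (AIChartClaim) improve the evaluation and development of multi-modal claim verification?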

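Key Findings

MEVER outperforms text-only and other multi-modal baselines in evidence retrieval on the AIChartClaim, ChartCheck, Mocheg, and MR2 datasets (improvements in MAP and Prec@K are reported).
For verification, MEVER with images achieves a higher Macro F1 than multiple baselines on AIChartClaim (retrieved evidence: 71.6%, gold evidence: 71.6%) and ChartCheck (retrieved evidence: 64.1%, gold evidence: 64.3%).
MEVER's explanation module attains competitive ROUGE-L, METEOR, and BLEU-2 scores on AIChartClaim and ChartCheck when using retrieved evidence (e.g., on AIChartClaim: ROUGE-L 34.5, METEOR 27.8, BLEU-2 21.3).
The w/o-images ablation shows that performance improves when images are incorporated, confirming the value of visual evidence for retrieval and verification.
The newly introduced AIChartClaim dataset provides claims paired with explanations (1,200 claims and 300 charts), enabling evaluation of multi-modal scientific chart verification in the AI domain.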
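
For readers unfamiliar with the retrieval and verification metrics named above, the following sketch shows common textbook definitions of Prec@K, MAP, and Macro F1. It is only an illustration: the paper's exact evaluation protocol (for example, how ties or queries without relevant evidence are handled) is not described in this review and may differ, and the function names and toy inputs are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the top-k retrieved items that are relevant.
    top_k = ranked_ids[:k]
    return sum(1 for doc in top_k if doc in relevant_ids) / k

def average_precision(ranked_ids, relevant_ids):
    # Mean of the precision values at each rank where a relevant item appears,
    # normalized by the number of relevant items.
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            hits += 1
            score += hits / rank
    return score / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    # runs: list of (ranked_ids, relevant_ids) pairs, one per claim.
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 scores.
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# Toy example: one claim, three retrieved evidence items, two of them relevant.
ranking = ["a", "b", "c"]
relevant = {"a", "c"}
p_at_2 = precision_at_k(ranking, relevant, 2)
ap = average_precision(ranking, relevant)
f1 = macro_f1(["T", "T", "F", "F"], ["T", "F", "F", "F"])
```

Macro F1 averages the per-class scores without weighting by class frequency, so a minority verdict class counts as much as the majority class.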

Figure 2: Model architecture. (a-b) Cross-modal graph reasoning. (c) A nested architecture with multi-modal graph reasoning. (d) Multi-modal token-level fusion. (e) Multi-modal explanation generation with Fusion-in-Decoder.

This review was generated by AI and checked by a human editor.