QUICK REVIEW

[Paper Review] Learning to Generate and Extract: A Multi-Agent Collaboration Framework For Zero-shot Document-level Event Arguments Extraction

Guangjun Zhang, Hu Zhang | arXiv (Cornell University) | Mar 3, 2026

Topic Modeling | Citations: 0

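One-Sentence Summary

Summary: This paper proposes a two-agent (generation and evaluation) reinforcement learning framework for zero-shot document-level event argument extraction (DEAE). One agent synthesizes data for unseen events and the other extracts arguments from it, which improves both extraction performance and synthetic data quality.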

ABSTRACT

Document-level event argument extraction (DEAE) is essential for knowledge acquisition, aiming to extract participants of events from documents. In the zero-shot setting, existing methods employ LLMs to generate synthetic data to address the challenge posed by the scarcity of annotated data. However, relying solely on event-type-only prompts makes it difficult for the generated content to accurately capture the contextual and structural relationships of unseen events. Moreover, ensuring the reliability and usability of synthetic data remains a significant challenge due to the absence of quality evaluation mechanisms. To this end, we introduce a multi-agent collaboration framework for zero-shot document-level event argument extraction (ZS-DEAE), which simulates the human collaborative cognitive process of "Propose-Evaluate-Revise." Specifically, the framework comprises a generation agent and an evaluation agent. The generation agent synthesizes data for unseen events by leveraging knowledge from seen events, while the evaluation agent extracts arguments from the synthetic data and assesses their semantic consistency with the context. The evaluation results are subsequently converted into reward signals, with event structure constraints incorporated into the reward design to enable iterative optimization of both agents via reinforcement learning. In three zero-shot scenarios constructed from the RAMS and WikiEvents datasets, our method achieves improvements both in data generation quality and argument extraction performance, while the generated data also effectively enhances the zero-shot performance of other DEAE models.

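Research Motivation and Objectives

Address data scarcity in zero-shot DEAE through synthetic data generation.
Mimic the human Propose-Evaluate-Revise workflow with two LLM-based agents.
Incorporate event structure constraints so that generated events stay consistent and complete.
Demonstrate improvements in zero-shot settings built from RAMS and WikiEvents, and show that the synthetic data also benefits other models.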

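Proposed Method

Define a Generation Agent that produces the context, trigger, and role-argument pairs for unseen events.
Define an Evaluation Agent (Bart-Gen) that fills argument templates from the generated context and outputs a log-likelihood as a data quality signal.
Use a normalized log-likelihood score with an added structural completeness penalty to discourage empty (None) arguments.
Incorporate event structure constraints (tau and epsilon) that align the synthetic data with training data statistics.
Apply reinforcement learning with policy gradient updates, driven by the final quality score alpha, to jointly optimize both agents.
Iterate Propose-Evaluate-Revise to improve synthetic data quality and DEAE performance.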
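The scoring and update steps listed above can be illustrated with a minimal sketch. It is not the paper's formulation: the exact definition of alpha, and the roles of tau and epsilon, are not given in this review, so the penalty form, the `none_penalty` weight, the `ScoredSample` type, and the example role names are all assumptions made for illustration. The sketch only shows the general idea of a length-normalized log-likelihood reduced by a penalty on empty roles, then turned into mean-baseline advantages of the kind a REINFORCE-style policy gradient update would use as per-sample weights.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ScoredSample:
    """One synthetic sample after the evaluation agent has processed it (hypothetical type)."""
    token_logprobs: List[float]          # per-token log-probs of the filled template
    role_args: Dict[str, Optional[str]]  # role -> extracted argument, or None if left empty


def quality_score(sample: ScoredSample, none_penalty: float = 0.5) -> float:
    """Length-normalized log-likelihood minus a penalty on empty roles.

    Assumed form for illustration only; `none_penalty` is a made-up weight.
    """
    n_tokens = max(len(sample.token_logprobs), 1)
    avg_loglik = sum(sample.token_logprobs) / n_tokens
    n_roles = max(len(sample.role_args), 1)
    empty_fraction = sum(arg is None for arg in sample.role_args.values()) / n_roles
    return avg_loglik - none_penalty * empty_fraction


def reinforce_weights(scores: List[float]) -> List[float]:
    """Mean-baseline advantages: the weight on each sample's log-prob gradient."""
    baseline = sum(scores) / len(scores)
    return [score - baseline for score in scores]


# Two samples with identical likelihoods; the second leaves one role empty.
complete = ScoredSample([-0.2, -0.4], {"attacker": "rebels", "target": "convoy"})
partial = ScoredSample([-0.2, -0.4], {"attacker": "rebels", "target": None})

scores = [quality_score(complete), quality_score(partial)]
weights = reinforce_weights(scores)
print(scores, weights)
```

With equal likelihoods, the sample that leaves a role empty scores lower and receives a negative weight, so an update of this kind would push the generator away from incomplete events.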

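Experimental Results

Research Questions

RQ1: Can collaboration between two agents (generation and evaluation) improve zero-shot document-level event argument extraction?
RQ2: Does incorporating structural constraints on the generated data mitigate the bias toward incomplete events?
RQ3: Can reinforcement feedback derived from evaluation signals significantly improve generated data quality and downstream DEAE accuracy?
RQ4: Does the synthetic data generated by this framework transfer benefits to other zero-shot DEAE models?
RQ5: How does the framework perform in the RAMS2RAMS, RAMS2Wiki, and Wiki2Wiki zero-shot settings?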

Key Findings

Model	RAMS2RAMS (Seen)	RAMS2RAMS (Unseen)	RAMS2RAMS (Overall)	RAMS2Wiki (Seen)	RAMS2Wiki (Unseen)	RAMS2Wiki (Overall)	Wiki2Wiki (Seen)	Wiki2Wiki (Unseen)	Wiki2Wiki (Overall)
PAIE	32.52	28.87	30.80	19.57	31.72	20.15	23.58	23.57	24.42
TabEAE	37.16	35.26	36.22	16.94	35.05	26.74	37.19	28.84	30.97
DEEIA	36.57	39.49	37.95	1.50	7.17	5.12	34.11	19.48	22.51
HMPEAE	35.18	37.74	36.44	16.89	32.74	25.61	38.43	27.48	30.20
TSAR	38.10	21.56	30.90	15.77	13.37	11.71	14.40	13.86	13.95
SCPRG	38.93	26.97	33.58	10.80	10.00	9.40	45.80	11.89	21.90
Bart-Gen	39.89	37.09	38.53	24.66	33.45	28.52	48.11	32.68	40.82
Ours (LLaMA)	46.46	45.06	45.77	30.81	34.43	32.38	47.83	46.19	46.96
Ours (Qwen)	44.06	45.11	44.59	31.74	30.47	31.18	47.39	47.82	47.62

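In all three zero-shot settings (RAMS2RAMS, RAMS2Wiki, Wiki2Wiki), the framework outperforms the baseline DEAE models on overall F1.
With LLaMA, the proposed model reports an overall F1 of 46.96 on Wiki2Wiki, 32.38 on RAMS2Wiki, and 45.77 on RAMS2RAMS.
Both the RL-based optimization and the structural constraints contribute to the gains; removing either component lowers performance.
Used as augmentation, the synthetic data generated by the framework effectively improves the zero-shot performance of other models (e.g., TabEAE, Bart-Gen).
The evaluation agent's log-likelihood correlates with data quality and can distinguish high-quality from low-quality synthetic samples.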
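The overall-F1 comparison can be checked directly against the results table. The short script below copies the Overall columns from that table and computes, for each setting, the margin of each proposed variant over the strongest baseline. The variable and function names are illustrative only, and the script makes no claim about statistical significance.

```python
from typing import Dict, Tuple

SETTINGS = ("RAMS2RAMS", "RAMS2Wiki", "Wiki2Wiki")

# Overall F1 per setting, copied from the results table in this review.
OVERALL_F1: Dict[str, Tuple[float, float, float]] = {
    "PAIE": (30.80, 20.15, 24.42),
    "TabEAE": (36.22, 26.74, 30.97),
    "DEEIA": (37.95, 5.12, 22.51),
    "HMPEAE": (36.44, 25.61, 30.20),
    "TSAR": (30.90, 11.71, 13.95),
    "SCPRG": (33.58, 9.40, 21.90),
    "Bart-Gen": (38.53, 28.52, 40.82),
    "Ours (LLaMA)": (45.77, 32.38, 46.96),
    "Ours (Qwen)": (44.59, 31.18, 47.62),
}
PROPOSED = ("Ours (LLaMA)", "Ours (Qwen)")


def margins_over_best_baseline() -> Dict[Tuple[str, str], Tuple[str, float]]:
    """Map (proposed model, setting) to (strongest baseline, F1 margin over it)."""
    result = {}
    for idx, setting in enumerate(SETTINGS):
        baselines = {
            name: f1[idx] for name, f1 in OVERALL_F1.items() if name not in PROPOSED
        }
        best_name = max(baselines, key=baselines.get)
        for model in PROPOSED:
            margin = round(OVERALL_F1[model][idx] - baselines[best_name], 2)
            result[(model, setting)] = (best_name, margin)
    return result


margins = margins_over_best_baseline()
for (model, setting), (best, margin) in margins.items():
    print(f"{model} on {setting}: +{margin:.2f} F1 over {best}")
```

On these numbers, Bart-Gen is the strongest baseline in every setting, and the LLaMA variant leads it by 7.24, 3.86, and 6.14 overall-F1 points on RAMS2RAMS, RAMS2Wiki, and Wiki2Wiki respectively.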

This review was generated by AI and checked by a human editor.