[Paper Digest] Learning to Generate and Extract: A Multi-Agent Collaboration Framework For Zero-shot Document-level Event Arguments Extraction
The paper proposes a reinforcement learning framework with two agents (generation and evaluation) that synthesizes data for unseen events and extracts their arguments in zero-shot document-level event argument extraction (DEAE), improving both extraction performance and synthetic data quality.
Document-level event argument extraction (DEAE) is essential for knowledge acquisition, aiming to extract participants of events from documents. In the zero-shot setting, existing methods employ LLMs to generate synthetic data to address the challenge posed by the scarcity of annotated data. However, relying solely on event-type-only prompts makes it difficult for the generated content to accurately capture the contextual and structural relationships of unseen events. Moreover, ensuring the reliability and usability of synthetic data remains a significant challenge due to the absence of quality evaluation mechanisms. To this end, we introduce a multi-agent collaboration framework for zero-shot document-level event argument extraction (ZS-DEAE), which simulates the human collaborative cognitive process of "Propose-Evaluate-Revise." Specifically, the framework comprises a generation agent and an evaluation agent. The generation agent synthesizes data for unseen events by leveraging knowledge from seen events, while the evaluation agent extracts arguments from the synthetic data and assesses their semantic consistency with the context. The evaluation results are subsequently converted into reward signals, with event structure constraints incorporated into the reward design to enable iterative optimization of both agents via reinforcement learning. In three zero-shot scenarios constructed from the RAMS and WikiEvents datasets, our method improves both data generation quality and argument extraction performance, and the generated data also effectively enhances the zero-shot performance of other DEAE models.
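The abstract turns the evaluation agent's judgments into reward signals and optimizes the agents with reinforcement learning. The sketch below shows the generic REINFORCE (policy-gradient) update with a batch-mean baseline on a toy categorical policy. The three "generation strategies" and their fixed scores are hypothetical stand-ins for the generation agent's outputs and the evaluation agent's reward; they are not the paper's models or reward.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits: List[float], reward_fn: Callable[[int], float],
                   rng: random.Random, lr: float = 0.5,
                   n_samples: int = 32) -> List[float]:
    """One REINFORCE update of a categorical policy, using the batch-mean reward as baseline."""
    probs = softmax(logits)
    actions = rng.choices(range(len(logits)), weights=probs, k=n_samples)
    rewards = [reward_fn(a) for a in actions]
    baseline = sum(rewards) / n_samples  # variance reduction
    grad = [0.0] * len(logits)
    for action, reward in zip(actions, rewards):
        advantage = reward - baseline
        # Gradient of log softmax: d log pi(action) / d logit_k = 1[k == action] - p_k
        for k, p_k in enumerate(probs):
            grad[k] += advantage * ((1.0 if k == action else 0.0) - p_k)
    # Gradient ascent on expected reward, averaged over the batch.
    return [logit + lr * g / n_samples for logit, g in zip(logits, grad)]


# Toy run: three hypothetical "generation strategies" whose fixed scores stand in
# for the evaluation agent's reward (not the paper's actual reward).
SCORES = [0.2, 0.5, 0.9]
rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits = reinforce_step(logits, lambda a: SCORES[a], rng)
final_probs = softmax(logits)
print([round(p, 3) for p in final_probs])
```

After 200 updates nearly all probability mass sits on the highest-scoring strategy. With an LLM as the policy, the action would be an entire generated sample and the same update would be applied to that sample's sequence log-probability; exactly how the paper does this is not described in this digest.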
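Motivation and Goals
- Address data scarcity in zero-shot DEAE by generating synthetic training data.
- Simulate the human Propose-Evaluate-Revise workflow with two collaborating language-model-based agents.
- Introduce event structure constraints to keep the generated event representations coherent and complete.
- Demonstrate improvements in zero-shot settings built from RAMS and WikiEvents, and show that the synthetic data also benefits other models.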
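Proposed Method
- Define a generation agent that creates the context, trigger, and role-argument pairs for unseen events.
- Define an evaluation agent (Bart-Gen) that fills argument templates from the generated context and outputs log-likelihoods as a data-quality signal.
- Use a normalized log-likelihood score with an additional structural-integrity penalty that discourages empty arguments (None).
- Introduce event structure constraints that keep the synthetic data aligned with training-data statistics (τ and ε).
- Apply policy-gradient reinforcement learning to jointly optimize both agents so as to maximize the final quality score α.
- Improve synthetic data quality and DEAE performance through iterative Propose-Evaluate-Revise cycles.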
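The method combines three signals into a final quality score α: a normalized log-likelihood from the evaluation agent, a penalty for empty (None) arguments, and an event structure constraint based on training-set statistics τ and ε. The digest does not give the exact formulas, so this sketch assumes one plausible combination: the normalized log-likelihood is the geometric-mean token probability, the None penalty is proportional to the share of empty roles, and τ is treated as a target filled-role ratio with tolerance ε. The class `SyntheticSample`, the function names, and the penalty weights are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SyntheticSample:
    # Hypothetical container: evaluator log-probabilities for the tokens of the
    # filled argument template, plus the role -> argument map (None = empty role).
    token_logprobs: List[float]
    role_args: Dict[str, Optional[str]]


def normalized_loglik(token_logprobs: List[float]) -> float:
    """Length-normalized log-likelihood, mapped to (0, 1] as a geometric-mean token probability."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def quality_score(sample: SyntheticSample, tau: float, epsilon: float,
                  none_penalty: float = 0.5, structure_penalty: float = 0.5) -> float:
    """Hypothetical quality score alpha: likelihood minus None and structure penalties."""
    score = normalized_loglik(sample.token_logprobs)
    # Structural-integrity penalty (assumed form): proportional to the share of empty roles.
    none_ratio = sum(arg is None for arg in sample.role_args.values()) / len(sample.role_args)
    score -= none_penalty * none_ratio
    # Event structure constraint (assumed form): the filled-role ratio should stay
    # within epsilon of a training-set statistic tau; only the excess is penalized.
    deviation = abs((1.0 - none_ratio) - tau)
    if deviation > epsilon:
        score -= structure_penalty * (deviation - epsilon)
    return score


mostly_filled = SyntheticSample([-0.1, -0.2, -0.3],
                                {"attacker": "rebels", "target": "convoy", "place": None})
sparse = SyntheticSample([-0.1, -0.2, -0.3],
                         {"attacker": "rebels", "target": None, "place": None})
print(round(quality_score(mostly_filled, tau=0.7, epsilon=0.1), 3))
print(round(quality_score(sparse, tau=0.7, epsilon=0.1), 3))
```

Both samples have the same likelihood (about 0.819), so the sparser one scores lower (0.352 vs. 0.652) purely because of the two penalties, which is the kind of bias toward incomplete events such a constraint is meant to counter.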
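Experimental Results
Research Questions
- RQ1: Does collaboration between the two agents (generation + evaluation) improve zero-shot document-level event argument extraction?
- RQ2: Does adding structural constraints to the generated data mitigate the bias toward incomplete events?
- RQ3: Does reinforcement feedback derived from the evaluation signals significantly improve synthetic data quality and downstream DEAE accuracy?
- RQ4: Does the synthetic data generated by the framework transfer its benefits to other zero-shot DEAE models?
- RQ5: How does the framework perform in the RAMS2RAMS, RAMS2Wiki, and Wiki2Wiki zero-shot settings?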
Main Findings
| Model | RAMS2RAMS (Seen) | RAMS2RAMS (Unseen) | RAMS2RAMS (Overall) | RAMS2Wiki (Seen) | RAMS2Wiki (Unseen) | RAMS2Wiki (Overall) | Wiki2Wiki (Seen) | Wiki2Wiki (Unseen) | Wiki2Wiki (Overall) |
|---|---|---|---|---|---|---|---|---|---|
| PAIE | 32.52 | 28.87 | 30.80 | 19.57 | 31.72 | 20.15 | 23.58 | 23.57 | 24.42 |
| TabEAE | 37.16 | 35.26 | 36.22 | 16.94 | 35.05 | 26.74 | 37.19 | 28.84 | 30.97 |
| DEEIA | 36.57 | 39.49 | 37.95 | 1.50 | 7.17 | 5.12 | 34.11 | 19.48 | 22.51 |
| HMPEAE | 35.18 | 37.74 | 36.44 | 16.89 | 32.74 | 25.61 | 38.43 | 27.48 | 30.20 |
| TSAR | 38.10 | 21.56 | 30.90 | 15.77 | 13.37 | 11.71 | 14.40 | 13.86 | 13.95 |
| SCPRG | 38.93 | 26.97 | 33.58 | 10.80 | 10.00 | 9.40 | 45.80 | 11.89 | 21.90 |
| Bart-Gen | 39.89 | 37.09 | 38.53 | 24.66 | 33.45 | 28.52 | 48.11 | 32.68 | 40.82 |
| Ours (LLaMA) | 46.46 | 45.06 | 45.77 | 30.81 | 34.43 | 32.38 | 47.83 | 46.19 | 46.96 |
| Ours (Qwen) | 44.06 | 45.11 | 44.59 | 31.74 | 30.47 | 31.18 | 47.39 | 47.82 | 47.62 |
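To make the overall-F1 comparison concrete, this short script takes the Overall columns of the table, picks the strongest baseline in each setting, and reports each variant's margin over it. It only re-does arithmetic on the reported numbers; the dictionary and function names are illustrative.

```python
# Overall F1 per zero-shot setting, copied from the Overall columns of the table.
BASELINES = {
    "PAIE":     {"RAMS2RAMS": 30.80, "RAMS2Wiki": 20.15, "Wiki2Wiki": 24.42},
    "TabEAE":   {"RAMS2RAMS": 36.22, "RAMS2Wiki": 26.74, "Wiki2Wiki": 30.97},
    "DEEIA":    {"RAMS2RAMS": 37.95, "RAMS2Wiki": 5.12,  "Wiki2Wiki": 22.51},
    "HMPEAE":   {"RAMS2RAMS": 36.44, "RAMS2Wiki": 25.61, "Wiki2Wiki": 30.20},
    "TSAR":     {"RAMS2RAMS": 30.90, "RAMS2Wiki": 11.71, "Wiki2Wiki": 13.95},
    "SCPRG":    {"RAMS2RAMS": 33.58, "RAMS2Wiki": 9.40,  "Wiki2Wiki": 21.90},
    "Bart-Gen": {"RAMS2RAMS": 38.53, "RAMS2Wiki": 28.52, "Wiki2Wiki": 40.82},
}
PROPOSED = {
    "Ours (LLaMA)": {"RAMS2RAMS": 45.77, "RAMS2Wiki": 32.38, "Wiki2Wiki": 46.96},
    "Ours (Qwen)":  {"RAMS2RAMS": 44.59, "RAMS2Wiki": 31.18, "Wiki2Wiki": 47.62},
}


def margins_over_best_baseline(baselines, proposed):
    """For each setting, find the strongest baseline and each proposed variant's F1 margin over it."""
    settings = next(iter(baselines.values())).keys()
    report = {}
    for setting in settings:
        best_name = max(baselines, key=lambda name: baselines[name][setting])
        best_f1 = baselines[best_name][setting]
        report[setting] = {"best_baseline": best_name}
        for variant, scores in proposed.items():
            report[setting][variant] = round(scores[setting] - best_f1, 2)
    return report


for setting, row in margins_over_best_baseline(BASELINES, PROPOSED).items():
    print(setting, row)
```

Bart-Gen is the strongest baseline in every setting; the LLaMA variant's margins are 7.24, 3.86, and 6.14 F1 points on RAMS2RAMS, RAMS2Wiki, and Wiki2Wiki, and the Qwen variant's are 6.06, 2.66, and 6.80.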
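- Outperforms every baseline DEAE model in overall F1 in all three zero-shot settings (RAMS2RAMS, RAMS2Wiki, Wiki2Wiki).
- In the reported results, the proposed method (LLaMA) reaches an overall F1 of 46.96 on Wiki2Wiki, 32.38 on RAMS2Wiki, and 45.77 on RAMS2RAMS.
- The RL-based optimization and the structural constraints jointly drive the gains; removing either component lowers performance.
- When added as augmentation data, the framework's synthetic data substantially improves the zero-shot performance of other models (e.g., TabEAE, Bart-Gen).
- The evaluation agent's log-likelihood correlates with data quality and can separate high-quality from low-quality synthetic samples.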
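The findings are stated as F1 scores. For reference, here is a minimal micro-F1 sketch over (document, role, argument) triples. Exact string matching is an assumption made for illustration; this digest does not say which matching criterion the paper's evaluation uses, and the example triples are invented.

```python
from collections import Counter
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (doc_id, role, argument text)


def argument_f1(predicted: Iterable[Triple], gold: Iterable[Triple]):
    """Micro precision/recall/F1 over argument triples, counting only exact matches."""
    pred_counts, gold_counts = Counter(predicted), Counter(gold)
    # Multiset intersection: a triple counts as correct at most as often as it appears in gold.
    true_pos = sum((pred_counts & gold_counts).values())
    n_pred, n_gold = sum(pred_counts.values()), sum(gold_counts.values())
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [("d1", "attacker", "rebels"), ("d1", "target", "convoy"), ("d1", "place", "Aleppo")]
pred = [("d1", "attacker", "rebels"), ("d1", "target", "the convoy")]
p, r, f1 = argument_f1(pred, gold)
print(f"P={p:.3f} R={r:.3f} F1={100 * f1:.2f}")
```

Here one of the two predictions matches the gold set, giving P = 0.5, R = 1/3, and F1 = 40.00 when expressed on the percentage scale used in the table.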
This digest was generated by AI and reviewed by human editors.