QUICK REVIEW

[Paper Review] The Value of Variance: Mitigating Debate Collapse in Multi-Agent Systems via Uncertainty-Driven Policy Optimization

Luoxi Tang, Meng Yu | arXiv (Cornell University) | Feb 6, 2026

Multi-Agent Systems and Negotiation | Cited by 0

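One-Sentence Summary

Introduces hierarchical uncertainty metrics (intra-agent, inter-agent, and system-level) to diagnose debate collapse in multi-agent debate (MAD), and proposes Uncertainty-Driven Policy Optimization (UDPO) to mitigate it, improving accuracy and robustness, particularly under attack.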

ABSTRACT

Multi-agent debate (MAD) systems improve LLM reasoning through iterative deliberation, but remain vulnerable to debate collapse, a failure type where final agent decisions are compromised on erroneous reasoning. Existing methods lack principled mechanisms to detect or prevent such failures. To address this gap, we first propose a hierarchical metric that quantifies behavioral uncertainty at three levels: intra-agent (individual reasoning uncertainty), inter-agent (interactive uncertainty), and system-level (output uncertainty). Empirical analysis across several benchmarks reveals that our proposed uncertainty quantification reliably indicates system failures, which demonstrates the validity of using them as diagnostic metrics to indicate the system failure. Subsequently, we propose a mitigation strategy by formulating an uncertainty-driven policy optimization to penalize self-contradiction, peer conflict, and low-confidence outputs in a dynamic debating environment. Experiments demonstrate that our proposed uncertainty-driven mitigation reliably calibrates the multi-agent system by consistently improving decision accuracy while reducing system disagreement.

Motivation and Objectives

Motivate the need to diagnose and prevent debate collapse in multi-agent debate (MAD) systems.
Develop a three-level uncertainty quantification framework to detect unstable debate dynamics.
Propose UDPO to penalize self-contradiction, peer conflict, and low-confidence outputs during MAD.
Demonstrate improved accuracy and robustness of MAD under natural and attacked conditions.
Provide an asymmetric optimization approach that tailors penalties to individual agents based on uncertainty.

Proposed Method

Define intra-agent flip rate and belief revision as measures of self-consistency.
Define inter-agent disagreement via pairwise agent conflicts during each debate round.
Define system-level uncertainty using entropy, final disagreement, and leave-one-out instability.
Aggregate these into three uncertainty metrics U_intra, U_inter, U_sys and show their correlation with correctness.
Formulate Uncertainty-Driven Policy Optimization (UDPO) with a reward that combines three uncertainty terms (r_intra, r_inter, r_sys) with a task reward, and implement an asymmetric objective with agent-specific coefficients.
Use a clipped relative-update objective with an anchoring term to stabilize learning and prevent large policy shifts.
Introduce agent-tailored hyperparameters, determined from warm-up uncertainty profiles, to allocate training focus where it is needed.
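
The quantities listed above can be made concrete with a small sketch. The summary does not give the paper's exact formulas, so everything below is an assumption for illustration: the function names, the use of discrete answer labels, the linear penalty in `shaped_reward`, and the PPO-style clipping in `clipped_objective` are hypothetical stand-ins, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def flip_rate(answers):
    """Intra-agent: fraction of consecutive rounds where one agent changed its answer."""
    if len(answers) < 2:
        return 0.0
    flips = sum(prev != cur for prev, cur in zip(answers, answers[1:]))
    return flips / (len(answers) - 1)


def pairwise_disagreement(round_answers):
    """Inter-agent: fraction of agent pairs that disagree within one round."""
    pairs = list(combinations(round_answers, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def answer_entropy(final_answers):
    """System-level: Shannon entropy (in nats) of the final-answer distribution."""
    n = len(final_answers)
    counts = Counter(final_answers)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def majority(answers):
    """Majority vote; ties go to the answer seen first (an arbitrary choice)."""
    return Counter(answers).most_common(1)[0][0]


def leave_one_out_instability(final_answers):
    """System-level: fraction of agents whose removal changes the majority vote."""
    n = len(final_answers)
    if n < 2:
        return 0.0
    full = majority(final_answers)
    changed = sum(
        majority(final_answers[:i] + final_answers[i + 1:]) != full
        for i in range(n)
    )
    return changed / n


def shaped_reward(task_reward, u_intra, u_inter, u_sys, coeffs):
    """Assumed form: task reward minus agent-specific weighted uncertainty penalties."""
    alpha, beta, gamma = coeffs  # per-agent coefficients (hypothetical names)
    return task_reward - (alpha * u_intra + beta * u_inter + gamma * u_sys)


def clipped_objective(ratio, advantage, eps=0.2, anchor_penalty=0.0, anchor_weight=0.0):
    """PPO-style clipped surrogate minus a caller-supplied anchoring penalty (assumed form)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    surrogate = min(ratio * advantage, clipped * advantage)
    return surrogate - anchor_weight * anchor_penalty
```

In this sketch an agent that answers "A", "B", "B", "A" over four rounds has a flip rate of 2/3, and three agents answering "A", "A", "B" in one round disagree on 2 of 3 pairs. Passing a different `coeffs` tuple per agent is one simple way to express the asymmetric, agent-tailored weighting described above.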

Experimental Results

Research Questions

RQ1: Can hierarchical uncertainty indicators reliably diagnose debate collapse in MAD systems?
RQ2: Do intra-agent, inter-agent, and system-level uncertainties correlate with incorrect/degraded MAD outcomes?
RQ3: Does uncertainty-driven policy optimization improve robustness and accuracy of MAD, including under adversarial attacks?
RQ4: How does UDPO compare to standard MAD, MAPPO, and RMAAC in terms of accuracy and uncertainty reduction?
RQ5: Where and when does UDPO provide the largest gains across question difficulty?

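Key Findings

The uncertainty metrics reliably separate failed from successful MAD reasoning: failures show significantly higher uncertainty at every level.
All three uncertainty metrics are negatively correlated with accuracy, so higher uncertainty predicts lower performance.
UDPO yields substantial accuracy gains over standard MAD and the baseline methods (e.g., up to 25 percentage points over standard MAD on GSM8K with N=5).
UDPO substantially reduces system-level uncertainty (e.g., a reduction of about 80% relative to standard MAD on GSM8K).
The asymmetric, uncertainty-based optimization improves robustness to attacks, maintaining higher accuracy as the number of attacked agents grows.
Ablations show that each loss component targets a different failure mode; removing any one component lowers accuracy and raises uncertainty, and the system-level loss has the largest effect on accuracy.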
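
The negative-correlation finding can be checked on one's own debate logs with a plain Pearson correlation between a per-question uncertainty score and a 0/1 correctness flag (which is equivalent to a point-biserial correlation). This is a minimal sketch of such a check, not the paper's evaluation code; the function name and the input layout are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold one uncertainty value per question (for example a system-level score) and `ys` the matching correctness flags; a clearly negative result would be consistent with the finding reported above.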


This review was generated by AI and reviewed by human editors.