QUICK REVIEW

[Paper Review] ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics

Xinkui Zhao, Sai Liu|arXiv (Cornell University)|Mar 12, 2026
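Software System Performance and Reliability | Cited by 0
One-Sentence Summary

ProMAS proposes a proactive framework that models semantic transitions, using a quantized Vector Markov Space and a Jump Detection mechanism to predict and localize logical violations in multi-agent reasoning, enabling real-time error intervention.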

ABSTRACT

The integration of Large Language Models into Multi-Agent Systems (MAS) has enabled the solution of complex, long-horizon tasks through collaborative reasoning. However, this collective intelligence is inherently fragile, as a single logical fallacy can rapidly propagate and lead to system-wide failure. Most current research relies on post-hoc failure analysis, thereby hindering real-time intervention. To address this, we propose PROMAS, a proactive framework utilizing Markov transitions for predictive error analysis. PROMAS extracts Causal Delta Features to capture semantic displacement, mapping them to a quantized Vector Markov Space to model reasoning as probabilistic transitions. By integrating a Proactive Prediction Head with Jump Detection, the method localizes errors via risk acceleration rather than static thresholds. On the Who&When benchmark, PROMAS achieves 22.97% step-level accuracy while processing only 27% of reasoning logs. This performance rivals reactive monitors like MASC while reducing data overhead by 73%. Although this strategy entails an accuracy trade-off compared to post-hoc methods, it significantly improves intervention latency, balancing diagnostic precision with the real-time demands of autonomous reasoning.

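Research Motivation and Goals

  • Shift from post-hoc error analysis to real-time, proactive error localization in MAS.
  • Represent reasoning as causal semantic displacement on a latent manifold.
  • Quantize semantic transitions and model them in a Markov framework to assess failure risk.
  • Develop a dynamic Jump Detection mechanism that triggers timely intervention with low data requirements.
  • Demonstrate competitive step-level localization accuracy while substantially reducing the amount of context processed.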

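Proposed Method

  • Extract Causal Delta Features that represent the semantic displacement between consecutive reasoning steps.
  • Project the high-dimensional dialogue history onto a latent semantic manifold using a frozen LLM backbone and learned attention pooling.
  • Quantize the causal space into K action prototypes and maintain separate N_fail and N_succ transition counts, from which a Bayesian-smoothed risk lambda_ij is derived.
  • Introduce a Proactive Prediction Head that predicts the distribution over next-action clusters and computes the expected risk R_t over the Top-M clusters.
  • Apply dynamic Jump Detection to the risk velocity (nabla R_t), triggering localization when the change in risk exceeds a threshold, which enables real-time intervention.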
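The last three steps can be illustrated with a minimal sketch. The summary does not give the exact formulas, so everything below is an assumption made for illustration: the Beta-prior form of the smoothing (`alpha`, `beta`), the renormalization over the Top-M clusters, the first-difference definition of risk velocity, and the function names themselves are hypothetical and may differ from the paper's implementation.

```python
import numpy as np


def smoothed_risk(n_fail, n_succ, alpha=1.0, beta=1.0):
    """Assumed Beta-prior smoothing of per-transition failure risk.

    lambda_ij = (N_fail[i, j] + alpha) / (N_fail[i, j] + N_succ[i, j] + alpha + beta)
    The paper's exact smoothing may differ; this is one common choice.
    """
    n_fail = np.asarray(n_fail, dtype=float)
    n_succ = np.asarray(n_succ, dtype=float)
    return (n_fail + alpha) / (n_fail + n_succ + alpha + beta)


def expected_risk(risk, current_cluster, next_probs, top_m=2):
    """Expected risk R_t over the top_m most likely next-action clusters.

    Assumption: the predicted probabilities of the Top-M clusters are
    renormalized to sum to 1 before weighting the transition risks.
    """
    next_probs = np.asarray(next_probs, dtype=float)
    top = np.argsort(next_probs)[::-1][:top_m]          # indices of Top-M clusters
    weights = next_probs[top] / next_probs[top].sum()   # renormalize
    return float(np.dot(weights, risk[current_cluster, top]))


def first_jump(risks, tau):
    """Index of the first step whose risk increase over the previous step exceeds tau.

    Risk velocity is taken here as the first difference R_t - R_{t-1};
    returns None if no step crosses the threshold.
    """
    velocity = np.diff(np.asarray(risks, dtype=float))
    hits = np.flatnonzero(velocity > tau)
    return int(hits[0]) + 1 if hits.size else None


if __name__ == "__main__":
    # Toy transition counts for K = 2 action prototypes (made-up numbers).
    lam = smoothed_risk(n_fail=[[0, 3], [1, 0]], n_succ=[[4, 1], [1, 0]])
    r_t = expected_risk(lam, current_cluster=0, next_probs=[0.25, 0.75])
    print("lambda:", lam)
    print("R_t:", r_t)
    print("jump at step:", first_jump([0.10, 0.12, 0.15, 0.50, 0.55], tau=0.2))
```

In this sketch the trigger depends on how fast the risk rises, not on its absolute level, which is the contrast with static thresholds that the summary describes.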
Figure 1: Overview of ProMAS. The architecture is divided into two coupled components: (Left) Proactive Inference: The dialogue history $H_{t-1}$ is encoded into a latent state $s_{t-1}$ to predict the cluster distribution and transition risk $R_{t}$ of the next action. A Jump Detection mechan

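Experimental Results

Research Questions

  • RQ1: Can ProMAS outperform existing reactive detection methods in step-level localization accuracy?
  • RQ2: How effective is Jump Detection at suppressing false alarms in proactive error prediction?
  • RQ3: Does modeling semantic transitions with Causal Delta Features outperform absolute state representations for localizing the error step?
  • RQ4: What is the information efficiency (context required) of proactive detection compared with offline baselines?
  • RQ5: How well does ProMAS generalize across different backbones and data splits (Automated vs Handcraft) of Who&When?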

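Key Findings

  • ProMAS achieves 22.97% step-level localization accuracy on Who&When while processing about 26.79% of the dialogue context on average.
  • ProMAS outperforms the online baseline MASC on step-level localization and processes substantially less context than offline baselines.
  • Across backbones and data splits, ProMAS maintains an overall agent-level accuracy of about 40.54%, with stable step-level localization performance (about 22%–25%).
  • Ablations show that the triplet loss and Causal Delta Features are critical: removing them significantly degrades performance, while Jump Detection improves precision relative to static thresholds.
  • ProMAS provides competitive localization under real-time constraints, bridging the gap between offline diagnosis and online intervention.
  • Compared with some post-hoc methods, ProMAS achieves similar localization accuracy while reducing data overhead by about 73%.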
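The 73% figure is consistent with the reported context fraction if data overhead is read as the share of the reasoning log that is left unprocessed. That reading is an assumption, since the summary does not define the metric:

```latex
1 - 0.2679 = 0.7321 \approx 73\%
```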
Figure 2: Ablation study of ProMAS on the Algorithm-Generated split of the Who&When benchmark.


This review was generated by AI and reviewed by a human editor.