QUICK REVIEW

[Paper Summary] INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems

Yijin Zhou, Xiaoya Lu|arXiv (Cornell University)|Jan 21, 2026
Adversarial Robustness in Machine Learning|Cited by 0
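One-Sentence Summary

INFA-Guard introduces infection-aware detection and topology-based remediation to identify attackers and infected agents separately in LLM-based multi-agent systems, significantly reducing attack propagation. It replaces attackers and rehabilitates infected agents so that the topology is preserved.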

ABSTRACT

The rapid advancement of Large Language Model (LLM)-based Multi-Agent Systems (MAS) has introduced significant security vulnerabilities, where malicious influence can propagate virally through inter-agent communication. Conventional safeguards often rely on a binary paradigm that strictly distinguishes between benign and attack agents, failing to account for infected agents, i.e., benign entities converted by attack agents. In this paper, we propose Infection-Aware Guard, INFA-Guard, a novel defense framework that explicitly identifies and addresses infected agents as a distinct threat category. By leveraging infection-aware detection and topological constraints, INFA-Guard accurately localizes attack sources and infected ranges. During remediation, INFA-Guard replaces attackers and rehabilitates infected ones, avoiding malicious propagation while preserving topological integrity. Extensive experiments demonstrate that INFA-Guard achieves state-of-the-art performance, reducing the Attack Success Rate (ASR) by an average of 33%, while exhibiting cross-model robustness, superior topological generalization, and high cost-effectiveness.

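Research Motivation and Objectives

  • Define infected agents as a distinct threat category in MAS security to improve identification.
  • Develop an infection-aware detection mechanism that models the dynamic infection process.
  • Use topological constraints to better localize attack sources and the infected range.
  • Propose a remediation strategy that replaces attackers and rehabilitates infected agents while preserving the network topology.
  • Demonstrate state-of-the-art defense performance across multiple attack scenarios and LLM backbones.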

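Proposed Method

  • Model the MAS as a dynamic directed graph with time-series utterance embeddings.
  • Introduce infection-aware detection, using turn-adaptive GNN branches to classify agents as benign, infected, or attack (dual-head output).
  • Add a topology-based loss (L_topo) that enforces realistic spatial constraints and reduces false positives.
  • Apply post-adaptation topology adjustment and response-level remediation to replace attackers and rehabilitate infected agents (G^(k+1), RF, RP).
  • Evaluate on multiple attack types (PI, TA, MA) and LLM backbones (e.g., Qwen3-235B-A22B, GPT-4o-mini).
  • Report ablation studies showing the effect of temporal features, GNN branches, infection-aware detection, the topology loss, post-adaptation, and the remediation components.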
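
The three-way labeling and the topology-preserving remediation described in the list above can be illustrated with a small Python sketch. It is not the paper's implementation: the `Agent` class, the `remediate` and `unreachable_infections` helpers, and the rule "rehabilitate by dropping the latest reply" are all assumptions made for illustration, and the labels are hard-coded rather than produced by a GNN detector. The reachability check is only a simple stand-in for the idea of a spatial constraint (an infected agent should be reachable from some attacker) and is not the L_topo loss.

```python
from dataclasses import dataclass, field
from enum import Enum


class Label(Enum):
    BENIGN = "benign"
    INFECTED = "infected"
    ATTACK = "attack"


@dataclass
class Agent:
    name: str
    label: Label = Label.BENIGN  # assumed to come from a detector; hard-coded here
    history: list = field(default_factory=list)  # one utterance per round


def unreachable_infections(agents, edges):
    """Return infected nodes that no attacker can reach along directed edges.

    Hypothetical plausibility check: such nodes are likely false positives.
    """
    frontier = [n for n, a in agents.items() if a.label is Label.ATTACK]
    reached = set(frontier)
    while frontier:
        src = frontier.pop()
        for s, d in edges:
            if s == src and d not in reached:
                reached.add(d)
                frontier.append(d)
    return {
        n for n, a in agents.items()
        if a.label is Label.INFECTED and n not in reached
    }


def remediate(agents, edges):
    """Replace attackers and reset infected agents; edges are left untouched."""
    repaired = {}
    for name, agent in agents.items():
        if agent.label is Label.ATTACK:
            # Replacement: a fresh benign agent takes over the same node.
            repaired[name] = Agent(name=name)
        elif agent.label is Label.INFECTED:
            # Rehabilitation (assumed rule): keep the agent, drop its last reply.
            repaired[name] = Agent(name=name, history=agent.history[:-1])
        else:
            repaired[name] = agent
    return repaired, set(edges)


# Toy system: A attacks B, B talks to C, and D only sends messages to C.
agents = {
    "A": Agent("A", Label.ATTACK, ["inject"]),
    "B": Agent("B", Label.INFECTED, ["ok", "poisoned"]),
    "C": Agent("C", Label.BENIGN, ["fine"]),
    "D": Agent("D", Label.INFECTED, []),
}
edges = {("A", "B"), ("B", "C"), ("D", "C")}

suspicious = unreachable_infections(agents, edges)
repaired, new_edges = remediate(agents, edges)
```

In this toy run, `D` is flagged as an implausible infection because it has no incoming path from the attacker, and remediation returns the same edge set it was given, which is the sense in which the topology is preserved.
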
Figure 1: The paradigm comparison between existing MAS safeguards and our infection-aware safeguard.

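Experimental Results

Research Questions

  • RQ1: Can infected agents be detected effectively as a category distinct from the initial attackers in a MAS?
  • RQ2: Compared with binary defenses, how does infection-aware detection improve localization of attack sources and the infected range?
  • RQ3: How do topological constraints affect detection accuracy and remediation effectiveness?
  • RQ4: How does remediation (attacker replacement and infected-agent rehabilitation) affect overall robustness and propagation risk across attack scenarios?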

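Key Findings

  • Across the PI, TA, and MA tasks, INFA-Guard achieves a lower attack success rate (ASR) and a higher defense success rate (MDSR) than the baselines.
  • On the PI task, INFA-Guard's ASR@3 is only 23.3% on CSQA and 6.7% on GSM8K, better than Inspector.
  • On the TA task, INFA-Guard raises MDSR from 91.3% to 98.3% over three rounds, giving the best defense in later iterations.
  • On the MA task, INFA-Guard's ASR@3 is 6.1%, outperforming G-safeguard and AgentSafe by about 11% and 18%, respectively.
  • INFA-Guard is robust across LLM backbones (GPT-4o-mini and Qwen3-235B-A22B) and across chain, tree, and star topologies.
  • The method is cost-effective: compared with strong baselines, it uses 35% fewer prompt tokens and 13% fewer completion tokens on the backbone LLM while achieving a 66% relative reduction in ASR@3.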
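
To make the distinction between an absolute ASR value and a relative reduction concrete, here is a minimal Python sketch. The summary does not spell out how ASR@3 is computed, so the definition used here (fraction of trials in which the attack succeeds at round 3) is an assumption, the function names are hypothetical, and the trial outcomes are made-up numbers rather than results from the paper.

```python
def attack_success_rate(outcomes):
    """Fraction of trials in which the attack succeeded (True = success)."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)


def relative_reduction(baseline, defended):
    """Relative drop of `defended` with respect to `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - defended) / baseline


# Illustrative round-3 outcomes for 30 trials each (not data from the paper).
baseline_round3 = [True] * 9 + [False] * 21   # 9 of 30 attacks succeed
defended_round3 = [True] * 3 + [False] * 27   # 3 of 30 attacks succeed

asr_baseline = attack_success_rate(baseline_round3)
asr_defended = attack_success_rate(defended_round3)
drop = relative_reduction(asr_baseline, asr_defended)

print(f"ASR@3 baseline: {asr_baseline:.1%}, defended: {asr_defended:.1%}")
print(f"relative reduction: {drop:.1%}")
```

With these illustrative numbers the absolute gap is 20 percentage points while the relative reduction is about two thirds, which is why the two kinds of figures should not be compared directly.
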
Figure 2: Infected agents significantly increase security risks in MAS. The three legend entries represent no defense, defending attack agents, and defending attack and infected agents, respectively.


This summary was generated by AI and reviewed by a human editor.