QUICK REVIEW

[Paper Digest] INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems

Yijin Zhou, Xiaoya Lu|arXiv (Cornell University)|Jan 21, 2026

Adversarial Robustness in Machine Learning | Cited by 0

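ONE-SENTENCE SUMMARY

INFA-Guard combines infection-aware detection with topology-based remediation to identify attack agents and infected agents separately in LLM-based multi-agent systems, significantly reducing attack propagation. It replaces attackers and rehabilitates infected agents so that the communication topology is preserved.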

ABSTRACT

The rapid advancement of Large Language Model (LLM)-based Multi-Agent Systems (MAS) has introduced significant security vulnerabilities, where malicious influence can propagate virally through inter-agent communication. Conventional safeguards often rely on a binary paradigm that strictly distinguishes between benign and attack agents, failing to account for infected agents, i.e., benign entities converted by attack agents. In this paper, we propose Infection-Aware Guard, INFA-Guard, a novel defense framework that explicitly identifies and addresses infected agents as a distinct threat category. By leveraging infection-aware detection and topological constraints, INFA-Guard accurately localizes attack sources and infected ranges. During remediation, INFA-Guard replaces attackers and rehabilitates infected ones, avoiding malicious propagation while preserving topological integrity. Extensive experiments demonstrate that INFA-Guard achieves state-of-the-art performance, reducing the Attack Success Rate (ASR) by an average of 33%, while exhibiting cross-model robustness, superior topological generalization, and high cost-effectiveness.

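MOTIVATION AND OBJECTIVES

Define infected agents as a distinct threat category in MAS security to improve identification.
Develop an infection-aware detection mechanism that models the dynamic infection process.
Use topological constraints to better localize attack sources and the infected range.
Propose a remediation strategy that replaces attackers and rehabilitates infected agents while preserving the network topology.
Demonstrate state-of-the-art defense performance across multiple attack scenarios and different LLM backbones.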

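PROPOSED METHOD

Model the MAS as a dynamic directed graph with time-series utterance embeddings.
Introduce infection-aware detection, using a turn-adaptive GNN branch to classify agents as benign, infected, or attack (dual-head output).
Add a topology-based loss (L_topo) that enforces realistic spatial constraints and reduces false positives.
Apply post-adaptation topology adjustment and reply-level remediation to replace attackers and rehabilitate infected agents (G^(k+1), RF, RP).
Evaluate on multiple attack types (PI, TA, MA) and different LLM backbones (e.g., Qwen3-235B-A22B, GPT-4o-mini).
Report ablation studies showing the impact of temporal features, the GNN branch, infection-aware detection, the topology loss, post-adaptation, and the remediation components.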
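To make the three-way labeling and the topology-preserving remediation more concrete, here is a minimal Python sketch. It is not the paper's implementation: the `MAS` container, the `Role` enum, the one-hop check in `topology_violations`, and the action names returned by `remediate` are all hypothetical stand-ins chosen for illustration. In particular, the check only approximates the idea behind a topological constraint (an agent labeled infected should have some non-benign agent sending to it) and says nothing about how L_topo is actually defined.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    BENIGN = "benign"
    INFECTED = "infected"
    ATTACK = "attack"


@dataclass
class MAS:
    """Directed communication graph: edges[u] is the set of agents u sends to."""
    edges: dict[str, set[str]]
    roles: dict[str, Role] = field(default_factory=dict)


def topology_violations(mas: MAS) -> list[str]:
    """Return INFECTED agents that receive no message from a non-benign agent.

    Hypothetical one-hop plausibility check, not the paper's L_topo.
    """
    suspicious = {a for a, r in mas.roles.items() if r is not Role.BENIGN}
    violations = []
    for agent, role in mas.roles.items():
        if role is not Role.INFECTED:
            continue
        has_source = any(
            agent in mas.edges.get(sender, set())
            for sender in suspicious
            if sender != agent
        )
        if not has_source:
            violations.append(agent)
    return sorted(violations)


def remediate(mas: MAS) -> dict[str, str]:
    """Pick one action per agent; no node or edge is added or removed."""
    actions = {}
    for agent, role in mas.roles.items():
        if role is Role.ATTACK:
            actions[agent] = "replace"       # new benign agent in the same slot
        elif role is Role.INFECTED:
            actions[agent] = "rehabilitate"  # repair replies, keep the agent
        else:
            actions[agent] = "keep"
    return actions


# Hypothetical four-agent system: a -> b -> c, and d -> a.
example = MAS(
    edges={"a": {"b"}, "b": {"c"}, "c": set(), "d": {"a"}},
    roles={
        "a": Role.ATTACK,
        "b": Role.INFECTED,
        "c": Role.BENIGN,
        "d": Role.INFECTED,
    },
)
print(topology_violations(example))  # d has no non-benign sender
print(remediate(example))
```

In this toy graph, agent d is labeled infected but receives nothing from any non-benign agent, so the check flags it as a likely false positive. The remediation step only assigns actions per node, which is why the edge set stays untouched.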

Figure 1: The paradigm comparison between existing MAS safeguards and our infection-aware safeguard.

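EXPERIMENTAL RESULTS

Research Questions

RQ1: Can infected agents be detected effectively as a category distinct from the initial attackers in a MAS?
RQ2: Compared with binary defenses, how does infection-aware detection improve the localization of attack sources and the infected range?
RQ3: How do topological constraints affect detection accuracy and remediation effectiveness?
RQ4: How does remediation (replacing attackers and rehabilitating infected agents) affect overall robustness and propagation risk across attack scenarios?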

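Key Findings

Across PI, TA, and MA tasks, INFA-Guard achieves a lower attack success rate (ASR) and a higher defense success rate (MDSR) than the baselines.
On PI tasks, INFA-Guard's ASR@3 is only 23.3% on CSQA and 6.7% on GSM8K, better than Inspector.
On TA tasks, INFA-Guard raises MDSR from 91.3% to 98.3% over three rounds, giving the best defense in later iterations.
On MA tasks, INFA-Guard's ASR@3 is 6.1%, outperforming G-safeguard and AgentSafe by margins of roughly 11% and 18%, respectively.
INFA-Guard is robust across LLM backbones (GPT-4o-mini and Qwen3-235B-A22B) and across chain, tree, and star topologies.
The method is also cost-effective: relative to a strong baseline, prompt tokens for the backbone LLM drop by 35% and completion tokens by 13%, alongside a 66% relative reduction in ASR@3.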
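The findings above mix raw ASR values with a relative reduction, so it can help to see how the two relate. The sketch below assumes ASR is simply the fraction of trials in which the attack succeeds; the function names and the numbers in the example are hypothetical and are not taken from the paper's tables.

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of trials in which the attack succeeded."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)


def relative_reduction(baseline: float, defended: float) -> float:
    """Drop of `defended` relative to `baseline`, as a fraction of baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - defended) / baseline


# Hypothetical numbers for illustration only.
baseline_asr = attack_success_rate([True, True, True] + [False] * 7)   # 3 of 10
defended_asr = attack_success_rate([True] + [False] * 9)               # 1 of 10

absolute_drop = baseline_asr - defended_asr
relative_drop = relative_reduction(baseline_asr, defended_asr)
print(f"absolute drop: {absolute_drop:.2f}, relative drop: {relative_drop:.2f}")
```

With these made-up values the absolute drop is 20 percentage points while the relative drop is about two thirds, which is why a relative figure can look much larger than the corresponding difference in raw ASR.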

Figure 2: Infected agents significantly increase security risks in MAS. The three legend entries represent no defense, defending attack agents, and defending attack and infected agents, respectively.


This digest was generated by AI and reviewed by human editors.