QUICK REVIEW

[Paper Review] INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems

Yijin Zhou, Xiaoya Lu|arXiv (Cornell University)|2026. 01. 21.

Adversarial Robustness in Machine Learning | Citations: 0

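One-Line Summary

INFA-Guard introduces infection-aware detection and topology-based remediation for LLM-based MAS, identifying attack agents and infected agents as separate classes and substantially reducing attack propagation. It replaces attackers and rehabilitates infected agents, which preserves the communication topology.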

ABSTRACT

The rapid advancement of Large Language Model (LLM)-based Multi-Agent Systems (MAS) has introduced significant security vulnerabilities, where malicious influence can propagate virally through inter-agent communication. Conventional safeguards often rely on a binary paradigm that strictly distinguishes between benign and attack agents, failing to account for infected agents i.e., benign entities converted by attack agents. In this paper, we propose Infection-Aware Guard, INFA-Guard, a novel defense framework that explicitly identifies and addresses infected agents as a distinct threat category. By leveraging infection-aware detection and topological constraints, INFA-Guard accurately localizes attack sources and infected ranges. During remediation, INFA-Guard replaces attackers and rehabilitates infected ones, avoiding malicious propagation while preserving topological integrity. Extensive experiments demonstrate that INFA-Guard achieves state-of-the-art performance, reducing the Attack Success Rate (ASR) by an average of 33%, while exhibiting cross-model robustness, superior topological generalization, and high cost-effectiveness.

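Motivation and Objectives

Motivate and define infected agents as a distinct threat category in MAS security.
Develop an infection-aware detection mechanism that models the dynamic infection process.
Use topological constraints to improve localization of attack sources and infected ranges.
Propose a remediation strategy that replaces attackers and rehabilitates infected agents while preserving the network topology.
Demonstrate state-of-the-art defense performance across multiple attack scenarios and LLM backbones.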

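Proposed Method

Model the MAS as a dynamic directed graph with time-series utterance embeddings.
Introduce infection-aware detection with a turn-adaptive GNN branch that classifies agents as benign, infected, or attack (dual-head output).
Include a topology-based loss (L_topo) that enforces realistic spatial constraints and reduces false positives.
Apply post-hoc adaptive topology adjustment and response-level remediation to replace attackers and rehabilitate infected agents (G^(k+1), RF, RP).
Evaluate across multiple attack types (PI, TA, MA) and LLM backbones (Qwen3-235B-A22B, GPT-4o-mini, and others).
Provide an ablation study showing the contribution of temporal features, the GNN branch, infection-aware detection, the topological loss, and the post-hoc adaptation and remediation components.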
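The detect-then-remediate flow described above can be illustrated with a small sketch. This is not the paper's implementation: the GNN detector is replaced by hand-written labels, the rule "an agent can only be infected if some attacker can reach it along directed edges" is an assumed stand-in for the topological constraint, and clearing an agent's stored history is an assumed stand-in for rehabilitation. All names (`Label`, `Agent`, `enforce_topology`, `remediate`) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Label(Enum):
    BENIGN = "benign"
    INFECTED = "infected"
    ATTACK = "attack"


@dataclass
class Agent:
    name: str
    memory: list = field(default_factory=list)


def reachable_from(sources, edges):
    """Return every node reachable from `sources` along directed edges."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, stack = set(), list(sources)
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def enforce_topology(labels, edges):
    """Assumed rule: an 'infected' label is kept only if an attacker can reach the node."""
    attackers = {n for n, lab in labels.items() if lab is Label.ATTACK}
    exposed = reachable_from(attackers, edges)
    return {
        n: Label.BENIGN if lab is Label.INFECTED and n not in exposed else lab
        for n, lab in labels.items()
    }


def remediate(agents, labels):
    """Replace attackers and reset infected agents; node ids and edges stay untouched."""
    repaired = {}
    for node, agent in agents.items():
        if labels[node] is Label.ATTACK:
            # A fresh agent takes over the attacker's position in the graph.
            repaired[node] = Agent(name=f"{agent.name}-replacement")
        elif labels[node] is Label.INFECTED:
            # Same agent identity, contaminated history cleared (assumed rehabilitation).
            repaired[node] = Agent(name=agent.name)
        else:
            repaired[node] = agent
    return repaired


# Toy graph: a -> b -> c and d -> c. Labels are hand-written, not model output.
edges = [("a", "b"), ("b", "c"), ("d", "c")]
agents = {n: Agent(name=n, memory=[f"{n} said something"]) for n in "abcd"}
predicted = {
    "a": Label.ATTACK,
    "b": Label.INFECTED,
    "c": Label.BENIGN,
    "d": Label.INFECTED,  # no attacker reaches d, so this looks like a false positive
}

labels = enforce_topology(predicted, edges)
repaired = remediate(agents, labels)
```

In this toy run, node `d` is downgraded to benign because no attacker can reach it, node `a` is swapped for a replacement agent, and node `b` keeps its identity but loses its history. The edge list is never modified, which is the sense in which the topology is preserved here.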

Figure 1: The paradigm comparison between existing MAS safeguards and our infection-aware safeguard.

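Experimental Results

Research Questions

RQ1: Can infected agents in a MAS be effectively detected as a class distinct from the initial attackers?
RQ2: How much does infection-aware detection improve localization of attack sources and infected ranges compared with binary defenses?
RQ3: How do topological constraints affect detection accuracy and remediation efficiency?
RQ4: How does remediation (replacing attackers and rehabilitating infected agents) affect overall MAS robustness and propagation risk across attack scenarios?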

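Key Results

On the PI, TA, and MA tasks, INFA-Guard achieves a lower ASR and a higher defense success rate (MDSR) than the baselines.
On PI tasks, INFA-Guard lowers ASR@3 to 23.3% on CSQA and reaches 6.7% on GSM8K, outperforming Inspector.
On TA tasks, INFA-Guard raises MDSR from 91.3% to 98.3% over three turns, achieving the best defense in later iterations.
On MA tasks, INFA-Guard achieves an ASR@3 of 6.1%, outperforming G-safeguard and AgentSafe by more than 11 and 18 percentage points, respectively.
INFA-Guard remains robust across LLM backbones (GPT-4o-mini and Qwen3-235B-A22B) and across chain, tree, and star topologies.
The approach is also token-efficient: relative to strong baselines, it uses 35% fewer backbone-LLM prompt tokens and 13% fewer completion tokens, while achieving a 66% relative reduction in ASR@3.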
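The results above report improvement in two different units: percentage-point gaps (the 11 and 18 point margins) and a relative reduction (the 66% figure). The sketch below shows how the two differ. It assumes ASR@3 means the fraction of evaluated agents whose response is compromised at turn 3, which this review does not spell out, and the outcome lists are made-up illustration data, not numbers from the paper. The function names are hypothetical.

```python
def attack_success_rate(compromised_flags):
    """Fraction of evaluated agents whose response is flagged as compromised."""
    if not compromised_flags:
        raise ValueError("need at least one flag")
    return sum(compromised_flags) / len(compromised_flags)


def absolute_reduction_pp(asr_baseline, asr_defended):
    """Drop in ASR measured in percentage points."""
    return (asr_baseline - asr_defended) * 100


def relative_reduction(asr_baseline, asr_defended):
    """Drop in ASR as a fraction of the baseline ASR."""
    if asr_baseline == 0:
        raise ValueError("baseline ASR is zero, relative reduction is undefined")
    return (asr_baseline - asr_defended) / asr_baseline


# Hypothetical turn-3 outcomes for 10 agents (True = compromised); not paper data.
undefended_turn3 = [True] * 5 + [False] * 5
defended_turn3 = [True] * 1 + [False] * 9

asr_base = attack_success_rate(undefended_turn3)
asr_def = attack_success_rate(defended_turn3)

print(f"{absolute_reduction_pp(asr_base, asr_def):.1f} percentage points")
print(f"{relative_reduction(asr_base, asr_def):.0%} relative reduction")
```

With these illustrative inputs, going from 5 of 10 to 1 of 10 compromised agents is a 40 percentage-point drop but an 80% relative reduction, which is why the two kinds of figures in the results should not be compared directly.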

Figure 2: Infected agents significantly increase security risks in MAS. The three legend markers represent, in order, no defense, defending attack agents, and defending attack and infected agents.

This review was generated by AI and reviewed by a human editor.