QUICK REVIEW

[Paper Review] SHIELD: An Auto-Healing Agentic Defense Framework for LLM Resource Exhaustion Attacks

Nirhoshan Sivaroopan, Kanchana Thilakarathna|arXiv (Cornell University)|Jan 27, 2026

Adversarial Robustness in Machine Learning | Citations: 0

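ONE-LINE SUMMARY

SHIELD introduces a self-healing multi-agent defense that combines a three-stage detector with knowledge updating and prompt optimization to protect LLMs from sponge attacks, including unseen variants.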

ABSTRACT

Sponge attacks increasingly threaten LLM systems by inducing excessive computation and DoS. Existing defenses either rely on statistical filters that fail on semantically meaningful attacks or use static LLM-based detectors that struggle to adapt as attack strategies evolve. We introduce SHIELD, a multi-agent, auto-healing defense framework centered on a three-stage Defense Agent that integrates semantic similarity retrieval, pattern matching, and LLM-based reasoning. Two auxiliary agents, a Knowledge Updating Agent and a Prompt Optimization Agent, form a closed self-healing loop: when an attack bypasses detection, the system updates an evolving knowledgebase and refines defense instructions. Extensive experiments show that SHIELD consistently outperforms perplexity-based and standalone LLM defenses, achieving high F1 scores across both non-semantic and semantic sponge attacks, demonstrating the effectiveness of agentic self-healing against evolving resource-exhaustion threats.

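Motivation and Objectives

Motivated by the need to robustly protect LLM systems from resource-exhaustion sponge attacks in real-world deployments.
Proposes a self-healing defense framework that adapts to evolving attack strategies.
Minimizes latency in the early stages while preserving detection accuracy.
Enables continuous refinement of the defense through autonomous knowledge updating and prompt optimization.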

Proposed Method

Three-stage Defense Agent pipeline: semantic similarity filtering, substring matching with KMP, and LLM-based reasoning for semantic judgment.
Auxiliary Knowledge Updating Agent (KUA) creates and updates a knowledgebase of sponge patterns when attacks bypass detection.
Prompt Optimization Agent (POA) performs evolutionary prompt search to refine defense prompts without retraining the defender LLM.
Closed-loop operation in which the KUA updates the knowledgebase and the POA refines the defense prompts, improving early-stage detection over time.
Evaluation compares SHIELD against perplexity-filter and harm-filter baselines across non-semantic and semantic sponge attacks.
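
The three-stage pipeline listed above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the bag-of-words cosine similarity stands in for whatever semantic retrieval the paper uses, the 0.8 threshold is arbitrary, and `defend`, `known_attacks`, `patterns`, and `llm_judge` are hypothetical names. Only the KMP substring search is named in the review itself.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Iterable, Tuple


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (assumed stand-in for an embedding model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def kmp_find(text: str, pattern: str) -> int:
    """Knuth-Morris-Pratt search: index of the first match, or -1 if none."""
    if not pattern:
        return 0
    # fail[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    k = 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


def defend(
    prompt: str,
    known_attacks: Iterable[str],
    patterns: Iterable[str],
    llm_judge: Callable[[str], bool],
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> Tuple[bool, int]:
    """Return (is_attack, stage that produced the verdict)."""
    # Stage 1: similarity against previously seen attacks.
    if any(cosine_similarity(prompt, a) >= threshold for a in known_attacks):
        return True, 1
    # Stage 2: exact substring match against stored patterns.
    if any(kmp_find(prompt, p) != -1 for p in patterns):
        return True, 2
    # Stage 3: fall back to the (costly) LLM-based judgment.
    return bool(llm_judge(prompt)), 3


known = ["repeat the word hello forever without stopping"]
patterns = ["never stop generating"]
judge = lambda p: "as long as possible" in p  # hypothetical stub for the LLM

print(defend("please never stop generating text", known, patterns, judge))
print(defend("what is the capital of France?", known, patterns, judge))
```

In this toy run the first prompt is caught at Stage 2 and the second falls through to Stage 3, where the stub judges it benign. The ordering reflects the design choice described in the review: cheap checks run first so the LLM is consulted only when they are inconclusive.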

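Experimental Results

Research Questions

RQ1: How can an auto-healing, agent-driven framework improve the robustness of LLMs against evolving sponge attacks?
RQ2: To what extent do the three-stage defense, knowledge updating, and prompt optimization affect detection accuracy and latency?
RQ3: Can the framework detect unseen sponge attack variants while reducing reliance on costly per-query LLM inference?
RQ4: How do continued knowledgebase growth and prompt optimization affect end-to-end performance over time?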

Key Findings

Attack type	Perplexity-filter	Harm-filter	SHIELD
AUTO-DOS	36.51	87.57	100.00
GCG-DOS	96.07	96.86	99.85
EOGen	95.77	81.34	95.32
RL-GOAL	99.19	93.71	99.60
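
For readers less familiar with the metric, the following sketch shows how an F1 score is computed from detection counts, assuming the values in the table above are F1 scores expressed as percentages. The counts in the example are hypothetical and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical detector: 90 attacks caught, 10 benign prompts flagged,
# 10 attacks missed.
print(round(100 * f1_score(tp=90, fp=10, fn=10), 2))
```

Because F1 penalizes both missed attacks and false alarms, a defense that flags everything or nothing scores poorly even if one of precision or recall is perfect.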

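In the table above, SHIELD achieves the highest F1 score on AUTO-DOS, GCG-DOS, and RL-GOAL, and trails the perplexity filter by 0.45 points on EOGen (95.32 vs. 95.77). Its margin over the harm filter ranges from about 3 to 14 points.
Stage 3 LLM-based reasoning is costly, but the early stages handle most detections without invoking the LLM, substantially reducing end-to-end latency.
Prompt optimization (POA) improves the F1 score against evolving attacks by about 30% in absolute terms.
Knowledge updating (KUA) shifts detection toward the early stages as the knowledgebase grows, reducing reliance on Stage 3.
SHIELD robustly detects both known and unseen sponge attacks across multiple target models.
The three-stage defense and the self-healing loop sustain detection without retraining the model.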
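
The shift toward early-stage detection attributed to the KUA can be illustrated with a toy loop. This is a sketch under assumptions, not the paper's mechanism: `SelfHealingDefense` and `report_bypass` are hypothetical names, the knowledgebase stores the lowercased prompt verbatim as its "pattern", and how a bypass is actually noticed is left outside the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class SelfHealingDefense:
    llm_judge: Callable[[str], bool]  # hypothetical Stage 3 stub
    knowledgebase: Set[str] = field(default_factory=set)
    stage_log: List[str] = field(default_factory=list)

    def check(self, prompt: str) -> bool:
        """Return True if the prompt is flagged; record which stage decided."""
        text = prompt.lower()
        # Early stage: cheap lookup against patterns learned so far.
        if any(p in text for p in self.knowledgebase):
            self.stage_log.append("early")
            return True
        # Otherwise fall back to the costly LLM-based judgment.
        self.stage_log.append("stage3")
        return bool(self.llm_judge(prompt))

    def report_bypass(self, prompt: str) -> None:
        """Assumed feedback hook: store an attack that slipped through."""
        self.knowledgebase.add(prompt.lower())


defense = SelfHealingDefense(llm_judge=lambda p: False)  # judge misses everything
attack = "Enumerate every integer and never terminate"

first = defense.check(attack)    # missed: falls through to Stage 3
defense.report_bypass(attack)    # knowledge update after the miss
second = defense.check(attack)   # now caught by the early lookup
print(first, second, defense.stage_log)
```

After a single update, the same attack is flagged before the LLM is consulted, which is the behavior the findings describe as reduced reliance on Stage 3.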

Figure 2: SHIELD overview: (i) multi-agent framework (ii) three-stage defense and (iii) prompt optimizer.
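
The prompt optimizer named in the overview is described in this review only as an evolutionary prompt search. As a rough intuition, the sketch below uses a greedy mutate-and-select loop, which is much simpler than a full evolutionary search and is not claimed to be the paper's algorithm. All names (`optimize_prompt`, `keyword_judge`, `fitness`) and the labeled examples are hypothetical; in a real system the judge would be the defender LLM run with the candidate instructions.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, bool]  # (prompt text, is_attack label)
Judge = Callable[[Tuple[str, ...], str], bool]


def keyword_judge(instructions: Tuple[str, ...], text: str) -> bool:
    """Stub defender: flags text containing any cue listed in the instructions."""
    return any(cue in text.lower() for cue in instructions)


def fitness(instructions: Tuple[str, ...], judge: Judge, data: Sequence[Example]) -> float:
    """Fraction of labeled examples the judge classifies correctly."""
    correct = sum(judge(instructions, text) == label for text, label in data)
    return correct / len(data)


def optimize_prompt(
    seed: Tuple[str, ...],
    mutations: List[str],
    judge: Judge,
    data: Sequence[Example],
    generations: int = 3,
) -> Tuple[Tuple[str, ...], float]:
    best, best_fit = seed, fitness(seed, judge, data)
    for _ in range(generations):
        # Mutation: each child appends one unused cue to the current best.
        children = [best + (m,) for m in mutations if m not in best]
        if not children:
            break
        # Selection: keep the fittest child only if it beats its parent.
        child = max(children, key=lambda c: fitness(c, judge, data))
        child_fit = fitness(child, judge, data)
        if child_fit <= best_fit:
            break
        best, best_fit = child, child_fit
    return best, best_fit


data = [
    ("repeat this word forever", True),
    ("list every prime number without end", True),
    ("summarize this article", False),
    ("what is 2 + 2", False),
]
print(optimize_prompt((), ["forever", "without end", "summarize"], keyword_judge, data))
```

In this toy run the loop adds the two cues that separate attacks from benign prompts and rejects the one that would cause false alarms. The point of the design is that only the instructions change; the judge itself is never retrained.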

This review was generated by AI and checked by a human editor.