QUICK REVIEW

[Paper Review] To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack

Terry Yue Zhuo, Yangruibo Ding|arXiv (Cornell University)|Feb 1, 2026

Adversarial Robustness in Machine Learning | Cited by 0

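One-Sentence Summary

This paper argues that offensive AI security capabilities are inevitable and must be developed and governed to strengthen defense; it proposes benchmarks, trained agents, and controlled deployment in audited cyber ranges.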

ABSTRACT

For over a decade, cybersecurity has relied on human labor scarcity to limit attackers to high-value targets manually or generic automated attacks at scale. Building sophisticated exploits requires deep expertise and manual effort, leading defenders to assume adversaries cannot afford tailored attacks at scale. AI agents break this balance by automating vulnerability discovery and exploitation across thousands of targets, needing only small success rates to remain profitable. Current developers focus on preventing misuse through data filtering, safety alignment, and output guardrails. Such protections fail against adversaries who control open-weight models, bypass safety controls, or develop offensive capabilities independently. We argue that AI-agent-driven cyber attacks are inevitable, requiring a fundamental shift in defensive strategy. In this position paper, we identify why existing defenses cannot stop adaptive adversaries and demonstrate that defenders must develop offensive security intelligence. We propose three actions for building frontier offensive AI capabilities responsibly. First, construct comprehensive benchmarks covering the full attack lifecycle. Second, advance from workflow-based to trained agents for discovering in-wild vulnerabilities at scale. Third, implement governance restricting offensive agents to audited cyber ranges, staging release by capability tier, and distilling findings into safe defensive-only agents. We strongly recommend treating offensive AI capabilities as essential defensive infrastructure, as containing cybersecurity risks requires mastering them in controlled settings before adversaries do.

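Motivation and Objectives

Drive a shift in cyber defense toward offensive security intelligence powered by autonomous AI agents.
Highlight how AI agents can automate vulnerability discovery and exploitation across many targets at scale.
Propose a framework for benchmarking, developing, and safely deploying offensive AI capabilities as critical defensive infrastructure.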

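Proposed Method

Formalize a threat model in which financially motivated adversaries use state-of-the-art (SOTA) AI agents to automate attacks at scale.
Analyze the limitations of existing defensive safeguards such as data governance, safety alignment, representation engineering, and guardrails.
Propose a three-pronged framework for frontier offensive security: comprehensive benchmarks covering the full attack lifecycle, a progression from workflow-based to trained agents, and governance through audited cyber ranges.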

Figure 1: Matching AI Attack Scale Requires Autonomous Offensive Security Capabilities. Left: AI agents enable economically viable attacks through parallelization. Right: Both AI agents and humans can perform offensive or defensive operations, but only offensive AI agents can match the predictabi

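Experimental Results

Research Questions

RQ1: What risks do autonomous AI agents pose to cybersecurity at scale?
RQ2: How can offensive AI capabilities be developed and governed so that they benefit defense rather than enable misuse?
RQ3: What benchmarks and development stages are needed to achieve safe, defense-oriented offensive AI in cybersecurity?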

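Key Findings

Offensive AI capabilities can lower the marginal cost of attacks, making scalable exploitation of long-tail targets feasible.
Current defensive safeguards are fragile against adaptive, agentic attackers and can be bypassed with open-weight or self-hosted models.
Frontier offensive security benchmarks are needed that cover the full attack lifecycle and dynamic environments.
A staged-release governance model can confine offensive capabilities to audited cyber ranges and distill findings into defense-only artifacts.
Offensive security intelligence can accelerate defense by surfacing vulnerabilities and enabling rapid remediation.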
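
The economic argument behind the marginal-cost finding (that an attacker needs only a small success rate to stay profitable once per-target cost falls) can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the independent-targets model, the function names, and every number are hypothetical assumptions chosen only to show the shape of the reasoning.

```python
# Illustrative model only: all names and figures are hypothetical assumptions,
# not values reported in the paper.

def break_even_success_rate(cost_per_target: float, payoff_per_success: float) -> float:
    """Smallest per-target success probability at which a campaign breaks even."""
    if payoff_per_success <= 0:
        raise ValueError("payoff_per_success must be positive")
    return cost_per_target / payoff_per_success


def expected_profit(targets: int, success_rate: float,
                    cost_per_target: float, payoff_per_success: float) -> float:
    """Expected profit assuming each target is attacked independently."""
    return targets * (success_rate * payoff_per_success - cost_per_target)


# Hypothetical comparison: a manual campaign versus an automated one.
manual_threshold = break_even_success_rate(cost_per_target=500.0, payoff_per_success=1000.0)
automated_threshold = break_even_success_rate(cost_per_target=0.5, payoff_per_success=1000.0)

print(f"manual break-even success rate:    {manual_threshold:.4f}")
print(f"automated break-even success rate: {automated_threshold:.4f}")
```

Under these assumed numbers, cutting the per-target cost by a factor of 1000 lowers the break-even success rate by the same factor, which is why low-value, long-tail targets become worth attacking. Defenders can use the same calculation in reverse to estimate how much a control must raise attacker cost before a target stops being profitable.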


This review was generated by AI and reviewed by human editors.