QUICK REVIEW

[Paper Review] Formal Analysis and Supply Chain Security for Agentic AI Skills

Varun Pratap Bhardwaj|arXiv (Cornell University)|Feb 27, 2026

Adversarial Robustness in Machine Learning | Cited by 0

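One-Sentence Summary

SkillFortify is the first formal analysis framework for agent skill supply chains. It introduces the DY-Skill attacker model, a sound static analysis, capability-based sandboxing, SAT-based dependency resolution, and a trust score algebra, and reports strong empirical results on a 540-skill benchmark.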

ABSTRACT

The rapid proliferation of agentic AI skill ecosystems -- exemplified by OpenClaw (228,000 GitHub stars) and Anthropic Agent Skills (75,600 stars) -- has introduced a critical supply chain attack surface. The ClawHavoc campaign (January-February 2026) infiltrated over 1,200 malicious skills into the OpenClaw marketplace, while MalTool catalogued 6,487 malicious tools that evade conventional detection. In response, twelve reactive security tools emerged, yet all rely on heuristic methods that provide no formal guarantees. We present SkillFortify, the first formal analysis framework for agent skill supply chains, with six contributions: (1) the DY-Skill attacker model, a Dolev-Yao adaptation to the five-phase skill lifecycle with a maximality proof; (2) a sound static analysis framework grounded in abstract interpretation; (3) capability-based sandboxing with a confinement proof; (4) an Agent Dependency Graph with SAT-based resolution and lockfile semantics; (5) a trust score algebra with formal monotonicity; and (6) SkillFortifyBench, a 540-skill benchmark. SkillFortify achieves 96.95% F1 (95% CI: [95.1%, 98.4%]) with 100% precision and 0% false positive rate on 540 skills, while SAT-based resolution handles 1,000-node graphs in under 100 ms.

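Motivation and Goals

Motivation: attacks on agent skill marketplaces are rising and malicious skills evade detection, so the agent skill supply chain needs formal security guarantees rather than heuristics alone.
Goal: introduce SkillFortify, a formal framework that provides sound analysis of, and proofs about, skill security.
Develop each component and prove its properties: the attacker model, static analysis, sandboxing, a dependency graph with SAT-based resolution, trust scoring, and a benchmark for evaluation.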

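Proposed Method

Define the DY-Skill attacker model, a Dolev-Yao adaptation to the five-phase skill lifecycle, together with a maximality proof.
Develop a sound static analysis framework based on abstract interpretation over a four-element capability lattice.
Formalize capability-based sandboxing, with a proof of confinement.
Build an Agent Dependency Graph and encode dependency resolution as a SAT problem with lockfile semantics.
Introduce a trust score algebra with formally defined propagation and monotonicity.
Create SkillFortifyBench, a 540-skill benchmark for evaluating detection and resolution performance.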
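
The summary does not name the four lattice elements, so the sketch below assumes a hypothetical chain NONE < READ < WRITE < ADMIN. It illustrates two ideas from the method items. First, a sound analysis joins the capabilities used on every path of a skill, which over-approximates and never under-reports. Second, a confinement check accepts a skill only if every inferred capability lies at or below what the sandbox grants. The names `Cap`, `analyze`, and `confined` are illustrative and are not SkillFortify's API.

```python
from enum import IntEnum


class Cap(IntEnum):
    """Hypothetical four-element capability chain: NONE < READ < WRITE < ADMIN."""
    NONE = 0
    READ = 1
    WRITE = 2
    ADMIN = 3


def join(a: Cap, b: Cap) -> Cap:
    # In a chain, the least upper bound is simply the larger element.
    return Cap(max(a, b))


def analyze(operations):
    """Over-approximate the capability each resource needs.

    operations: (resource, Cap) pairs gathered from every syntactic path of a
    skill, including branches that might never execute. Joining across all
    paths keeps the result sound: the analysis never under-reports access.
    """
    required = {}
    for resource, cap in operations:
        required[resource] = join(required.get(resource, Cap.NONE), cap)
    return required


def confined(required, granted) -> bool:
    """Sandbox check: each inferred capability must be <= the granted one."""
    return all(cap <= granted.get(res, Cap.NONE) for res, cap in required.items())
```

For example, a skill that reads `fs` on one path and writes `fs` and `network` on another needs WRITE on both resources, so a sandbox that grants only READ on `network` rejects it.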
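
The following minimal sketch shows one way to encode resolution as SAT with lockfile semantics. It is not SkillFortify's encoding. Each (package, version) pair becomes a Boolean variable. At-most-one clauses forbid installing two versions of the same package, each dependency edge becomes an implication clause, and lockfile pins become unit clauses. A tiny DPLL solver stands in for a production SAT solver. The functions `resolve` and `dpll` and the catalog layout are hypothetical.

```python
from itertools import combinations


def dpll(clauses, assignment=None):
    """Minimal DPLL solver; clauses are lists of non-zero ints (DIMACS style)."""
    assignment = dict(assignment or {})
    while True:  # unit propagation until no unit clause remains
        simplified, unit = [], None
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue  # clause already satisfied
            rest = [lit for lit in clause if abs(lit) not in assignment]
            if not rest:
                return None  # every literal is false: conflict
            if len(rest) == 1 and unit is None:
                unit = rest[0]
            simplified.append(rest)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
        clauses = simplified
    if not simplified:
        return assignment  # all clauses satisfied
    var = abs(simplified[0][0])
    for value in (False, True):  # try "not installed" first
        model = dpll(simplified, {**assignment, var: value})
        if model is not None:
            return model
    return None


def resolve(catalog, root_deps, lock=None):
    """Pick one version per needed package, or return None if unsatisfiable.

    catalog:   {(pkg, ver): {dep_pkg: set of acceptable versions}}
    root_deps: {pkg: set of acceptable versions}
    lock:      optional {pkg: ver} pins, modeled as unit clauses
    """
    var = {key: i + 1 for i, key in enumerate(sorted(catalog))}
    versions = {}
    for pkg, ver in catalog:
        versions.setdefault(pkg, []).append(ver)

    def any_of(pkg, allowed):
        return [var[(pkg, v)] for v in versions.get(pkg, []) if v in allowed]

    clauses = []
    for pkg, vers in versions.items():  # at most one version per package
        clauses += [[-var[(pkg, a)], -var[(pkg, b)]] for a, b in combinations(vers, 2)]
    for pkg, allowed in root_deps.items():  # root needs an acceptable version
        clauses.append(any_of(pkg, allowed))
    for (pkg, ver), deps in catalog.items():  # selecting pkg@ver implies its deps
        for dep, allowed in deps.items():
            clauses.append([-var[(pkg, ver)]] + any_of(dep, allowed))
    for pkg, ver in (lock or {}).items():  # lockfile pins
        clauses.append([var[(pkg, ver)]])

    model = dpll(clauses)
    if model is None:
        return None
    return {pkg: ver for (pkg, ver), i in var.items() if model.get(i)}
```

When the solver branches, it tries `False` first, which biases it toward leaving optional packages uninstalled. A pin in `lock` forces that exact version, and a pin that conflicts with a constraint makes `resolve` return `None` instead of silently picking another version. A real resolver would pass the CNF to an industrial solver rather than to this recursive DPLL.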
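
The summary says that trust scores propagate through dependencies and are monotone, but it does not give the operators. Here is one hypothetical algebra with both properties. A per-skill base score is a convex combination of a provenance signal and a maintenance signal, both in [0, 1]. Propagation takes the weakest link, so a skill is trusted no more than its least-trusted transitive dependency. Weighted sums with non-negative weights and `min` are both monotone, so raising any input signal can never lower any effective score. The 0.6 weight and the function names are assumptions, and the sketch assumes an acyclic dependency graph.

```python
def base_score(provenance: float, maintenance: float, w_prov: float = 0.6) -> float:
    """Hypothetical per-skill score: convex combination of two signals in [0, 1].

    Non-negative weights make the score monotone in each signal.
    """
    return w_prov * provenance + (1 - w_prov) * maintenance


def effective_trust(signals, deps):
    """Weakest-link propagation over a DAG.

    signals: {skill: (provenance, maintenance)}
    deps:    {skill: [dependency, ...]}
    A skill's effective score is the min of its own base score and the
    effective scores of its dependencies (memoized, so shared deps are
    evaluated once).
    """
    memo = {}

    def visit(skill):
        if skill not in memo:
            own = base_score(*signals[skill])
            memo[skill] = min([own] + [visit(d) for d in deps.get(skill, [])])
        return memo[skill]

    return {skill: visit(skill) for skill in signals}
```

In a chain `app -> parser -> net`, a weak `net` caps the scores of both `parser` and `app`, and improving `net`'s provenance raises them again without lowering anything.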

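Experimental Results

Research Questions

RQ1: How can formal guarantees for agent skill security be provided in a supply chain setting?
RQ2: Can a formal framework prove that a skill cannot access unauthorized resources?
RQ3: What are the performance characteristics of SAT-based dependency resolution on large skill graphs?
RQ4: How can trust scores be formally propagated across skill dependencies while preserving provenance and maintenance information?
RQ5: Can a benchmark of real-world malicious and benign skills validate the framework's effectiveness?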

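Key Findings

SkillFortify achieves a 96.95% F1 score on SkillFortifyBench (95% confidence interval: [95.1%, 98.4%]).
SkillFortify achieves 100% precision and a 0% false positive rate on the 540 skills.
SAT-based resolution handles 1,000-node graphs in under 100 ms.
The 540-skill SkillFortifyBench contains 270 malicious and 270 benign skills, drawn from real-world campaigns and curated sources.
The framework provides formal guarantees, including sound static analysis, confinement, and correct lockfile-based resolution.
The empirical evaluation indicates that pattern matching and information-flow analysis are complementary to purely heuristic defenses.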
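
A short calculation shows that the reported detection figures are mutually consistent. With precision P = 1, F1 = 2PR/(P+R) reduces to 2R/(1+R), so R = F1/(2 - F1) ≈ 0.9408. Assume the 270 malicious skills are the positive class and F1 is computed over the whole benchmark. Under those assumptions, that recall corresponds to about 254 detected malicious skills and 16 misses, with no benign skill flagged. These counts are back-calculated here and are not reported in the summary. The helper names are illustrative.

```python
def metrics(tp: int, fp: int, fn: int, tn: int):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)  # false positive rate
    return precision, recall, f1, fpr


def implied_recall(f1: float, precision: float = 1.0) -> float:
    """Invert F1 = 2PR / (P + R) for R, given P and F1."""
    return f1 * precision / (2 * precision - f1)


# Hypothetical back-calculated counts: 254 TP, 16 FN, 0 FP, 270 TN.
p, r, f1, fpr = metrics(tp=254, fp=0, fn=16, tn=270)
```

The 95% confidence interval cannot be reproduced this way, because it depends on the resampling procedure the paper used.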

This review was generated by AI and reviewed by human editors.