QUICK REVIEW

[Paper Review] Formal Analysis and Supply Chain Security for Agentic AI Skills

Varun Pratap Bhardwaj | arXiv (Cornell University) | February 27, 2026

Adversarial Robustness in Machine Learning | Citations: 0

One-Line Summary

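SkillFortify is the first formal analysis framework for agent skill supply chains, introducing the DY-Skill attacker model, sound static analysis, capability-based sandboxing, SAT-based dependency resolution, a trust score algebra, and a 540-skill benchmark on which it reaches 96.95% F1 at 100% precision.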

ABSTRACT

The rapid proliferation of agentic AI skill ecosystems -- exemplified by OpenClaw (228,000 GitHub stars) and Anthropic Agent Skills (75,600 stars) -- has introduced a critical supply chain attack surface. The ClawHavoc campaign (January-February 2026) infiltrated over 1,200 malicious skills into the OpenClaw marketplace, while MalTool catalogued 6,487 malicious tools that evade conventional detection. In response, twelve reactive security tools emerged, yet all rely on heuristic methods that provide no formal guarantees. We present SkillFortify, the first formal analysis framework for agent skill supply chains, with six contributions: (1) the DY-Skill attacker model, a Dolev-Yao adaptation to the five-phase skill lifecycle with a maximality proof; (2) a sound static analysis framework grounded in abstract interpretation; (3) capability-based sandboxing with a confinement proof; (4) an Agent Dependency Graph with SAT-based resolution and lockfile semantics; (5) a trust score algebra with formal monotonicity; and (6) SkillFortifyBench, a 540-skill benchmark. SkillFortify achieves 96.95% F1 (95% CI: [95.1%, 98.4%]) with 100% precision and 0% false positive rate on 540 skills, while SAT-based resolution handles 1,000-node graphs in under 100 ms.

Research Motivation and Goals

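Motivates the need for formal guarantees in agent skill supply chains, citing the rise in attacks and malicious skills that evade detection.
Introduces SkillFortify, a formal framework that provides sound analysis and proofs of agent skill safety.
Develops the framework's components (attacker model, static analysis, sandboxing, a dependency graph with SAT-based resolution, trust scoring, and a benchmark) and proves their formal properties.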

Proposed Method

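Defines the DY-Skill attacker model, a Dolev-Yao adaptation to the five-phase skill lifecycle, and presents a maximality proof.
Develops a sound static analysis framework based on abstract interpretation over a four-element capability lattice.
Formalizes capability-based sandboxing with a confinement proof.
Builds an Agent Dependency Graph and encodes resolution as a SAT problem with lockfile semantics.
Introduces a trust score algebra with formal propagation and monotonicity.
Creates SkillFortifyBench, a 540-skill benchmark for evaluating detection and resolution performance.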
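
Encoding dependency resolution as a SAT problem with lockfile semantics can be illustrated with a minimal sketch. This is not the paper's encoding: the skill names, the `INDEX` layout, and the `build_clauses` and `solve` helpers are hypothetical, and an exhaustive search stands in for a real SAT solver so the example has no dependencies. Each (skill, version) pair is a Boolean variable. Clauses enforce that at most one version of a skill is installed, that every installed version gets one allowed version of each dependency, and that each root skill is installed. A satisfying assignment is read back as a lockfile that pins one version per skill.

```python
from itertools import product

# Hypothetical skill index: name -> {version: {dependency: [allowed versions]}}
INDEX = {
    "web-search": {"1.0": {"http-client": ["1.0", "2.0"]},
                   "2.0": {"http-client": ["2.0"]}},
    "summarizer": {"1.0": {"http-client": ["1.0"]}},
    "http-client": {"1.0": {}, "2.0": {}},
}

def build_clauses(index, roots):
    """Encode resolution as CNF; a literal is ((name, version), polarity)."""
    clauses = []
    for name, versions in index.items():
        vs = list(versions)
        # At most one version of each skill may be installed.
        for i in range(len(vs)):
            for j in range(i + 1, len(vs)):
                clauses.append([((name, vs[i]), False), ((name, vs[j]), False)])
        # An installed version implies one allowed version of each dependency.
        for v, deps in versions.items():
            for dep, allowed in deps.items():
                clauses.append([((name, v), False)]
                               + [((dep, a), True) for a in allowed])
    # Every root skill must be installed in some version.
    for root in roots:
        clauses.append([((root, v), True) for v in index[root]])
    return clauses

def solve(index, roots):
    """Exhaustive search (a stand-in for a SAT solver).

    Returns the smallest satisfying lockfile as {name: version}, or None.
    """
    variables = [(name, v) for name, versions in index.items() for v in versions]
    clauses = build_clauses(index, roots)
    best = None
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        satisfied = all(
            any(assignment[var] == polarity for var, polarity in clause)
            for clause in clauses
        )
        if satisfied:
            lock = {name: v for (name, v), on in assignment.items() if on}
            if best is None or len(lock) < len(best):
                best = lock
    return best

lockfile = solve(INDEX, ["web-search", "summarizer"])
print(lockfile)
```

In this toy index `web-search` 2.0 is ruled out because it requires `http-client` 2.0 while `summarizer` pins `http-client` 1.0, so the only lockfile pins all three skills at 1.0. Exhaustive search is exponential in the number of variables and only suits toy inputs; graphs of the size reported in the paper would call for a real SAT solver.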

Experimental Results

Research Questions

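RQ1: How can formal guarantees of agent skill safety be provided in a supply chain context?
RQ2: Can a formal framework prove the absence of illegitimate resource access by a skill?
RQ3: What are the performance characteristics of SAT-based dependency resolution on large skill graphs?
RQ4: Can trust scores propagate formally through skill dependencies while reflecting provenance and maintenance?
RQ5: Does a benchmark of real-world malicious and benign skills validate the framework's effectiveness?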
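
To make the trust-propagation question concrete, the sketch below shows one possible propagation rule and checks that it is monotone. The paper's actual trust score algebra is not reproduced in this review, so everything here is an assumption: scores lie in [0, 1], a skill's effective trust is its own base score multiplied by the lowest effective trust among its direct dependencies, and the skill names and the `effective_trust` helper are hypothetical.

```python
def effective_trust(skill, base, deps):
    """Own base score scaled by the weakest dependency.

    Assumed rule for illustration only; requires an acyclic dependency graph.
    """
    score = base[skill]
    children = deps.get(skill, [])
    if children:
        score *= min(effective_trust(child, base, deps) for child in children)
    return score

# Hypothetical base scores (e.g. from provenance and maintenance signals).
base = {"report-writer": 1.0, "pdf-parser": 0.75, "http-client": 0.5}
deps = {
    "report-writer": ["pdf-parser", "http-client"],
    "pdf-parser": ["http-client"],
}

before = {s: effective_trust(s, base, deps) for s in base}

# Monotonicity check: raising one base score must not lower any effective score.
raised = {**base, "http-client": 0.75}
after = {s: effective_trust(s, raised, deps) for s in base}
assert all(after[s] >= before[s] for s in base)

print(before)
print(after)
```

Under this assumed rule monotonicity holds because both multiplication by a non-negative number and `min` preserve order, so improving any skill's base score can only raise or keep the scores of the skills that depend on it.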

Key Results

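SkillFortify achieves a 96.95% F1 score (95% CI: [95.1%, 98.4%]) on SkillFortifyBench.
Across the 540 skills, SkillFortify reaches 100% precision and a 0% false positive rate.
SAT-based resolution handles 1,000-node graphs in under 100 ms.
The 540-skill SkillFortifyBench consists of 270 malicious and 270 benign skills collected from real-world campaigns and curated sources.
The framework provides formal guarantees, including sound static analysis, confinement, and correct lockfile-based resolution.
The empirical evaluation shows that pattern matching and information-flow analysis complement each other, in contrast to purely heuristic defenses.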
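
The reported detection numbers can be cross-checked with the standard metric definitions. The sketch below assumes the 270/270 split and zero false positives stated above. The true-positive count is not given in this review: 254 is an inferred value, chosen because it is the integer that reproduces the reported F1 under those assumptions. The `detection_metrics` helper is hypothetical.

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard precision, recall, F1, and false positive rate."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return precision, recall, f1, fpr

# From the review: 270 malicious and 270 benign skills, no false positives.
# Assumption (not stated in the source): 254 malicious skills flagged, 16 missed.
precision, recall, f1, fpr = detection_metrics(tp=254, fp=0, fn=16, tn=270)
print(f"precision={precision:.2%} recall={recall:.2%} f1={f1:.2%} fpr={fpr:.2%}")
# prints: precision=100.00% recall=94.07% f1=96.95% fpr=0.00%
```

Read this way, 100% precision with a 96.95% F1 means the shortfall comes entirely from missed malicious skills (recall of roughly 94%), not from benign skills being flagged.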

This review was generated by AI and reviewed by a human editor.