QUICK REVIEW

[Paper Digest] Towards Leveraging LLMs to Generate Abstract Penetration Test Cases from Software Architecture

Jafari, Mahdi; Sharma, Rahul | arXiv (Cornell University) | Mar 24, 2026

Information and Cyber Security | Cited by: 0

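One-Sentence Summary

Summary: This paper defines an Abstract Penetration Test Case (APTC) metamodel and investigates LLM-based generation of architecture-oriented APTCs from PCM models, evaluating several prompting strategies across multiple case studies. The results show up to 93% usefulness and 86% correctness, indicating practical value for architects and testers.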

ABSTRACT

Software architecture models capture early design decisions that strongly influence system quality attributes, including security. However, architecture-level security assessment and feedback are often absent in practice, allowing security weaknesses to propagate into later phases of the software development lifecycle and, in some cases, to remain undiscovered, ultimately leading to vulnerable systems. In this paper, we bridge this gap by proposing the generation of Abstract Penetration Test Cases (APTCs) from software architecture models as an input to support architecture-level security assessment. We first introduce a metamodel that defines the APTC concept, and then investigate the use of large language models with different prompting strategies to generate meaningful APTCs from architecture models. To design the APTC metamodel, we analyze relevant standards and state of the art using two criteria: (i) derivability from software architecture, and (ii) usability for both architecture security assessment and subsequent penetration testing. Building on this metamodel, we then proceed to generate APTCs from software architecture models. Our evaluation shows promising results, achieving up to 93% usefulness and 86% correctness, indicating that the generated APTCs can substantially support both architects (by highlighting security-critical design decisions) and penetration testers (by providing actionable testing guidance).

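Research Motivation and Goals

Motivate architecture-level security assessment by shifting penetration testing to earlier phases of the software development lifecycle.
Define a structured metamodel for the Abstract Penetration Test Case (APTC) that is grounded in architecture artifacts.
Evaluate how well LLMs perform under different prompting strategies when generating APTCs from architecture models.
Assess how the generated APTCs help architects and penetration testers, and identify the architectural annotations required and the limitations.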

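Proposed Method

Propose an APTC metamodel that describes the targeted threat, the weakness, the attack vector, and the affected architecture elements.
Serialize the PCM architecture into a security-oriented textual representation, and enforce schema-conformant APTC generation through constrained prompts.
Generate APTCs on two LLMs (GPT and Gemini) using prompt engineering: zero-shot, one-shot, and few-shot, each with and without chain-of-thought reasoning.
Evaluate the generated APTCs against CAWE weaknesses through expert assessment and LLM-assisted expert assessment.
Validate the outputs against a predefined JSON schema to ensure architectural traceability and interoperability.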
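
The schema-validation step can be illustrated with a minimal Python sketch. It is not the authors' schema or tooling: the field names (`threat`, `weakness`, `attack_vector`, `affected_elements`) are assumptions derived from the four APTC parts listed above, and the function `validate_aptc` is hypothetical. It only checks that one LLM output is a JSON object with exactly those fields and the expected basic types.

```python
import json

# Hypothetical field names, loosely following the four APTC parts named above.
# The paper's actual JSON schema may differ.
REQUIRED_FIELDS = {
    "threat": str,
    "weakness": str,
    "attack_vector": str,
    "affected_elements": list,
}


def validate_aptc(raw: str) -> list[str]:
    """Return the problems found in one LLM-generated APTC (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    # Every required field must be present with the expected basic type.
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    # Reject fields outside the schema so outputs stay interoperable.
    for name in data:
        if name not in REQUIRED_FIELDS:
            problems.append(f"unexpected field: {name}")
    return problems
```

An output that passes returns an empty list; anything else can be rejected or sent back to the model with the listed problems. A production pipeline would more likely use a full JSON Schema validator, which this sketch deliberately avoids to stay dependency-free.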

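Experimental Results

Research Questions

RQ1: How should an Abstract Penetration Test Case (APTC) be defined to support architecture-level security assessment?
RQ2: To what extent can LLMs analyze and understand the security implications of a software architecture?
RQ3: To what extent can LLMs generate meaningful APTCs from software architecture models?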

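Key Findings

LLMs can generate architecture-oriented APTCs that are highly relevant to CAWE weaknesses and reasonably meaningful.
Prompting strategy and model choice significantly affect correctness and usefulness; under some prompts, Gemini tends to outperform GPT on usefulness.
Across the overall evaluation of three scenarios, the approach reaches up to 93.3% usefulness and 86.7% correctness.
Some outputs misidentify weaknesses or reference nonexistent architecture elements, reflecting limitations in semantic grounding.
A structured APTC metamodel enables traceable, schema-conformant generation that is easy to integrate into security workflows.
The evaluation discusses threats to validity and proposes extensions covering more CAWEs and richer threat models.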
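
One of the failure modes noted above, an APTC that references architecture elements absent from the model, lends itself to a simple automated check. The sketch below is an illustration and not part of the paper's evaluation: the function `find_unknown_elements`, the `affected_elements` key, and all example names are hypothetical.

```python
def find_unknown_elements(aptc: dict, model_elements: set[str]) -> list[str]:
    """Return element names the APTC cites that the architecture model lacks."""
    cited = aptc.get("affected_elements", [])
    return [name for name in cited if name not in model_elements]


# Hypothetical example data, not taken from the paper.
model_elements = {"WebFrontend", "AuthService", "OrderDatabase"}
aptc = {
    "weakness": "Missing authentication for a critical function",
    "affected_elements": ["AuthService", "PaymentGateway"],
}

unknown = find_unknown_elements(aptc, model_elements)
print(unknown)  # ['PaymentGateway']
```

A check like this catches only invented element names. It cannot tell whether a weakness was identified correctly, which still requires expert judgment.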

This digest was generated by AI and reviewed by a human editor.