QUICK REVIEW

[Paper Review] Towards Leveraging LLMs to Generate Abstract Penetration Test Cases from Software Architecture

Jafari, Mahdi, Sharma, Rahul|arXiv (Cornell University)|Mar 24, 2026

Information and Cyber Security | Citations: 0

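One-Line Summary

This paper defines a metamodel for Abstract Penetration Test Cases (APTCs), investigates LLM-based generation of architecture-grounded APTCs from PCM models, and evaluates prompting strategies across multiple case studies. The results reach up to 93% usefulness and 86% correctness, suggesting practical support for architects and testers.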

ABSTRACT

Software architecture models capture early design decisions that strongly influence system quality attributes, including security. However, architecture-level security assessment and feedback are often absent in practice, allowing security weaknesses to propagate into later phases of the software development lifecycle and, in some cases, to remain undiscovered, ultimately leading to vulnerable systems. In this paper, we bridge this gap by proposing the generation of Abstract Penetration Test Cases (APTCs) from software architecture models as an input to support architecture-level security assessment. We first introduce a metamodel that defines the APTC concept, and then investigate the use of large language models with different prompting strategies to generate meaningful APTCs from architecture models. To design the APTC metamodel, we analyze relevant standards and state of the art using two criteria: (i) derivability from software architecture, and (ii) usability for both architecture security assessment and subsequent penetration testing. Building on this metamodel, we then proceed to generate APTCs from software architecture models. Our evaluation shows promising results, achieving up to 93% usefulness and 86% correctness, indicating that the generated APTCs can substantially support both architects (by highlighting security-critical design decisions) and penetration testers (by providing actionable testing guidance).

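Research Motivation and Objectives

Motivate architecture-level security assessment early in the software development lifecycle.
Define a structured metamodel for Abstract Penetration Test Cases (APTCs) that is grounded in architecture artifacts.
Evaluate how effectively LLMs generate APTCs from architecture models under different prompting strategies.
Assess how the generated APTCs help architects and penetration testers, and identify limitations and the architecture annotations that are needed.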

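Proposed Method

Propose an APTC metamodel that describes the targeted threat, weakness, attack vector, and affected architecture elements.
Serialize the PCM architecture into a security-oriented textual representation and use constrained prompting to enforce schema-conformant APTC generation.
Generate APTCs with zero-shot, one-shot, and few-shot prompting, each with and without chain-of-thought, using two LLMs (GPT and Gemini).
Evaluate the generated APTCs against CAWE weaknesses through expert-based and LLM-assisted expert evaluation.
Validate outputs against a predefined JSON schema that ensures architectural traceability and compatibility.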
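
The prompting and validation steps listed above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the field names in `REQUIRED_FIELDS`, the prompt wording, and the helpers `build_prompt` and `validate_aptcs` are all hypothetical, since this review does not reproduce the paper's actual schema or prompts. No model is called; the sketch only assembles a prompt and checks a response string.

```python
import json

# Hypothetical field names: the review only states that an APTC describes a
# threat, a weakness, an attack vector, and the affected architecture elements.
REQUIRED_FIELDS = {
    "threat": str,
    "weakness": str,
    "attack_vector": str,
    "affected_elements": list,
}


def build_prompt(architecture_text, examples=(), chain_of_thought=False):
    """Build a zero-, one-, or few-shot prompt, depending on len(examples)."""
    parts = [
        "Generate Abstract Penetration Test Cases for the architecture below.",
        "Answer with a JSON array of objects with exactly these keys: "
        + ", ".join(REQUIRED_FIELDS) + ".",
    ]
    for number, example in enumerate(examples, start=1):
        parts.append(f"Example {number}:\n{json.dumps(example, indent=2)}")
    if chain_of_thought:
        parts.append("Reason step by step about the architecture first.")
    parts.append("Architecture:\n" + architecture_text)
    return "\n\n".join(parts)


def validate_aptcs(raw_output):
    """Return (valid_aptcs, errors) for a model response given as a JSON string."""
    try:
        items = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [], [f"not valid JSON: {exc}"]
    if not isinstance(items, list):
        return [], ["top-level value must be a JSON array"]
    valid, errors = [], []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"item {index}: not a JSON object")
            continue
        # Collect every missing or wrongly typed field for this item.
        problems = [
            f"item {index}: '{key}' is missing or not a {expected.__name__}"
            for key, expected in REQUIRED_FIELDS.items()
            if not isinstance(item.get(key), expected)
        ]
        if problems:
            errors.extend(problems)
        else:
            valid.append(item)
    return valid, errors
```

Passing no examples gives a zero-shot prompt, one example a one-shot prompt, and several a few-shot prompt. A real pipeline would more likely validate against a full JSON Schema document; the type check here only stands in for that step.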

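Experimental Results

Research Questions

RQ1: How should an Abstract Penetration Test Case (APTC) be defined to support architecture-level security assessment?
RQ2: To what extent can LLMs analyze and understand the security implications of a software architecture?
RQ3: To what extent can LLMs generate meaningful APTCs from software architecture models?

Key Findings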

Model	Metric	Maintenance	PowerGrid	Bank	Total/15	Success Rate
GPT-5.2	Correctness	2/5	3/5	4/5	9/15	60.0%
GPT-5.2	Usefulness	5/5	4/5	4/5	13/15	86.7%
Gemini-3-Pro	Correctness	4/5	2/5	4/5	10/15	66.7%
Gemini-3-Pro	Usefulness	3/5	2/5	5/5	10/15	66.7%
GPT-5.2	Correctness	4/5	3/5	4/5	11/15	73.3%
GPT-5.2	Usefulness	4/5	3/5	4/5	11/15	73.3%
Gemini-3-Pro	Correctness	5/5	4/5	4/5	13/15	86.7%
Gemini-3-Pro	Usefulness	5/5	5/5	4/5	14/15	93.3%
GPT-5.2	Correctness	4/5	4/5	3/5	11/15	73.3%
GPT-5.2	Usefulness	4/5	4/5	4/5	12/15	80.0%
Gemini-3-Pro	Correctness	2/5	2/5	2/5	6/15	40.0%
Gemini-3-Pro	Usefulness	2/5	5/5	3/5	10/15	66.7%
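
The Total/15 and Success Rate columns follow directly from the three per-study scores. The short Python sketch below recomputes them as a consistency check. The scores are copied from the table in table order; the names `ROWS`, `success_rate`, and `best_rate` are introduced here for illustration only, and the sketch assumes each case study is scored out of 5, as the table's denominators indicate.

```python
# Per-study scores (Maintenance, PowerGrid, Bank) copied from the table,
# in table order. Each study is scored out of 5.
ROWS = [
    ("GPT-5.2", "Correctness", (2, 3, 4)),
    ("GPT-5.2", "Usefulness", (5, 4, 4)),
    ("Gemini-3-Pro", "Correctness", (4, 2, 4)),
    ("Gemini-3-Pro", "Usefulness", (3, 2, 5)),
    ("GPT-5.2", "Correctness", (4, 3, 4)),
    ("GPT-5.2", "Usefulness", (4, 3, 4)),
    ("Gemini-3-Pro", "Correctness", (5, 4, 4)),
    ("Gemini-3-Pro", "Usefulness", (5, 5, 4)),
    ("GPT-5.2", "Correctness", (4, 4, 3)),
    ("GPT-5.2", "Usefulness", (4, 4, 4)),
    ("Gemini-3-Pro", "Correctness", (2, 2, 2)),
    ("Gemini-3-Pro", "Usefulness", (2, 5, 3)),
]


def success_rate(scores, items_per_study=5):
    """Return (total, percentage rounded to one decimal) for one table row."""
    total = sum(scores)
    return total, round(100 * total / (items_per_study * len(scores)), 1)


def best_rate(metric):
    """Return the highest success rate among all rows for the given metric."""
    return max(success_rate(scores)[1] for _, name, scores in ROWS if name == metric)


if __name__ == "__main__":
    for model, metric, scores in ROWS:
        total, rate = success_rate(scores)
        print(f"{model:<13}{metric:<12}{total:>2}/{5 * len(scores)} {rate:5.1f}%")
```

Recomputing the rows this way reproduces every total and percentage in the table, including the headline figures of 93.3% usefulness and 86.7% correctness.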

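LLMs can generate architecture-grounded APTCs that align with CAWE weaknesses.
The choice of prompting strategy and model strongly affects correctness and usefulness; under some prompts Gemini outperforms GPT on usefulness (best usefulness 93.3% versus 86.7%).
Across the three case studies, the approach reaches up to 93.3% usefulness and 86.7% correctness overall.
Some outputs misidentify weaknesses or reference architecture elements that do not exist, which reveals limits in semantic grounding.
The structured APTC metamodel enables traceable, schema-conformant generation that can be integrated into security workflows.
The evaluation discusses threats to validity and points to extensions that cover more CAWEs and richer threat models.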
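
One of the limitations noted above, outputs that reference architecture elements that do not exist, lends itself to an automatic check. The Python sketch below shows one possible form of such a check. It is an assumption made for illustration, not something the paper is reported to implement: the `affected_elements` key, the function `find_ungrounded_elements`, and the component names in the usage note are all hypothetical.

```python
def find_ungrounded_elements(aptc, architecture_elements):
    """Return the elements an APTC references that the architecture lacks.

    `aptc` is a dict with a hypothetical "affected_elements" list of names;
    `architecture_elements` is an iterable of names taken from the model.
    Names are compared case-insensitively.
    """
    known = {name.casefold() for name in architecture_elements}
    return [
        element
        for element in aptc.get("affected_elements", [])
        if element.casefold() not in known
    ]
```

For an architecture with the components `Gateway` and `Ledger`, an APTC that lists `gateway` and `AuditLog` would yield `["AuditLog"]`, which flags the test case for expert review instead of silently accepting it. A check like this catches only invented names; it cannot tell whether the weakness assigned to a real element is the right one.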

This review was generated by AI and checked by a human editor.