QUICK REVIEW

[Paper Review] (Security) Assertions by Large Language Models

Rahul Kande, Hammond Pearce|arXiv (Cornell University)|Jun 24, 2023

Security and Verification in Computing | References: 42 | Citations: 20

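One-Line Summary

This paper evaluates how well an out-of-the-box large language model can automatically generate hardware security assertions, using a benchmark suite and systematically generated prompts, and releases the evaluation framework as open source.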

ABSTRACT

The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications on a system, there is a need for techniques to support security verification activities. Assertion-based verification is a popular verification technique that involves capturing design intent in a set of assertions that can be used in formal verification or testing-based checking. However, writing security-centric assertions is a challenging task. In this work, we investigate the use of emerging large language models (LLMs) for code generation in hardware assertion generation for security, where primarily natural language prompts, such as those one would see as code comments in assertion files, are used to produce SystemVerilog assertions. We focus our attention on a popular LLM and characterize its ability to write assertions out of the box, given varying levels of detail in the prompt. We design an evaluation framework that generates a variety of prompts, and we create a benchmark suite comprising real-world hardware designs and corresponding golden reference assertions that we want to generate with the LLM.

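Motivation and Objectives

Assertions that capture design intent and check for vulnerabilities support hardware security verification, but writing security-centric assertions is difficult.
Evaluate whether an out-of-the-box LLM can generate SystemVerilog security assertions from natural language prompts.
Develop an evaluation framework with benchmarks, prompts, and a verification pipeline to measure LLM performance.
Open-source the framework and benchmarks to support further research on LLM-assisted hardware assertion generation.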

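Proposed Method

Design a benchmark suite of real-world hardware designs, each with corresponding golden reference assertions.
Create a prompt generator that combines design context, example assertions, and varying levels of comment detail.
Implement an assertion file generator that corrects common LLM mistakes and prepares the processed assertions for verification.
Use a simulator (Modelsim) to compare LLM-generated assertions against the golden reference, based on matching violations.
Evaluate correctness with a scoreboard that classifies each processed assertion as correct or incorrect by checking it against the inputs that trigger violations.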
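The violation-matching comparison in the last two steps can be sketched in Python. This is a minimal illustration and not the paper's implementation: the function names, the representation of simulation results as sets of test-input identifiers, the sample assertion strings, and the rule that a generated assertion counts as correct only when it is violated by exactly the same inputs as the golden reference are all assumptions made for this example.

```python
def classify_assertion(generated_violations, golden_violations):
    """Label one assertion by comparing its violating inputs to the golden set.

    Assumption: "correct" means the two sets of violating inputs are identical.
    """
    if set(generated_violations) == set(golden_violations):
        return "correct"
    return "incorrect"


def scoreboard(results, golden_violations):
    """Tally labels over a mapping: assertion text -> inputs that violated it."""
    tally = {"correct": 0, "incorrect": 0}
    for violations in results.values():
        tally[classify_assertion(violations, golden_violations)] += 1
    return tally


# Hypothetical simulation output: test inputs t3 and t7 violate the golden assertion.
golden = {"t3", "t7"}
results = {
    "assert property (@(posedge clk) locked |-> !we);": {"t3", "t7"},
    "assert property (@(posedge clk) locked |-> we);": {"t1", "t2"},
}
print(scoreboard(results, golden))  # {'correct': 1, 'incorrect': 1}
```

In a real flow the violation sets would come from simulator logs rather than hand-written sets; parsing those logs is left out here because their format is not described in this review.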

Figure 1: Evaluation framework for our assertion generator.

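Experimental Results

Research Questions

RQ1: Can large language models generate hardware security assertions out of the box?
RQ2: How do different types of prompts affect the quality and correctness of the generated assertions?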

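Key Findings

The LLM-driven pipeline can generate hardware security assertions across a range of benchmarks.
Prompt design (design context, comment detail, examples, and the leading portion) strongly affects assertion quality.
A fixed set of automatic corrections can repair common syntax and typo issues without hurting performance.
Processing and deduplication yield a considerable number of useful assertions, some of which match the golden reference during verification.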
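The processing and deduplication step can be pictured with a short Python sketch. The two fix-ups shown (collapsing whitespace and enforcing a single trailing semicolon) are hypothetical stand-ins chosen for illustration; the review does not list the corrections the authors actually apply, and the function names are likewise assumptions.

```python
import re


def normalize(assertion: str) -> str:
    """Apply two hypothetical fix-ups: collapse whitespace, end with one ';'."""
    text = re.sub(r"\s+", " ", assertion).strip()
    return text.rstrip(";").rstrip() + ";"


def postprocess(candidates):
    """Normalize each candidate and drop duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for raw in candidates:
        fixed = normalize(raw)
        if fixed == ";":  # candidate was empty or only whitespace/semicolons
            continue
        if fixed not in seen:
            seen.add(fixed)
            unique.append(fixed)
    return unique


# Hypothetical LLM outputs that differ only in spacing and punctuation.
outputs = [
    "assert property (a |-> b)",
    "assert   property (a |-> b) ;;",
    "   ",
]
print(postprocess(outputs))  # ['assert property (a |-> b);']
```

Deduplicating after normalization, rather than before, lets candidates that differ only cosmetically collapse into one entry, so each distinct assertion is simulated once.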

Figure 5: Template for the prompt string.
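As a rough illustration of how a prompt generator might vary the components named in the findings, the Python sketch below enumerates every combination of design context, example assertion, comment, and leading text. It does not reproduce the template in Figure 5: the component order, the newline-joined layout, the function name `build_prompts`, and all sample strings are hypothetical.

```python
from itertools import product


def build_prompts(design_contexts, examples, comments, leads):
    """Return one prompt string per combination of the four components.

    An empty string means "omit this component" for that variant.
    """
    prompts = []
    for parts in product(design_contexts, examples, comments, leads):
        # Skip empty components so omitted parts do not leave blank lines.
        prompts.append("\n".join(part for part in parts if part))
    return prompts


# Hypothetical component values for a small lock module.
prompts = build_prompts(
    design_contexts=["", "module lock(input clk, input we, input locked); endmodule"],
    examples=["", "// Example: assert property (@(posedge clk) a |-> b);"],
    comments=[
        "// assert that writes are blocked when locked",
        "// assert that at every positive clock edge, we is low whenever locked is high",
    ],
    leads=["assert property ("],
)
print(len(prompts))  # 8
```

Enumerating the full cross product makes it straightforward to attribute differences in assertion quality to a single component, at the cost of a prompt count that grows multiplicatively with each added option.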


This review was generated by AI and checked by a human editor.