QUICK REVIEW

[Paper Review] SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs?

Mohamed Amine Ferrag, Ammar Battah | arXiv (Cornell University) | Jul 13, 2023

Software Engineering Research | Citations: 10

One-line summary

SecureFalcon fine-tunes FalconLLM 7b on the FormAI dataset to detect vulnerabilities in C code, reaching 94% detection accuracy and enabling repair suggestions.

ABSTRACT

Software vulnerabilities can cause numerous problems, including crashes, data loss, and security breaches. These issues greatly compromise quality and can negatively impact the market adoption of software applications and systems. Traditional bug-fixing methods, such as static analysis, often produce false positives. While bounded model checking, a form of Formal Verification (FV), can provide more accurate outcomes compared to static analyzers, it demands substantial resources and significantly hinders developer productivity. Can Machine Learning (ML) achieve accuracy comparable to FV methods and be used in popular instant code completion frameworks in near real-time? In this paper, we introduce SecureFalcon, an innovative model architecture with only 121 million parameters derived from the Falcon-40B model and explicitly tailored for classifying software vulnerabilities. To achieve the best performance, we trained our model using two datasets, namely the FormAI dataset and the FalconVulnDB. The FalconVulnDB is a combination of recent public datasets, namely the SySeVR framework, Draper VDISC, Bigvul, Diversevul, SARD Juliet, and ReVeal datasets. These datasets contain the top 25 most dangerous software weaknesses, such as CWE-119, CWE-120, CWE-476, CWE-122, CWE-190, CWE-121, CWE-78, CWE-787, CWE-20, and CWE-762. SecureFalcon achieves 94% accuracy in binary classification and up to 92% in multiclassification, with instant CPU inference times. It outperforms existing models such as BERT, RoBERTa, CodeBERT, and traditional ML algorithms, promising to push the boundaries of software vulnerability detection and instant code completion frameworks.

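Motivation and objectives

Motivate improving software vulnerability detection with LLMs.
Fine-tune FalconLLM for binary vulnerability classification of C code samples.
Create and use the FormAI dataset to evaluate vulnerability detection performance.
Provide vulnerability repair by prompting a repair-oriented model.
Evaluate SecureFalcon configuration variants and practical deployment considerations.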

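Proposed method

Fine-tune FalconLLM 7b on FormAI-derived C code with 42 CWE labels.
Preprocess the data by removing header noise, HTML, and email addresses, and encode the labels numerically.
Implement a two-label vulnerability scoring head with a sigmoid that maps the 768-dimensional decoder output to 2 classes.
Use Rotary Position Embedding (RoPE) in self-attention and an MLP with GELU activation in the decoder layers.
Train with the AdamW optimizer, learning rates of 2e-2 and 2e-5, early stopping, and cross-entropy loss.
Evaluate two configurations (121M and 44M parameters) and demonstrate vulnerability repair via Falcon-40B-Instruct prompts.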
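
The preprocessing step above (stripping HTML and email addresses, then encoding labels as integers) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the paper's pipeline: the regular expressions, the function names `clean_source` and `encode_labels`, and the label strings are all assumptions, and header-noise removal is left out because the review does not say what it covers.

```python
import re

# Hypothetical patterns: the review does not give the paper's actual cleaning rules.
# Only a few multi-letter HTML tags are matched, because a naive "<[^>]+>" pattern
# would also delete C includes such as <stdio.h>.
HTML_TAG = re.compile(r"</?(?:br|div|span|pre|code|html|body)\b[^>]*>", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def clean_source(code: str) -> str:
    """Remove HTML tags and email addresses from one C source string."""
    code = HTML_TAG.sub("", code)
    code = EMAIL.sub("", code)
    return code


def encode_labels(labels: list[str]) -> tuple[list[int], dict[str, int]]:
    """Map each distinct label string to an integer id, in sorted order."""
    mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


# Made-up sample input, for illustration only.
sample = (
    "<pre>// contact: dev@example.com\n"
    "#include <stdio.h>\n"
    "int main(void) { return 0; }</pre>"
)
cleaned = clean_source(sample)
ids, mapping = encode_labels(["VULNERABLE", "NOT VULNERABLE", "VULNERABLE"])
```

Restricting the tag pattern to a short allow-list is a deliberate choice here: C source is full of angle brackets, so an aggressive HTML stripper would damage the very code the classifier is meant to read.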

Figure 1 : Top 10 most frequent vulnerabilities categories in FormAI dataset.

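Experimental results

Research questions

RQ1: Can FalconLLM be fine-tuned to effectively distinguish vulnerable C code from non-vulnerable code?
RQ2: How does model size (121M vs. 44M) affect vulnerability detection performance?
RQ3: How well does SecureFalcon perform across the FormAI vulnerability distribution spanning 42 CWE categories?
RQ4: Can the approach be extended into a system that suggests repair steps for detected vulnerabilities?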

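Key findings

SecureFalcon achieves high accuracy on vulnerability detection with FormAI-derived data; accuracy is reported per training epoch for each configuration.
In both configurations (121M and 44M), accuracy improves step by step across epochs at each learning rate (LR=2e-5 and LR=2e-2).
With SecureFalcon 121M at LR=2e-5, training accuracy reaches 0.97 and validation accuracy reaches 0.94 at epoch 7.
The FormAI dataset contains 112,000 C programs, covering 42 CWE categories and 197,800 labeled vulnerabilities.
The study also shows FalconLLM used as a repair system that proposes fixes once a vulnerability has been detected.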
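
The repair step, in which a detected vulnerability is handed to an instruction-tuned model through a prompt, might look something like the sketch below. The template wording, the function name `build_repair_prompt`, and the sample snippet are hypothetical; the review does not reproduce the prompts used in the paper, and no model is called here.

```python
def build_repair_prompt(c_source: str, cwe_id: str) -> str:
    """Build a text prompt asking an instruction-tuned model for a fix.

    The template wording is hypothetical; it is not taken from the paper.
    """
    return (
        f"The following C program was flagged as vulnerable ({cwe_id}).\n"
        "Explain the flaw briefly, then return a corrected version.\n\n"
        "### Code\n"
        f"{c_source.strip()}\n\n"
        "### Fixed code\n"
    )


# Made-up example: an unbounded copy into a fixed-size buffer.
snippet = "char buf[8];\nstrcpy(buf, user_input);  /* unbounded copy */\n"
prompt = build_repair_prompt(snippet, "CWE-120")
```

The resulting string would then be passed to whatever text-generation interface serves the repair model; ending the prompt on the "Fixed code" marker is one simple way to steer the completion toward code.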

Figure 2 : SecureFalcon model architecture.
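
As a rough illustration of the two-label scoring head that sits on the 768-dimensional decoder output, the sketch below implements a linear layer followed by a per-label sigmoid in plain Python. The class name `ScoringHead`, the random placeholder weights, and the constant input vector are assumptions made for the example; a trained head would use learned weights and a real decoder output.

```python
import math
import random

HIDDEN_SIZE = 768  # decoder output width stated in the review
NUM_LABELS = 2     # vulnerable / not vulnerable


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ScoringHead:
    """Linear layer (768 -> 2) followed by a per-label sigmoid."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random placeholder weights; a real head is learned during fine-tuning.
        self.weight = [
            [rng.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)]
            for _ in range(NUM_LABELS)
        ]
        self.bias = [0.0] * NUM_LABELS

    def __call__(self, hidden: list[float]) -> list[float]:
        if len(hidden) != HIDDEN_SIZE:
            raise ValueError(f"expected {HIDDEN_SIZE} features, got {len(hidden)}")
        logits = [
            sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(self.weight, self.bias)
        ]
        return [sigmoid(z) for z in logits]


head = ScoringHead()
hidden_state = [0.1] * HIDDEN_SIZE  # stand-in for one decoder output vector
scores = head(hidden_state)
# The predicted class is the label with the higher score.
predicted = max(range(NUM_LABELS), key=lambda i: scores[i])
```

In a deep-learning framework the same head would typically be a single dense layer applied to the decoder's output for the sequence, with the loss computed on its two outputs.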

This review was generated by AI and checked by a human editor.