QUICK REVIEW

[Paper Review] Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives

Sihao Hu, Tiansheng Huang|arXiv (Cornell University)|Oct 2, 2023

Hate Speech and Cyberbullying Detection | Citations: 8

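One-Sentence Summary

Proposes GPTLens, a purely LLM-driven framework that splits detection into two stages, generation and discrimination, to improve smart contract vulnerability detection while reducing false positives. On contracts with reported CVEs, it shows a substantial improvement over one-stage detection.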

ABSTRACT

This paper provides a systematic analysis of the opportunities, challenges, and potential solutions of harnessing Large Language Models (LLMs) such as GPT-4 to dig out vulnerabilities within smart contracts based on our ongoing research. For the task of smart contract vulnerability detection, achieving practical usability hinges on identifying as many true vulnerabilities as possible while minimizing the number of false positives. Nonetheless, our empirical study reveals contradictory yet interesting findings: generating more answers with higher randomness largely boosts the likelihood of producing a correct answer but inevitably leads to a higher number of false positives. To mitigate this tension, we propose an adversarial framework dubbed GPTLens that breaks the conventional one-stage detection into two synergistic stages, generation and discrimination, for progressive detection and refinement, wherein the LLM plays dual roles, i.e., auditor and critic, respectively. The goal of auditor is to yield a broad spectrum of vulnerabilities with the hope of encompassing the correct answer, whereas the goal of critic that evaluates the validity of identified vulnerabilities is to minimize the number of false positives. Experimental results and illustrative examples demonstrate that auditor and critic work together harmoniously to yield pronounced improvements over the conventional one-stage detection. GPTLens is intuitive, strategic, and entirely LLM-driven without relying on specialist expertise in smart contracts, showcasing its methodical generality and potential to detect a broad spectrum of vulnerabilities. Our code is available at: https://github.com/git-disl/GPTLens.

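Motivation and Objectives

Assess the opportunities and challenges of using LLMs for smart contract vulnerability detection.
Identify the trade-off in LLM-based detection between generating diverse outputs and incurring false positives.
Propose GPTLens, which separates generation from discrimination to improve detection accuracy.
Evaluate GPTLens against a baseline on real-world contracts with reported CVEs.
Highlight the generality and practicality of an end-to-end LLM-driven approach that does not require expert smart contract tooling.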
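
The trade-off between diverse generation and false positives can be illustrated with a toy probability model. This model is an assumption made here for illustration and does not come from the paper: each sampled answer is treated as independently containing the true vulnerability with probability p, and as contributing f false positives on average. The function names and the values of p and f are hypothetical.

```python
def hit_probability(p: float, k: int) -> float:
    """Probability that at least one of k independent samples is correct."""
    return 1.0 - (1.0 - p) ** k


def expected_false_positives(f: float, k: int) -> float:
    """Expected false positives across k samples; grows linearly with k."""
    return f * k


if __name__ == "__main__":
    p, f = 0.3, 2.0  # hypothetical values, chosen only for illustration
    for k in (1, 3, 6):
        # The hit probability saturates toward 1 while false positives keep growing.
        print(k, round(hit_probability(p, k), 3), expected_false_positives(f, k))
```

Under these assumptions, sampling more answers raises the chance of a hit with diminishing returns, while the false-positive count keeps growing, which is the tension a separate discrimination stage is meant to relieve.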

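Proposed Method

Open-ended prompting that allows broad vulnerability descriptions beyond predefined categories.
A two-stage GPTLens framework in which auditor (generation) and critic (discrimination) agents run on a single LLM.
The auditor generates multiple, highly diverse vulnerability candidates along with their reasoning.
The critic ranks and scores candidates on correctness, severity, and profitability, then selects the top outputs.
Experimental evaluation on 13 CVE-related smart contracts with a GPT-4 backend.
Comparison across several configurations (A, R, C, O) while varying the number of auditors (n) and the number of outputs per auditor (m).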
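
The two-stage flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Finding` class, the `detect` function, and the stub `fake_auditor` and `fake_critic` are hypothetical names, the stubs stand in for real LLM calls, and combining the three criteria as an unweighted sum is an assumption, since the review does not say how the scores are aggregated.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    function_name: str
    description: str
    correctness: float = 0.0
    severity: float = 0.0
    profitability: float = 0.0

    @property
    def score(self) -> float:
        # Assumption: unweighted sum of the three criteria named in the review.
        return self.correctness + self.severity + self.profitability


# Hypothetical interfaces; in practice each would wrap an LLM call.
Auditor = Callable[[str, int], List[Finding]]  # (contract source, m) -> candidates
Critic = Callable[[str, Finding], Finding]     # (contract source, candidate) -> scored


def detect(source: str, auditor: Auditor, critic: Critic,
           n: int, m: int, top_k: int = 1) -> List[Finding]:
    # Generation stage: n auditors each propose up to m candidates.
    candidates: List[Finding] = []
    for _ in range(n):
        candidates.extend(auditor(source, m))
    # Discrimination stage: the critic scores every candidate, best first.
    scored = [critic(source, c) for c in candidates]
    return sorted(scored, key=lambda f: f.score, reverse=True)[:top_k]


def fake_auditor(source: str, m: int) -> List[Finding]:
    # Stub standing in for an LLM auditor; returns fixed candidates.
    pool = [
        Finding("withdraw", "possible reentrancy"),
        Finding("transfer", "possible integer overflow"),
        Finding("owner", "possibly missing access control"),
    ]
    return pool[:m]


def fake_critic(source: str, finding: Finding) -> Finding:
    # Stub standing in for an LLM critic; scores come from a fixed table.
    table = {"withdraw": (8, 9, 9), "transfer": (6, 5, 4), "owner": (2, 3, 1)}
    c, s, p = table[finding.function_name]
    return Finding(finding.function_name, finding.description, c, s, p)


if __name__ == "__main__":
    top = detect("contract Demo { ... }", fake_auditor, fake_critic, n=2, m=3)
    print(top[0].function_name, top[0].score)
```

Passing the auditor and critic as callables keeps the sketch independent of any particular LLM client. Note that this sketch does not deduplicate candidates proposed by more than one auditor.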

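Experimental Results

Research Questions

RQ1: Does open-ended prompting enable the discovery of a broader range of vulnerabilities beyond predefined categories?
RQ2: Can separating generation from discrimination reduce false positives while preserving true positives in LLM-driven vulnerability detection?
RQ3: How do the number of auditors (n) and the number of outputs per auditor (m) affect detection performance?
RQ4: How does GPTLens compare with a one-stage detection baseline on real CVEs?
RQ5: Is the approach purely LLM-driven and generalizable across vulnerability types?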

Key Findings

Method	Hit # (CVE)	Hit ratio (CVE)	Hit # (trial)	Hit ratio (trial)
A (n=1, m=1)	5	38.5%	13	33.3%
A+R (n=1, m=3)	6	46.2%	7	18.0%
A+C (n=1, m=3)	10	76.9%	18	46.2%
A+O (n=1, m=3)	10	76.9%	25	64.1%
A+C (n=2, m=3)	9	69.2%	23	59.0%
A+O (n=2, m=3)	10	76.9%	29	74.4%
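
The hit ratios in the table can be cross-checked against the hit counts. The sketch below assumes 13 contracts (stated in the review) and 39 trials; the trial count is not stated in the review and is inferred here from 13 hits corresponding to 33.3%. The names `ROWS`, `hit_ratio`, and `max_deviation` are hypothetical.

```python
N_CONTRACTS = 13  # stated in the review
N_TRIALS = 39     # assumption: inferred from 13 hits = 33.3%

# (method, CVE hits, reported CVE ratio %, trial hits, reported trial ratio %)
ROWS = [
    ("A (n=1, m=1)", 5, 38.5, 13, 33.3),
    ("A+R (n=1, m=3)", 6, 46.2, 7, 18.0),
    ("A+C (n=1, m=3)", 10, 76.9, 18, 46.2),
    ("A+O (n=1, m=3)", 10, 76.9, 25, 64.1),
    ("A+C (n=2, m=3)", 9, 69.2, 23, 59.0),
    ("A+O (n=2, m=3)", 10, 76.9, 29, 74.4),
]


def hit_ratio(hits: int, total: int) -> float:
    """Hit ratio in percent."""
    return 100.0 * hits / total


def max_deviation(rows) -> float:
    """Largest gap (percentage points) between reported and recomputed ratios."""
    worst = 0.0
    for _, cve_hits, cve_pct, trial_hits, trial_pct in rows:
        worst = max(worst, abs(hit_ratio(cve_hits, N_CONTRACTS) - cve_pct))
        worst = max(worst, abs(hit_ratio(trial_hits, N_TRIALS) - trial_pct))
    return worst


if __name__ == "__main__":
    print(f"largest deviation: {max_deviation(ROWS):.3f} percentage points")
```

With these denominators, every reported ratio agrees with its hit count to within 0.1 percentage points.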

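GPTLens substantially improves the contract-level hit ratio for CVE detection: the top-1 hit ratio reaches 76.9%, versus 38.5% for one-stage detection.
At the trial level, the top-1 hit ratio rises from 33.3% for one-stage detection to 59.0% with GPTLens (A+C, n=2, m=3).
Using the critic (A+C) suppresses false positives and markedly improves accuracy compared with generation alone.
Increasing the number of auditors (n) further improves trial-level performance (e.g., from 46.2% to 59.0% for A+C when n goes from 1 to 2).
GPTLens is purely LLM-driven, does not rely on smart contract expertise, and shows generality across vulnerability types.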

This review was generated by AI and checked by a human editor.