QUICK REVIEW

[Paper Review] Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives

Sihao Hu, Tiansheng Huang|arXiv (Cornell University)|2023. 10. 02.

Hate Speech and Cyberbullying Detection|Citations: 8

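One-Line Summary

Proposes GPTLens, a two-stage, purely LLM-driven framework (generation and discrimination) that improves smart contract vulnerability detection, reducing false positives and showing substantial gains over one-stage detection on contracts with reported CVEs.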

ABSTRACT

This paper provides a systematic analysis of the opportunities, challenges, and potential solutions of harnessing Large Language Models (LLMs) such as GPT-4 to dig out vulnerabilities within smart contracts based on our ongoing research. For the task of smart contract vulnerability detection, achieving practical usability hinges on identifying as many true vulnerabilities as possible while minimizing the number of false positives. Nonetheless, our empirical study reveals contradictory yet interesting findings: generating more answers with higher randomness largely boosts the likelihood of producing a correct answer but inevitably leads to a higher number of false positives. To mitigate this tension, we propose an adversarial framework dubbed GPTLens that breaks the conventional one-stage detection into two synergistic stages, generation and discrimination, for progressive detection and refinement, wherein the LLM plays dual roles, i.e., auditor and critic, respectively. The goal of auditor is to yield a broad spectrum of vulnerabilities with the hope of encompassing the correct answer, whereas the goal of critic that evaluates the validity of identified vulnerabilities is to minimize the number of false positives. Experimental results and illustrative examples demonstrate that auditor and critic work together harmoniously to yield pronounced improvements over the conventional one-stage detection. GPTLens is intuitive, strategic, and entirely LLM-driven without relying on specialist expertise in smart contracts, showcasing its methodical generality and potential to detect a broad spectrum of vulnerabilities. Our code is available at: https://github.com/git-disl/GPTLens.

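Research Motivation and Objectives

Assess the opportunities and challenges of using LLMs for smart contract vulnerability detection.
Identify the trade-off in LLM-based detection between generating diverse outputs and incurring false positives.
Propose GPTLens, which separates generation from discrimination to improve detection accuracy.
Evaluate GPTLens on real-world CVE-reported contracts against baselines.
Highlight the generality and practicality of an end-to-end, LLM-driven approach that needs no specialist smart contract tools.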
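
The trade-off between diverse outputs and false positives can be illustrated with a small probability sketch. It assumes, purely for illustration, that each generated answer is independently correct with probability p; neither the independence assumption nor the value of p comes from the paper.

```python
def p_at_least_one_correct(p: float, k: int) -> float:
    """Chance that at least one of k independent answers is correct."""
    return 1.0 - (1.0 - p) ** k


def expected_false_positives(p: float, k: int) -> float:
    """Expected number of incorrect answers among k independent answers."""
    return k * (1.0 - p)


if __name__ == "__main__":
    p = 0.3  # hypothetical per-answer correctness, not a figure from the paper
    for k in (1, 3, 6):
        print(
            k,
            round(p_at_least_one_correct(p, k), 3),
            round(expected_false_positives(p, k), 2),
        )
```

Under this toy model, the chance of covering the correct answer grows with k, but so does the expected number of wrong answers, which is the tension a separate filtering stage is meant to resolve.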

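Proposed Method

Open-ended prompting to broaden vulnerability descriptions beyond predefined categories.
A two-stage GPTLens framework with auditor (generation) and critic (discrimination) agents running on a single LLM.
Auditors generate many diverse vulnerability candidates, each with its reasoning.
The critic scores and ranks the candidates by correctness, severity, and profitability, then selects the top results.
Experimental evaluation on 13 CVE-linked smart contracts with a GPT-4 backend.
Experimental setup comparing configurations (A, R, C, O) and varying the number of auditors (n) and the number of outputs per auditor (m).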
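
The two stages above can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the `Auditor` and `Critic` callables, the `Candidate` and `Scored` types, the stub data, and the ranking rule (correctness first, then severity, then profitability) are all assumptions made for the example. In practice each callable would wrap an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    function_name: str
    vulnerability: str
    reason: str


@dataclass
class Scored:
    candidate: Candidate
    correctness: float
    severity: float
    profitability: float


# Hypothetical interfaces standing in for LLM calls.
Auditor = Callable[[str, int], List[Candidate]]  # (contract, m) -> candidates
Critic = Callable[[str, Candidate], Scored]      # (contract, candidate) -> scores


def generate(contract: str, auditors: List[Auditor], m: int) -> List[Candidate]:
    """Generation stage: each of the n auditors proposes up to m candidates."""
    candidates: List[Candidate] = []
    for auditor in auditors:
        candidates.extend(auditor(contract, m)[:m])
    return candidates


def discriminate(
    contract: str, candidates: List[Candidate], critic: Critic, top_k: int = 1
) -> List[Scored]:
    """Discrimination stage: score every candidate and keep the top_k."""
    scored = [critic(contract, c) for c in candidates]
    # Assumed ranking rule: correctness, then severity, then profitability.
    scored.sort(
        key=lambda s: (s.correctness, s.severity, s.profitability), reverse=True
    )
    return scored[:top_k]


def stub_auditor(contract: str, m: int) -> List[Candidate]:
    """Placeholder auditor returning made-up candidates."""
    pool = [
        Candidate("withdraw", "reentrancy", "external call before state update"),
        Candidate("transfer", "integer overflow", "unchecked arithmetic"),
        Candidate("mint", "missing access control", "no owner check"),
    ]
    return pool[:m]


def stub_critic(contract: str, c: Candidate) -> Scored:
    """Placeholder critic with made-up scores."""
    scores = {
        "reentrancy": (9.0, 8.0, 8.0),
        "integer overflow": (4.0, 6.0, 5.0),
        "missing access control": (2.0, 7.0, 3.0),
    }
    return Scored(c, *scores[c.vulnerability])


if __name__ == "__main__":
    source = "contract Demo { /* ... */ }"
    found = generate(source, [stub_auditor, stub_auditor], m=3)  # n=2, m=3
    for s in discriminate(source, found, stub_critic, top_k=1):
        print(s.candidate.function_name, s.candidate.vulnerability)
```

Keeping the two stages behind separate callables lets n and m be varied without touching the critic.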

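Experimental Results

Research Questions

RQ1: Can open-ended prompting enable discovery of a broad range of vulnerabilities beyond predefined categories?
RQ2: Does separating generation from discrimination reduce false positives while preserving true positives in LLM-based vulnerability detection?
RQ3: How do the number of auditors (n) and the number of outputs per auditor (m) affect detection performance?
RQ4: How does GPTLens perform against the one-stage detection baseline on real-world CVEs?
RQ5: Is the approach purely LLM-driven and generalizable across vulnerability types?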

Key Results

Method	Hit # (CVE)	Hit rate (CVE)	Hit # (trial)	Hit rate (trial)
A (n=1, m=1)	5	38.5%	13	33.3%
A+R (n=1, m=3)	6	46.2%	7	18.0%
A+C (n=1, m=3)	10	76.9%	18	46.2%
A+O (n=1, m=3)	10	76.9%	25	64.1%
A+C (n=2, m=3)	9	69.2%	23	59.0%
A+O (n=2, m=3)	10	76.9%	29	74.4%

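GPTLens substantially raises the contract-level hit rate on CVE detection: the top-1 hit rate reaches 76.9% (A+C, n=1, m=3), up from 38.5% for one-stage detection (A, n=1, m=1).
At the trial level, the top-1 hit rate rises from 33.3% (A) to 59.0% with GPTLens (A+C, n=2, m=3).
Using the critic (A+C) filters out false positives and markedly improves precision over generation alone; for example, with n=1 and m=3 the trial-level hit rate is 46.2% for A+C versus 18.0% for A+R.
Increasing the number of auditors (n) from 1 to 2 further improves trial-level performance for A+C (46.2% to 59.0%), although its contract-level hit rate dips from 76.9% to 69.2%.
GPTLens is purely LLM-driven and does not rely on smart contract expert knowledge, which suggests generality across vulnerability types.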
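
The hit rates in the table can be recomputed from the hit counts. The sketch below assumes denominators of 13 contracts (the number of CVE-linked contracts given in the method section) and 39 trials. The 39 is inferred from the reported percentages (for example, 13/39 = 33.3%) and is not stated in this review.

```python
N_CONTRACTS = 13  # from the method section
N_TRIALS = 39     # assumption inferred from the reported percentages

# method: (contract hits, reported %, trial hits, reported %), copied from the table
ROWS = {
    "A (n=1, m=1)": (5, 38.5, 13, 33.3),
    "A+R (n=1, m=3)": (6, 46.2, 7, 18.0),
    "A+C (n=1, m=3)": (10, 76.9, 18, 46.2),
    "A+O (n=1, m=3)": (10, 76.9, 25, 64.1),
    "A+C (n=2, m=3)": (9, 69.2, 23, 59.0),
    "A+O (n=2, m=3)": (10, 76.9, 29, 74.4),
}


def hit_rate(hits: int, total: int) -> float:
    """Hit rate as a percentage."""
    return 100.0 * hits / total


def mismatches() -> list:
    """Methods whose recomputed rate differs from the reported one at 1 decimal."""
    bad = []
    for method, (c_hits, c_pct, t_hits, t_pct) in ROWS.items():
        contract_ok = round(hit_rate(c_hits, N_CONTRACTS), 1) == c_pct
        trial_ok = round(hit_rate(t_hits, N_TRIALS), 1) == t_pct
        if not (contract_ok and trial_ok):
            bad.append(method)
    return bad


if __name__ == "__main__":
    print(mismatches())
```

Under these assumed denominators, every figure matches to one decimal place except the trial-level rate for A+R, where 7/39 is about 17.9% against the reported 18.0%.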

This review was generated by AI and reviewed by a human editor.