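[Paper Review] Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives
Proposes GPTLens, a two-stage framework driven entirely by large language models (generation and critique), to improve smart contract vulnerability detection and reduce false positives. On CVE-reported contracts it achieves significant improvements over one-stage detection.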
This paper provides a systematic analysis of the opportunities, challenges, and potential solutions of harnessing Large Language Models (LLMs) such as GPT-4 to dig out vulnerabilities within smart contracts, based on our ongoing research. For the task of smart contract vulnerability detection, achieving practical usability hinges on identifying as many true vulnerabilities as possible while minimizing the number of false positives. Nonetheless, our empirical study reveals contradictory yet interesting findings: generating more answers with higher randomness largely boosts the likelihood of producing a correct answer but inevitably leads to a higher number of false positives. To mitigate this tension, we propose an adversarial framework dubbed GPTLens that breaks the conventional one-stage detection into two synergistic stages (generation and discrimination) for progressive detection and refinement, wherein the LLM plays dual roles, i.e., auditor and critic, respectively. The goal of the auditor is to yield a broad spectrum of vulnerabilities with the hope of encompassing the correct answer, whereas the goal of the critic, which evaluates the validity of identified vulnerabilities, is to minimize the number of false positives. Experimental results and illustrative examples demonstrate that the auditor and critic work together harmoniously to yield pronounced improvements over the conventional one-stage detection. GPTLens is intuitive, strategic, and entirely LLM-driven without relying on specialist expertise in smart contracts, showcasing its methodical generality and potential to detect a broad spectrum of vulnerabilities. Our code is available at: https://github.com/git-disl/GPTLens.
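The tension described in the abstract can be illustrated with a toy probability model. This sketch is an assumption of this review, not the paper's analysis: it treats each of k sampled answers as independently correct with a hypothetical probability `p_correct`, and counts every incorrect answer as a false positive.

```python
# Toy model (assumption, not from the paper): each of k sampled answers is
# independently correct with probability p_correct, and every incorrect
# answer is reported as a false positive.


def p_at_least_one_correct(p_correct: float, k: int) -> float:
    """Probability that at least one of k independent samples is correct."""
    return 1.0 - (1.0 - p_correct) ** k


def expected_false_positives(p_correct: float, k: int) -> float:
    """Expected number of incorrect (false-positive) answers among k samples."""
    return k * (1.0 - p_correct)


if __name__ == "__main__":
    p = 0.3  # hypothetical per-sample correctness rate
    for k in (1, 3, 6):
        print(f"k={k}: P(hit)={p_at_least_one_correct(p, k):.3f}, "
              f"E[FP]={expected_false_positives(p, k):.1f}")
```

With `p_correct = 0.3`, the chance of at least one correct answer rises from 0.300 at k = 1 to about 0.882 at k = 6, while the expected number of false positives grows linearly from 0.7 to 4.2. Real LLM samples are correlated, so the model only shows the direction of the trade-off, not its magnitude.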
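Research Motivation and Goals
- Assess the opportunities and challenges of using LLMs for smart contract vulnerability detection.
- Identify the trade-off between generating diverse outputs and producing false positives in LLM-driven detection.
- Propose GPTLens, which separates generation from discrimination to improve detection accuracy.
- Evaluate GPTLens on real-world CVE-reported contracts and against baseline methods.
- Highlight the generality and practicality of an end-to-end LLM-driven approach that needs no expert smart contract tooling.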
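Proposed Method
- Open-ended prompting, so that vulnerabilities can be described broadly rather than only within predefined categories.
- A two-stage GPTLens framework in which auditor (generation) and critic (discrimination) agents run on the same LLM.
- The auditor produces multiple high-diversity vulnerability candidates, each with its reasoning.
- The critic scores each candidate on correctness, severity, and profitability, and ranks the candidates to select the best outputs.
- Experimental evaluation on 13 CVE-related smart contracts with GPT-4 as the backend.
- Comparison across several configurations (A, R, C, O) and across different numbers of auditors (n) and outputs per auditor (m).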
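The two-stage flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the GPTLens implementation from the linked repository: `auditor_fn` and `critic_fn` are hypothetical stand-ins for LLM calls, candidates are assumed to arrive already parsed into a hypothetical `Candidate` record, duplicate findings across auditors are merged, and the critic's three scores are combined with a hypothetical weighted sum.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    """One vulnerability proposed by an auditor (structure is hypothetical)."""
    function_name: str
    vulnerability: str
    reasoning: str = ""


# Hypothetical LLM-backed callables (not a real API): an auditor returns up to
# m candidates for a contract; a critic returns (correctness, severity,
# profitability) scores for one candidate.
AuditorFn = Callable[[str, int], List[Candidate]]
CriticFn = Callable[[str, Candidate], Tuple[float, float, float]]


def two_stage_detect(
    contract: str,
    auditor_fn: AuditorFn,
    critic_fn: CriticFn,
    n: int,
    m: int,
    top_k: int = 1,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical
) -> List[Candidate]:
    # Stage 1 (generation): n independent auditors, each proposing up to m findings.
    proposals: List[Candidate] = []
    for _ in range(n):
        proposals.extend(auditor_fn(contract, m)[:m])

    # Merge duplicates reported by several auditors (an assumption of this sketch).
    unique: Dict[Tuple[str, str], Candidate] = {}
    for cand in proposals:
        unique.setdefault((cand.function_name, cand.vulnerability), cand)

    # Stage 2 (discrimination): the critic scores each finding; rank by weighted sum.
    scored = []
    for cand in unique.values():
        correctness, severity, profitability = critic_fn(contract, cand)
        score = (weights[0] * correctness
                 + weights[1] * severity
                 + weights[2] * profitability)
        scored.append((score, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored[:top_k]]


if __name__ == "__main__":
    # Toy stand-ins for LLM calls; a real system would prompt the LLM here.
    def fake_auditor(contract: str, m: int) -> List[Candidate]:
        pool = [
            Candidate("transfer", "integer overflow in balance update"),
            Candidate("withdraw", "reentrancy before state update"),
            Candidate("setOwner", "missing access control"),
        ]
        return pool[:m]

    fake_scores = {
        "integer overflow in balance update": (9.0, 8.0, 7.0),
        "reentrancy before state update": (3.0, 8.0, 6.0),
        "missing access control": (6.0, 9.0, 8.0),
    }

    def fake_critic(contract: str, cand: Candidate) -> Tuple[float, float, float]:
        return fake_scores[cand.vulnerability]

    for cand in two_stage_detect("contract Token {}", fake_auditor, fake_critic,
                                 n=2, m=3, top_k=2):
        print(cand.function_name, "-", cand.vulnerability)
```

In the toy run, the critic's summed scores are 24, 17, and 23, so the two kept findings are the `transfer` overflow and the `setOwner` access-control issue. The design mirrors the split in the list: the n auditors supply up to n × m candidates for recall, and the critic's ranking decides which `top_k` survive, which is where false positives are meant to be filtered out.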
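Experimental Results
Research Questions
- RQ1: Does open-ended prompting enable broad vulnerability discovery beyond predefined categories?
- RQ2: Does separating generation from discrimination reduce false positives in LLM-driven vulnerability detection while retaining true positives?
- RQ3: How do the number of auditors (n) and the number of outputs per auditor (m) affect detection performance?
- RQ4: How does GPTLens perform on real CVEs compared with a one-stage detection baseline?
- RQ5: Is the approach purely LLM-driven and general across vulnerability types?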
Key Findings
| Method | Hit # (CVE) | Hit ratio (CVE) | Hit # (trial) | Hit ratio (trial) |
|---|---|---|---|---|
| A (n=1, m=1) | 5 | 38.5% | 13 | 33.3% |
| A+R (n=1, m=3) | 6 | 46.2% | 7 | 18.0% |
| A+C (n=1, m=3) | 10 | 76.9% | 18 | 46.2% |
| A+O (n=1, m=3) | 10 | 76.9% | 25 | 64.1% |
| A+C (n=2, m=3) | 9 | 69.2% | 23 | 59.0% |
| A+O (n=2, m=3) | 10 | 76.9% | 29 | 74.4% |
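The hit ratios in the table are hits divided by a total. The summary does not state the denominators; this sketch assumes 13 contracts and 39 trials (which would mean 3 trials per contract), inferred because 5/13 = 38.5% and 13/39 = 33.3%. The names `N_CONTRACTS`, `N_TRIALS`, and `hit_ratio` are illustrative.

```python
# Assumed denominators, inferred from the table (not stated in the summary):
# 13 CVE contracts and 39 trials (i.e., 3 trials per contract).
N_CONTRACTS = 13
N_TRIALS = 39

# (CVE-level hits, trial-level hits) copied from the table above.
ROWS = {
    "A (n=1, m=1)": (5, 13),
    "A+R (n=1, m=3)": (6, 7),
    "A+C (n=1, m=3)": (10, 18),
    "A+O (n=1, m=3)": (10, 25),
    "A+C (n=2, m=3)": (9, 23),
    "A+O (n=2, m=3)": (10, 29),
}


def hit_ratio(hits: int, total: int) -> float:
    """Return hits / total as a percentage rounded to one decimal place."""
    return round(100.0 * hits / total, 1)


for method, (cve_hits, trial_hits) in ROWS.items():
    print(
        f"{method}: CVE {hit_ratio(cve_hits, N_CONTRACTS)}%, "
        f"trial {hit_ratio(trial_hits, N_TRIALS)}%"
    )
```

Under this assumption every entry reproduces except the A+R trial ratio, which computes to 17.9% rather than the listed 18.0%, a small rounding discrepancy.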
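- GPTLens markedly improves the contract-level hit ratio for CVE detection (Top-1 hit ratio 76.9%, versus 38.5% for one-stage detection).
- At the trial level, the Top-1 hit ratio rises from 33.3% to 59.0% with GPTLens (A+C, n=2, m=3).
- Adding the critic (A+C) substantially improves precision over generation alone by filtering out false positives.
- Increasing the number of auditors (n) further improves trial-level performance (for A+C, from 46.2% at n=1 to 59.0% at n=2).
- GPTLens is purely LLM-driven and does not depend on smart contract expertise, demonstrating generality across vulnerability types.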
This review was generated by AI and reviewed by human editors.