QUICK REVIEW

[Paper Review] Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Xueying Du, Geng Zheng|arXiv (Cornell University)|Jun 17, 2024

Web Application Security Vulnerabilities | Cited by 16

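ONE-SENTENCE SUMMARY

Vul-RAG is a knowledge-level retrieval-augmented generation framework that detects vulnerabilities by building a multi-dimensional vulnerability knowledge base from CVEs, retrieving relevant knowledge by functional semantics, and performing knowledge-guided, dialogue-style reasoning. It outperforms the baselines on PairVul and produces high-quality explanations that aid manual detection.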

ABSTRACT

Although LLMs have shown promising potential in vulnerability detection, this study reveals their limitations in distinguishing between vulnerable and similar-but-benign patched code (only 0.06 - 0.14 accuracy). It shows that LLMs struggle to capture the root causes of vulnerabilities during vulnerability detection. To address this challenge, we propose enhancing LLMs with multi-dimensional vulnerability knowledge distilled from historical vulnerabilities and fixes. We design a novel knowledge-level Retrieval-Augmented Generation framework Vul-RAG, which improves LLMs with an accuracy increase of 16% - 24% in identifying vulnerable and patched code. Additionally, vulnerability knowledge generated by Vul-RAG can further (1) serve as high-quality explanations to improve manual detection accuracy (from 60% to 77%), and (2) detect 10 previously-unknown bugs in the recent Linux kernel release with 6 assigned CVEs.

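Motivation and Goals

Capture high-level vulnerability semantics that go beyond lexical code patterns.
Build a vulnerability knowledge base from CVE instances along three dimensions: functional semantics, cause, and fixing solution.
Develop a knowledge-level RAG pipeline that retrieves vulnerability knowledge for a code snippet and reasons over it.
Evaluate Vul-RAG against state-of-the-art baselines, and assess its impact on both automated and manual vulnerability detection.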

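Proposed Method

Use LLM-driven extraction to convert each CVE into a three-dimensional vulnerability knowledge representation (functional semantics, cause, fixing solution).
Build the vulnerability knowledge base by abstracting and generalizing the knowledge extracted across CVEs, including a knowledge abstraction step that removes code-specific identifiers.
Online retrieval builds a query from three elements (the code itself, its abstract purpose, and its detailed behavior), uses BM25 to retrieve the top knowledge items for each element, and fuses the rankings with Reciprocal Rank Fusion.
The LLM reasons iteratively over the retrieved knowledge items, checking the code against each item's vulnerability cause and fixing solution, and stops as soon as a vulnerability is detected or all items have been examined.
Cause and fix are extracted with a two-step prompt, and few-shot examples guide the LLM's knowledge summarization.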
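
The rank-fusion and iterative-reasoning steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. `KnowledgeItem`, `rrf_fuse`, `detect`, and the two judge callables are hypothetical names. The input rankings are assumed to come from three separate BM25 searches (one per query element). `k = 60` is the constant conventionally used with Reciprocal Rank Fusion, not a value taken from the paper. The decision rule (cause present, fix absent) is one reading of the summary; in practice both judges would be LLM calls.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass(frozen=True)
class KnowledgeItem:
    """Hypothetical knowledge-base entry holding the three dimensions."""
    item_id: str
    functional_semantics: str
    cause: str
    fix: str


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 10) -> List[str]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) to an item's score, with rank
    starting at 1. k = 60 is a conventional default (assumption).
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


Judge = Callable[[str, KnowledgeItem], bool]


def detect(code: str, candidates: Sequence[KnowledgeItem],
           has_cause: Judge, has_fix: Judge) -> Optional[KnowledgeItem]:
    """Walk the retrieved items in order and stop at the first match.

    Assumed rule: the code is flagged when it shows an item's cause and
    lacks that item's fix. Returns the matching item, or None when every
    item has been examined without a match.
    """
    for item in candidates:
        if has_cause(code, item) and not has_fix(code, item):
            return item
    return None
```

A caller would pass the three BM25 rankings to `rrf_fuse`, look up the fused IDs in the knowledge base, and hand the resulting items to `detect`. The item returned by `detect` doubles as the explanation shown to a human reviewer.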

Figure 1. A pair of vulnerable code and similar non-vulnerable code (the patched code)

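Experimental Results

Research Questions

RQ1: How does Vul-RAG perform on PairVul compared with representative learning-based vulnerability detectors and static analysis?
RQ2: Does incorporating vulnerability knowledge improve automated vulnerability detection (accuracy, pairwise accuracy) and the performance of human analysts?
RQ3: How does knowledge-level retrieval affect the ability to distinguish vulnerable code from similar-but-patched code?
RQ4: How much do the generated vulnerability explanations help manual detection?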

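Key Findings

On PairVul, Vul-RAG achieves a 12.96% relative improvement in accuracy and a 110% relative improvement in pairwise accuracy, substantially outperforming the baselines.
GPT-4 variants with code-level RAG are consistently outperformed by Vul-RAG on all metrics.
A user study shows that the vulnerability knowledge raises manual detection accuracy from 0.60 to 0.77.
The PairVul benchmark contains 4,314 pairs of vulnerable and patched code across 2,073 CVEs, split into 896 CVEs for the training set and 373 CVEs for the test set.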
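
To make the difference between the two reported metrics concrete, here is a small sketch of how accuracy and pairwise accuracy could be computed on a paired benchmark. It assumes pairwise accuracy means the fraction of pairs in which both the vulnerable version and the patched version are classified correctly; the summary does not spell out the definition, so treat this as an interpretation. `accuracy_metrics` is a hypothetical helper name.

```python
from typing import Iterable, Tuple


def accuracy_metrics(pair_predictions: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (accuracy, pairwise_accuracy) for paired predictions.

    Each tuple is (flag_on_vulnerable_version, flag_on_patched_version),
    where True means the detector reported "vulnerable". The correct
    answer for every pair is therefore (True, False).
    """
    pairs = list(pair_predictions)
    if not pairs:
        raise ValueError("at least one pair is required")
    # Per-function accuracy: every pair contributes two decisions.
    correct_functions = sum(int(vuln) + int(not patched) for vuln, patched in pairs)
    # Pairwise accuracy (assumed definition): both decisions must be right.
    correct_pairs = sum(1 for vuln, patched in pairs if vuln and not patched)
    return correct_functions / (2 * len(pairs)), correct_pairs / len(pairs)
```

Under this definition, a detector that flags every function as vulnerable scores 0.5 on accuracy but 0.0 on pairwise accuracy, which is why the paired metric is the stricter test of whether a model separates vulnerable code from its patched counterpart.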


This review was generated by AI and checked by a human editor.