QUICK REVIEW

[Paper Review] LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward

Nafis Tanveer Islam, Joseph Khoury | arXiv (Cornell University) | Jan 7, 2024
Software Engineering Research | Cited by 5
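ONE-SENTENCE SUMMARY

SecRepair uses CodeGen2-7B, combined with reinforcement learning and a semantic reward, to automatically identify, repair, and describe vulnerabilities and to generate concise code comments; it also provides InstructVul, an instruction-based vulnerability dataset.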

ABSTRACT

In software development, the predominant emphasis on functionality often supersedes security concerns, a trend gaining momentum with AI-driven automation tools like GitHub Copilot. These tools significantly improve developers' efficiency in functional code development. Nevertheless, it remains a notable concern that such tools are also responsible for creating insecure code, predominantly because of pre-training on publicly available repositories with vulnerable code. Moreover, developers are called the "weakest link in the chain" since they have very minimal knowledge of code security. Although existing solutions provide a reasonable solution to vulnerable code, they must adequately describe and educate the developers on code security to ensure that the security issues are not repeated. Therefore we introduce a multipurpose code vulnerability analysis system SecRepair, powered by a large language model, CodeGen2 assisting the developer in identifying and generating fixed code along with a complete description of the vulnerability with a code comment. Our innovative methodology uses a reinforcement learning paradigm to generate code comments augmented by a semantic reward mechanism. Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs. We further identify zero-day and N-day vulnerabilities in 6 Open Source IoT Operating Systems on GitHub. Our findings underscore that incorporating reinforcement learning coupled with semantic reward augments our model's performance, thereby fortifying its capacity to address code vulnerabilities with improved efficacy.

RESEARCH MOTIVATION AND OBJECTIVES

  • Advance secure code repair, beyond functionality alone, in AI-assisted development tools.
  • Develop an end-to-end system that identifies, repairs, and describes vulnerabilities in C/C++ code.
  • Create an instruction-based vulnerability dataset (InstructVul) tailored to security concerns.
  • Enable generation of concise code comments suitable as commit messages.
  • Demonstrate zero-day and N-day vulnerability analysis in real OSS IoT operating systems.

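PROPOSED METHOD

  • Use a CodeGen2-based large language model, fine-tuned for security analysis, to identify, repair, and describe vulnerabilities.
  • Train on an instruction-based dataset (InstructVul) covering vulnerability identification, repair, description, and code comment generation tasks.
  • Modify the encoder-decoder architecture by removing the encoder to support longer code sequences, and train on the input and output as a single language-model-style sequence.
  • Fine-tune for vulnerability description (code-to-text) with a causal decoder to ensure sequential, context-aware generation.
  • Apply reinforcement learning with PPO and a semantics-aware reward (based on BERTScore) to optimize for concise, semantics-preserving code comments.
  • Evaluate with BLEU, Rouge-L, and human evaluation; use cross-entropy for vulnerability detection and to stabilize repair quality.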
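
The semantic reward in the list above can be illustrated with a small sketch. This is not the authors' implementation: it only mirrors the greedy-matching F1 that BERTScore is built on, and the `embed` function is a hypothetical stand-in (character trigram counts) for the contextual BERT embeddings a real BERTScore would use. The function name `semantic_reward` is likewise an assumption made for illustration.

```python
import math
from collections import Counter


def embed(token: str) -> Counter:
    """Toy stand-in for a contextual embedding: character trigram counts.

    Assumption: real BERTScore uses BERT hidden states, not trigrams.
    """
    padded = f"##{token.lower()}##"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_reward(candidate: str, reference: str) -> float:
    """BERTScore-style F1: greedily match each token to its most similar counterpart."""
    cand = [embed(t) for t in candidate.split()]
    ref = [embed(t) for t in reference.split()]
    if not cand or not ref:
        return 0.0
    # Precision: how well each candidate token is covered by the reference.
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    # Recall: how well each reference token is covered by the candidate.
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In an RL setup of the kind the review describes, a scalar like this would be computed for each generated code comment against a reference description and handed to the PPO update as the reward; the PPO step itself is omitted here.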

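EXPERIMENTAL RESULTS

Research Questions

  • RQ1: Can the system automatically identify vulnerabilities and accurately repair the code?
  • RQ2: Can the system provide developers with a comprehensive description of the vulnerability?
  • RQ3: Can the system refine and summarize the description and generate concise code comments?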

Key Findings

Model      Parameters  BLEU  Rouge-L  F1    Precision  Recall  Accuracy
Devign     <1M         0.56  0.55     0.56  0.55       0.55    0.56
VELVET     <1M         0.62  0.61     0.59  0.61       0.59    0.68
PFGCN      110M        0.64  0.64     0.61  0.64       0.61    0.62
CodeT5     770M        0.68  0.62     0.59  0.62       0.59    0.68
CodeGen2   1B          0.72  0.70     0.68  0.70       0.68    0.79
CodeGen2   3.7B        0.75  0.77     0.73  0.77       0.73    0.85
SecRepair  7B          0.82  0.80     0.70  0.80       0.70    0.88

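  • SecRepair (7B) reaches F1 0.70, Precision 0.80, Recall 0.70, and Accuracy 0.88 on the vulnerability identification/repair task (Table 1).
  • On the InstructVul dataset, SecRepair (7B) reaches BLEU 0.82 and Rouge-L 0.80 on the repair task, outperforming several baselines at comparable parameter scales.
  • For developer-facing vulnerability description quality, SecRepair (7B) reaches BLEU 0.76, Rouge-L 0.98, and a human score of 5.
  • Compared with fine-tuning alone, reinforcement learning with the semantic reward improves code comment generation (SecRepair 7B: BLEU 0.60, Rouge-L 0.72, human score 5).
  • The ablation study shows that temperature and beam size both affect performance: the best results occur at a temperature of around 0.5, and larger beam sizes bring gains at the price of higher inference cost.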
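
To make the Rouge-L figures reported above easier to interpret, here is a minimal sketch of how a sentence-level Rouge-L F-measure can be computed from the longest common subsequence (LCS) of two token sequences. Whitespace tokenization and `beta = 1.0` are assumptions chosen for illustration; the review does not state which tokenizer or weighting the paper used.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Sentence-level Rouge-L F-measure over whitespace tokens (assumed tokenization)."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

For example, `rouge_l("check length before copying buffer", "check buffer length before copying")` has an LCS of four tokens out of five on each side, so precision and recall are both 0.8 and the score is 0.8. Because the measure only rewards in-order token overlap, it is usually read alongside human evaluation, as the review does.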


This review was generated by AI and reviewed by human editors.