QUICK REVIEW

[Paper Digest] Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study

Yujia Fu, Peng Liang | arXiv (Cornell University) | Oct 3, 2023

Software Engineering Research | Cited by 12

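One-Sentence Summary

An empirical study that analyzes 435 Copilot-generated code snippets collected from GitHub projects, using CodeQL and language-specific static analysis tools to identify security weaknesses and their CWEs across six languages.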

ABSTRACT

Modern code generation tools utilizing AI models like Large Language Models (LLMs) have gained increased popularity due to their ability to produce functional code. However, their usage presents security challenges, often resulting in insecure code merging into the code base. Thus, evaluating the quality of generated code, especially its security, is crucial. While prior research explored various aspects of code generation, the focus on security has been limited, mostly examining code produced in controlled environments rather than open source development scenarios. To address this gap, we conducted an empirical study, analyzing code snippets generated by GitHub Copilot and two other AI code generation tools (i.e., CodeWhisperer and Codeium) from GitHub projects. Our analysis identified 733 snippets, revealing a high likelihood of security weaknesses, with 29.5% of Python and 24.2% of JavaScript snippets affected. These issues span 43 Common Weakness Enumeration (CWE) categories, including significant ones like CWE-330: Use of Insufficiently Random Values, CWE-94: Improper Control of Generation of Code, and CWE-79: Cross-site Scripting. Notably, eight of those CWEs are among the 2023 CWE Top-25, highlighting their severity. We further examined using Copilot Chat to fix security issues in Copilot-generated code by providing Copilot Chat with warning messages from the static analysis tools, and up to 55.5% of the security issues can be fixed. We finally provide the suggestions for mitigating security issues in generated code.

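Research Motivation and Goals

Assess how prevalent security weaknesses are in Copilot-generated code in real GitHub projects.
Identify the types of security weaknesses (CWEs) present in Copilot-generated snippets.
Determine how many of the identified weaknesses appear in the MITRE CWE Top-25.
Offer developers guidance on secure practices when using Copilot-generated code.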

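Proposed Method

Curate a dataset of Copilot-generated code snippets from GitHub projects (435 in total: 249 from the Repository label and 186 from the Code label).
Run multi-tool static analysis (CodeQL plus language-specific tools) to identify CWEs across six languages (Python, JavaScript, Java, C++, Go, C#).
Map tool results to CWE IDs (including manual mapping for some tools).
Filter the results to ensure that each snippet was actually generated by Copilot and that each finding is security-related.
Aggregate the results to determine CWE prevalence and distribution, including overlap with the MITRE Top-25.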
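The mapping and aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the rule IDs, the `RULE_TO_CWE` table, the `summarize` helper, and the toy findings are all hypothetical, and `TOP25_SUBSET` is only a partial subset of the 2022 CWE Top-25 chosen for the example.

```python
from collections import Counter

# Partial subset of the 2022 CWE Top-25, for illustration only (assumption).
TOP25_SUBSET = {"CWE-78", "CWE-79", "CWE-89", "CWE-94"}

# Hypothetical rule IDs standing in for tool-specific warnings.
RULE_TO_CWE = {
    "demo/shell-command": "CWE-78",
    "demo/weak-random": "CWE-330",
    "demo/swallowed-exception": "CWE-703",
}

# Toy findings as (snippet_id, rule_id) pairs; not data from the study.
TOY_FINDINGS = [
    ("s1", "demo/shell-command"),
    ("s1", "demo/weak-random"),
    ("s2", "demo/weak-random"),
    ("s3", "demo/unknown-rule"),  # no CWE mapping available
]


def summarize(findings, total_snippets):
    """Map findings to CWE IDs and aggregate prevalence statistics."""
    cwe_counts = Counter()
    affected = set()
    for snippet_id, rule_id in findings:
        cwe = RULE_TO_CWE.get(rule_id)
        if cwe is None:
            # Unmapped rules are skipped here; a real study would map
            # them manually or discard them as not security-related.
            continue
        cwe_counts[cwe] += 1
        affected.add(snippet_id)
    return {
        "affected_share": len(affected) / total_snippets,
        "cwe_counts": cwe_counts,
        "top25_overlap": sorted(set(cwe_counts) & TOP25_SUBSET),
    }


if __name__ == "__main__":
    print(summarize(TOY_FINDINGS, total_snippets=4))
```

Counting affected snippets separately from CWE occurrences matters because one snippet can trigger several warnings; the share of weak snippets and the CWE distribution are therefore different statistics.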

Figure 1. Overview of the research process

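Experimental Results

Research Questions

RQ1: How secure is Copilot-generated code in GitHub projects?
RQ2: What security weaknesses are present in Copilot-generated code snippets?
RQ3: How many of the detected weaknesses belong to the 2022 MITRE CWE Top-25?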

Key Findings

35.8% of Copilot-generated code snippets contain security weaknesses.
The weaknesses span six languages and 42 CWEs; the most common are CWE-78 (OS Command Injection), CWE-330 (Use of Insufficiently Random Values), and CWE-703 (Improper Check or Handling of Exceptional Conditions).
Of these 42 CWEs, 11 belong to the 2022 CWE Top-25.
Python snippets have a weakness rate of 39.4%; the per-language rates for C++ and Go are higher still, at 46.1% and 45.0% respectively.
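Go and Python show the larger absolute numbers of weaknesses, reflecting their popularity in Copilot usage.
The study provides a curated dataset of Copilot-generated code and a replication package for future research.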
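To make two of the most common CWEs above concrete, here is a small sketch of what CWE-330 and CWE-78 can look like in Python, next to safer alternatives. The function names are hypothetical and the code is not taken from the study's dataset; it only illustrates the general weakness classes.

```python
import random
import secrets
import subprocess
import sys


def weak_token(n_chars=32):
    # CWE-330 pattern: `random` is a predictable PRNG and is not
    # suitable for security-sensitive values such as session tokens.
    return "".join(random.choice("0123456789abcdef") for _ in range(n_chars))


def strong_token(n_bytes=16):
    # Safer: `secrets` draws from the OS cryptographic random source.
    # token_hex returns two hex characters per byte.
    return secrets.token_hex(n_bytes)


def echo_safely(user_input):
    # A CWE-78 pattern would be building a shell string such as
    # os.system("echo " + user_input), which lets input like
    # "a; rm -rf ~" inject extra commands.
    # Safer: pass an argument list and do not invoke a shell, so the
    # input is treated as a single literal argument.
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    print(strong_token())
    print(echo_safely("a; echo injected"))
```

In the safer variant the semicolon in the input is printed back verbatim instead of being interpreted as a command separator, because no shell ever parses the string.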

This digest was generated by AI and reviewed by a human editor.