QUICK REVIEW

[Paper Review] Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study

Yujia Fu, Peng Liang | arXiv (Cornell University) | Oct 3, 2023

Software Engineering Research · Citations: 12

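One-Line Summary

An empirical analysis of 435 Copilot-generated code snippets from GitHub projects that uses CodeQL and language-specific static analysis tools across six languages to identify security weaknesses and classify them by CWE.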

ABSTRACT

Modern code generation tools utilizing AI models like Large Language Models (LLMs) have gained increased popularity due to their ability to produce functional code. However, their usage presents security challenges, often resulting in insecure code merging into the code base. Thus, evaluating the quality of generated code, especially its security, is crucial. While prior research explored various aspects of code generation, the focus on security has been limited, mostly examining code produced in controlled environments rather than open source development scenarios. To address this gap, we conducted an empirical study, analyzing code snippets generated by GitHub Copilot and two other AI code generation tools (i.e., CodeWhisperer and Codeium) from GitHub projects. Our analysis identified 733 snippets, revealing a high likelihood of security weaknesses, with 29.5% of Python and 24.2% of JavaScript snippets affected. These issues span 43 Common Weakness Enumeration (CWE) categories, including significant ones like CWE-330: Use of Insufficiently Random Values, CWE-94: Improper Control of Generation of Code, and CWE-79: Cross-site Scripting. Notably, eight of those CWEs are among the 2023 CWE Top-25, highlighting their severity. We further examined using Copilot Chat to fix security issues in Copilot-generated code by providing Copilot Chat with warning messages from the static analysis tools, and up to 55.5% of the security issues can be fixed. We finally provide the suggestions for mitigating security issues in generated code.

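Motivation and Objectives

Assess how prevalent security weaknesses are in Copilot-generated code in real-world GitHub projects.
Identify the types of security weaknesses (CWEs) present in Copilot-generated snippets.
Determine how many of the detected weaknesses fall within the MITRE CWE Top-25.
Give developers guidance on security practices for working with Copilot-generated code.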

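Method

Build a dataset of Copilot-generated code snippets from GitHub projects (435 in total: 249 from Repository-label and 186 from Code-label).
Run multi-tool static analysis with CodeQL and language-specific tools to identify CWEs across six languages (Python, JavaScript, Java, C++, Go, C#).
Map the tool findings to CWE IDs (manually, for some tools).
Filter the results to confirm that each snippet was actually generated by Copilot and that each finding concerns a security weakness.
Aggregate the results to determine the prevalence and distribution of CWEs, including their overlap with the MITRE Top-25.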
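The mapping, filtering, and aggregation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the raw-finding format `(snippet_id, language, rule_id)`, the function `aggregate`, and the constants `RULE_TO_CWE` and `TOP25_2022_SUBSET` are hypothetical. The rule IDs are Bandit check names for Python (the review does not say which language-specific tools were used), and their CWE mappings should be verified against Bandit's own documentation. The Top-25 set is deliberately only a small subset of the 2022 list.

```python
from collections import Counter, defaultdict

# Hypothetical manual mapping from static-analysis rule IDs to CWE IDs.
# Verify each entry against the tool's documentation before relying on it.
RULE_TO_CWE = {
    "B602": "CWE-78",   # subprocess call with shell=True
    "B311": "CWE-330",  # standard pseudo-random generator used
    "B110": "CWE-703",  # try/except/pass silently swallows errors
}

# Illustrative subset of the 2022 CWE Top-25, not the full list.
TOP25_2022_SUBSET = {"CWE-78", "CWE-79", "CWE-89", "CWE-94"}


def aggregate(raw_findings, snippets_per_language):
    """Summarize (snippet_id, language, rule_id) warnings.

    Returns per-language weakness rates (share of snippets with at least one
    mapped weakness), CWE frequencies counted once per snippet, and the CWEs
    that fall in the Top-25 subset above.
    """
    flagged = defaultdict(set)  # language -> IDs of snippets with >= 1 weakness
    pairs = set()               # distinct (snippet_id, cwe) pairs
    for snippet_id, language, rule_id in raw_findings:
        cwe = RULE_TO_CWE.get(rule_id)
        if cwe is None:
            continue  # unmapped warning: filtered out of the security results
        flagged[language].add(snippet_id)
        pairs.add((snippet_id, cwe))

    rates = {
        lang: len(flagged[lang]) / total
        for lang, total in snippets_per_language.items()
        if total > 0
    }
    cwe_counts = Counter(cwe for _, cwe in pairs)
    in_top25 = sorted(set(cwe_counts) & TOP25_2022_SUBSET)
    return rates, cwe_counts, in_top25


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the paper.
    raw = [
        ("py-1", "Python", "B602"),
        ("py-1", "Python", "B602"),  # repeated warning in the same snippet
        ("py-2", "Python", "B311"),
        ("py-3", "Python", "X999"),  # hypothetical rule ID not in the mapping
    ]
    rates, counts, top25 = aggregate(raw, {"Python": 4})
    print(rates)             # {'Python': 0.5}
    print(counts["CWE-78"])  # 1
    print(top25)             # ['CWE-78']
```

Counting distinct `(snippet, CWE)` pairs keeps one noisy snippet from inflating a CWE's frequency, and computing rates over snippets rather than warnings matches how a "percentage of snippets with weaknesses" is usually reported.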

Figure 1. Overview of the research process

Results

Research Questions

RQ1: How secure is the code that Copilot generates in GitHub projects?
RQ2: What security weaknesses are present in Copilot-generated code snippets?
RQ3: How many of the detected weaknesses belong to the 2022 MITRE CWE Top-25?

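Key Findings

35.8% of Copilot-generated code snippets contain security weaknesses.
The weaknesses span six languages and 42 CWEs. The most frequent are CWE-78 (OS Command Injection), CWE-330 (Use of Insufficiently Random Values), and CWE-703 (Improper Check or Handling of Exceptional Conditions).
11 of the 42 CWEs appear in the 2022 CWE Top-25.
Python's weakness rate is 39.4%, while C++ and Go show even higher per-language proportions at 46.1% and 45.0%, respectively.
Go and Python account for large absolute numbers of weaknesses, reflecting their popularity among Copilot users.
The authors release their dataset of Copilot-generated code and a replication package for future research.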
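The three most frequent CWEs named above (CWE-78, CWE-330, CWE-703) correspond to familiar Python patterns. The sketch below pairs an insecure version of each with a safer alternative. All function names are hypothetical illustrations, not snippets from the paper's dataset, and the "safer" versions show one common mitigation rather than a complete fix. The command helpers only build commands and do not execute them.

```python
import json
import random
import secrets


# CWE-78 (OS Command Injection): untrusted text spliced into a shell string.
def build_ls_command_insecure(path: str) -> str:
    # Intended for subprocess.run(cmd, shell=True); "x; rm -rf ~" becomes two commands.
    return f"ls {path}"


def build_ls_command_safer(path: str) -> list[str]:
    # Intended for subprocess.run(argv) with no shell; the path stays a single argument.
    return ["ls", "--", path]


# CWE-330 (Use of Insufficiently Random Values): predictable tokens.
def make_token_insecure() -> str:
    # random uses a deterministic PRNG (Mersenne Twister) that is not meant for secrets.
    return "%032x" % random.getrandbits(128)


def make_token_safer() -> str:
    # secrets draws from the operating system's cryptographically secure source.
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex characters


# CWE-703 (Improper Check or Handling of Exceptional Conditions).
def load_config_insecure(text: str) -> dict:
    try:
        return json.loads(text)
    except Exception:
        pass  # malformed input is silently ignored
    return {}


def load_config_safer(text: str) -> dict:
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Handle only the expected failure and report it to the caller.
        raise ValueError(f"invalid config: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("config must be a JSON object")
    return data
```

In each pair the fix narrows what the code trusts or accepts. It passes arguments without a shell, uses a cryptographic randomness source, and catches only the exception it expects while surfacing it instead of hiding it.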


This review was generated by AI and checked by a human editor.