QUICK REVIEW

[Paper Review] RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

Fengji Zhang, Bei Chen | arXiv (Cornell University) | Mar 22, 2023

Software Engineering Research | Cited by 8

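One-Sentence Summary

RepoCoder is a repository-level code completion framework built on iterative retrieval and generation. It uses repository context and iterative prompting to improve completion accuracy over the In-File baseline across multiple models. The paper also proposes the RepoEval benchmark, which covers line, API invocation, and function body completion and uses unit tests for evaluation.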

ABSTRACT

The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to utilize the useful information scattered in different files. We propose RepoCoder, a simple, generic, and effective framework to address the challenge. It streamlines the repository-level code completion process by incorporating a similarity-based retriever and a pre-trained code language model in an iterative retrieval-generation pipeline. RepoCoder makes effective utilization of repository-level information for code completion and has the ability to generate code at various levels of granularity. Moreover, we propose a new benchmark RepoEval, which consists of the latest and high-quality real-world repositories covering line, API invocation, and function body completion scenarios. Experimental results indicate that RepoCoder significantly improves the In-File completion baseline by over 10% in all settings and consistently outperforms the vanilla retrieval-augmented code completion approach. Furthermore, we validate the effectiveness of RepoCoder through comprehensive analysis, providing valuable insights for future research. Our source code and benchmark are publicly available: https://github.com/microsoft/CodeT/tree/main/RepoCoder

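Motivation and Objectives

Enable repository-level code completion to exploit cross-file context from the codebase.
Propose a generic retrieval-augmented generation framework that incorporates repository-level information.
Show that iterative retrieval bridges the gap between the retrieval context and the target completion.
Introduce RepoEval, which evaluates line, API invocation, and function body completion and uses unit tests.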

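Proposed Method

Build the retrieval corpus C_repo by sliding a window over the files in the repository.
Use a retriever with the unfinished code X as the query to retrieve relevant snippets from C_repo.
Apply an iterative scheme in which the previous generation Y^{i-1} is used to improve retrieval for the next generation Y^{i}, without changing model parameters.
Design the prompt to combine the retrieved snippets C_ret with the unfinished code X, guiding the pre-trained language model during generation.
Use both sparse (Jaccard-based) and dense (UniXcoder-based) retrievers, together with several generation models (GPT-3.5-Turbo, CodeGen variants).
Benchmark with RepoEval at three granularities (line, API invocation, function body), with evaluation based on unit tests.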
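
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`build_windows`, `retrieve`, `build_prompt`, `repocoder`), the window and stride sizes, the word-level tokenization, the comment-style prompt layout, and the way the next query is formed from the previous completion are all hypothetical choices. The `generate` argument stands in for any pre-trained code language model and must be supplied by the caller.

```python
import re
from typing import Callable, Dict, List, Set


def build_windows(files: Dict[str, str], window: int = 20, stride: int = 10) -> List[str]:
    """Slice every file into overlapping line windows (the retrieval corpus)."""
    corpus: List[str] = []
    for text in files.values():
        lines = text.splitlines()
        # Windows at the end of a file may be shorter than `window` lines.
        for start in range(0, len(lines), stride):
            corpus.append("\n".join(lines[start:start + window]))
    return corpus


def tokens(code: str) -> Set[str]:
    """Assumed tokenization: the set of word-like tokens in the code."""
    return set(re.findall(r"\w+", code))


def jaccard(a: str, b: str) -> float:
    """Jaccard index of the two token sets (sparse similarity)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(corpus: List[str], query: str, k: int = 2) -> List[str]:
    """Return the k corpus windows most similar to the query."""
    return sorted(corpus, key=lambda c: jaccard(c, query), reverse=True)[:k]


def build_prompt(retrieved: List[str], unfinished: str) -> str:
    """Prepend retrieved snippets as comments, then the unfinished code."""
    blocks = []
    for snippet in retrieved:
        commented = "\n".join("# " + line for line in snippet.splitlines())
        blocks.append("# Retrieved snippet from the repository:\n" + commented)
    return "\n".join(blocks + [unfinished])


def repocoder(
    corpus: List[str],
    unfinished: str,
    generate: Callable[[str], str],
    iterations: int = 2,
    k: int = 2,
    window: int = 20,
) -> str:
    """Iterative retrieval-generation loop; model parameters are never updated."""
    query = unfinished
    completion = ""
    for _ in range(iterations):
        retrieved = retrieve(corpus, query, k)
        completion = generate(build_prompt(retrieved, unfinished))
        # Assumption: the next query is the tail of the unfinished code
        # followed by the previous completion, truncated to one window.
        query = "\n".join((unfinished + "\n" + completion).splitlines()[-window:])
    return completion
```

With `iterations=1` the loop reduces to plain single-pass retrieval-augmented completion; each further iteration lets the model's own draft steer retrieval toward snippets that resemble the intended completion. A dense retriever would replace `jaccard` with a similarity over embedding vectors.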

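Experimental Results

Research Questions

RQ1: Does repository-level context improve code completion beyond in-file context alone?
RQ2: Across different models, does iterative retrieval-generation (RepoCoder) outperform single-pass RAG and the In-File baseline?
RQ3: How does retrieval quality affect repository-level code completion performance?
RQ4: How does the number of iterations affect performance, and when should iteration stop?
RQ5: How does RepoEval capture real-world repository-level completion scenarios that include unit tests?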

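Key Findings

Across all datasets and models, RepoCoder consistently improves on In-File completion by more than 10% on both metrics, exact match (EM) and edit similarity (ES).
On the line and API completion tasks, RepoCoder with two or more iterations consistently outperforms vanilla RAG.
Under the RepoCoder setting, the 350M-parameter CodeGen model achieves performance comparable to GPT-3.5-Turbo.
Within the RepoCoder framework, the dense retriever performs similarly to the sparse retriever, indicating robustness to the choice of retriever.
RepoCoder with two iterations achieves higher recall of the ground-truth API invocations, indicating that retrieval guides generation more effectively.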
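
The two metrics named in the findings can be sketched as follows. This is a rough illustration, not the benchmark's evaluation script: it assumes that ES is the Levenshtein distance normalized by the longer string and subtracted from 1, and that surrounding whitespace is stripped before the exact-match comparison. The function names are hypothetical.

```python
def exact_match(pred: str, target: str) -> bool:
    """EM: the prediction equals the target (whitespace-stripped, by assumption)."""
    return pred.strip() == target.strip()


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(pred: str, target: str) -> float:
    """ES: 1 minus the edit distance divided by the longer length (assumed form)."""
    longest = max(len(pred), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, target) / longest
```

EM is all-or-nothing, while ES gives partial credit to near-miss completions, which is why the two are typically reported together for line and API completion.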


This review was generated by AI and reviewed by a human editor.