QUICK REVIEW

[Paper Review] AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation

Vijayaraghavan Murali, Chandra Maddila|arXiv (Cornell University)|May 20, 2023
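Scheduling and Optimization Algorithms | Cited by 9
One-sentence summary

This paper presents CodeCompose, an AI-assisted code authoring tool deployed at Meta. It describes the InCoder-based model, its fine-tuning on internal code, the system design, deployment across 9 languages, and a multi-faceted evaluation of usage and user feedback.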

ABSTRACT

Generative LLMs have been shown to effectively power AI-based code authoring tools that can suggest entire statements or blocks of code during code authoring. In this paper we present CodeCompose, an AI-assisted code authoring tool developed and deployed at Meta internally. CodeCompose is based on the InCoder LLM that merges generative capabilities with bi-directionality. We have scaled up CodeCompose to serve tens of thousands of developers at Meta, across 9 programming languages and several coding surfaces. We present our experience in making design decisions about the model and system architecture for CodeCompose that addresses these challenges. To release a LLM model at this scale, we needed to first ensure that it is sufficiently accurate. In a random sample of 20K source code files, depending on the language, we are able to reproduce hidden lines between 40% and 58% of the time, an improvement of 1.4x and 4.1x over a model trained only on public data. We gradually rolled CodeCompose out to developers. At the time of this writing, 16K developers have used it with 8% of their code coming directly from CodeCompose. To triangulate our numerical findings, we conduct a thematic analysis on the feedback from 70 developers. We find that 91.5% of the feedback is positive, with the most common themes being discovering APIs, dealing with boilerplate code, and accelerating coding. Meta continues to integrate this feedback into CodeCompose.

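Motivation and goals

  • Show how an enterprise-grade code assistant can be fine-tuned on internal code and deployed at scale.
  • Examine the system design choices, latency optimizations, and UX considerations behind a large-scale industrial deployment.
  • Evaluate CodeCompose's impact on developer productivity and satisfaction using quantitative and qualitative metrics.
  • Identify challenges around trust, accuracy, and integration with existing IDEs in a large organization.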

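Proposed method

  • Fine-tune an InCoder-based LLM on Meta's internal code using the CM and LCM objectives.
  • Use a bidirectional, fill-in-the-middle style training objective (LCM), with metadata such as language, file path, and kernel information attached to the input.
  • Deploy through a client-server architecture, with an LSP-based language server and a Thrift-backed GPU inference tier.
  • Measure offline performance across languages with exact match and BLEU; collect real-world usage metrics (acceptance rate, percentage of code typed from suggestions) and qualitative user feedback.
  • Implement latency-focused optimizations (caching, debouncing, minimal batching) and a randomized rollout strategy to reduce bias.
  • Provide telemetry and support for multiple editors through reusable LSP components and in-house editing surfaces.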
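The caching and debouncing optimizations listed above can be illustrated with a small client-side sketch. This is not the CodeCompose implementation: the class name `DebouncedSuggester`, the `fetch` callback standing in for the remote inference call, the 0.1 s quiet period, and the LRU cache size are all assumptions chosen for illustration. Timestamps are passed in explicitly so the behavior is easy to follow without real timers.

```python
from collections import OrderedDict
from typing import Callable, Optional


class DebouncedSuggester:
    """Debounce keystrokes and cache completions (illustrative only)."""

    def __init__(self, fetch: Callable[[str], str],
                 debounce_s: float = 0.1, cache_size: int = 128) -> None:
        self._fetch = fetch            # stand-in for the remote inference call
        self._debounce_s = debounce_s  # hypothetical quiet period in seconds
        self._cache_size = cache_size  # hypothetical LRU capacity
        self._cache: "OrderedDict[str, str]" = OrderedDict()
        self._pending: Optional[str] = None
        self._last_keystroke = 0.0

    def on_keystroke(self, context: str, now: float) -> None:
        # Each keystroke replaces the pending request and restarts the timer.
        self._pending = context
        self._last_keystroke = now

    def poll(self, now: float) -> Optional[str]:
        # Produce a suggestion only once the user has paused long enough.
        if self._pending is None:
            return None
        if now - self._last_keystroke < self._debounce_s:
            return None
        context, self._pending = self._pending, None
        if context in self._cache:
            self._cache.move_to_end(context)  # mark as recently used
            return self._cache[context]
        suggestion = self._fetch(context)
        self._cache[context] = suggestion
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)   # evict the least recently used
        return suggestion


if __name__ == "__main__":
    calls = []

    def fake_model(context: str) -> str:
        calls.append(context)
        return context.upper()

    s = DebouncedSuggester(fake_model, debounce_s=0.1)
    s.on_keystroke("de", now=0.00)
    s.on_keystroke("def", now=0.05)
    print(s.poll(now=0.10))  # still inside the quiet period
    print(s.poll(now=0.20))  # quiet period elapsed, model is called once
    print(calls)
```

Only the last context in a burst of keystrokes reaches the model, and a repeated context is served from the cache without another call. Both effects reduce load on the inference tier, which is the kind of saving the latency optimizations aim for.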
Figure 1. CodeCompose (a) offers inline code suggestions in VSCode in a grey text when the user is typing code (Tab to accept), (b) changes its suggestion to adapt to a natural language comment, (c) suggests code or documentation based on code below the current position.

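Experimental results

Research questions

  • RQ1: How does fine-tuning on Meta's internal code improve code suggestion accuracy across languages?
  • RQ2: Which architecture and UX decisions enable scalable, low-latency AI-assisted code completion in a large enterprise?
  • RQ3: What impact does CodeCompose have on real-world acceptance rates, usage, and user satisfaction?
  • RQ4: What challenges (trust, hallucinations, integration with existing tools) arise in an industrially deployed AI code assistant?
  • RQ5: How does contextual information (code before and after the cursor, file, kernel) affect model performance?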

Key findings

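| Language | # Suggestions shown | Acceptance rate (%) | Code typed using CodeCompose (%) | # Users |
|---|---|---|---|---|
| Python | 1.87M | 22 | 8 | 10.7k |
| Hack | 1.25M | 22.5 | 10 | 5.5k |
| C++ | 608.1k | 20 | 10 | 2.5k |
| Flow (JavaScript) | 583.2k | 18.2 | 7 | 2.5k |
| Rust | 74.2k | 17.2 | 9 | 212 |
| Objective-C++ | 56.5k | 18 | 6 | 429 |
| Objective-C | 34.4k | 18.1 | 6 | 299 |
| C | 23.5k | 21.3 | 12 | 201 |
| TypeScript | 8.9k | 19 | 10 | 76 |
| All | 4.5M | 22 | 8 | 16k |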
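  • Across languages, 22% of the suggestions CodeCompose displayed were accepted.
  • 8% of the code typed by developers came from accepted CodeCompose suggestions.
  • 91.5% of the qualitative feedback from surveyed users was positive.
  • Fine-tuning on Meta's internal data substantially improves exact match and BLEU scores for Hack, Python, Flow, and C++; LCM training improves performance further.
  • The system supports at least 9 languages and saw substantial real-world use, showing 4.5 million suggestions over 15 days.
  • UX decisions (single-line suggestions, 300-500 ms latency) and a wave-based rollout proved effective at maintaining trust and usability.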
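As a rough illustration of how the two usage metrics in these findings could be computed from telemetry counters, here is a minimal sketch. It assumes that "code typed using CodeCompose" is a character-count ratio. The `UsageTotals` type and its field names are hypothetical, and the counter values are made up so that the output lines up with the headline 22% and 8% figures.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Aggregated telemetry counters (hypothetical field names)."""
    suggestions_shown: int
    suggestions_accepted: int
    chars_from_suggestions: int  # characters inserted by accepting suggestions
    chars_typed_manually: int    # characters the developer typed by hand


def acceptance_rate(t: UsageTotals) -> float:
    """Percentage of shown suggestions that were accepted."""
    if t.suggestions_shown == 0:
        return 0.0
    return 100.0 * t.suggestions_accepted / t.suggestions_shown


def pct_code_from_suggestions(t: UsageTotals) -> float:
    """Percentage of all typed characters that came from accepted suggestions."""
    total = t.chars_from_suggestions + t.chars_typed_manually
    if total == 0:
        return 0.0
    return 100.0 * t.chars_from_suggestions / total


# Made-up counters, not data from the paper.
totals = UsageTotals(
    suggestions_shown=1000,
    suggestions_accepted=220,
    chars_from_suggestions=800,
    chars_typed_manually=9200,
)
print(acceptance_rate(totals))            # 22.0
print(pct_code_from_suggestions(totals))  # 8.0
```

The two metrics answer different questions: the acceptance rate reflects how often a shown suggestion is useful, while the character share reflects how much of the final code the tool actually contributed.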
Figure 2. Steps to construct an input to the model in LCM with an example.
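The general idea behind building a fill-in-the-middle style input can be sketched as follows. This is only an approximation under stated assumptions: the `<mask>` sentinel, the comment-style metadata header, and the function name `build_infill_example` are hypothetical, and the sketch works on raw characters rather than on the tokenizer output that a real training pipeline would use.

```python
from typing import Tuple


def build_infill_example(source: str, start: int, end: int, *,
                         language: str, file_path: str,
                         mask_token: str = "<mask>") -> Tuple[str, str]:
    """Hide source[start:end] and return (model_input, target).

    The model sees metadata, the code before the hidden span, a sentinel,
    and the code after it; the hidden span is what it must reproduce.
    """
    if not 0 <= start < end <= len(source):
        raise ValueError("span must lie inside the source text")
    before, target, after = source[:start], source[start:end], source[end:]
    # Hypothetical metadata header; the real format is not specified here.
    header = f"# language: {language}\n# path: {file_path}\n"
    return header + before + mask_token + after, target


if __name__ == "__main__":
    code = "def add(a, b):\n    return a + b\n"
    span_start = code.index("return")
    span_end = code.index("\n", span_start)
    model_input, target = build_infill_example(
        code, span_start, span_end,
        language="python", file_path="example/add.py",  # made-up path
    )
    print(model_input)
    print(repr(target))
```

Because the input contains the code on both sides of the hidden span, the model can condition on what follows the cursor as well as what precedes it, which is the bidirectional behavior the summary attributes to LCM.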


This review was generated by AI and reviewed by a human editor.