QUICK REVIEW

[Paper Review] Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey

Xinyu She, Yue Liu|arXiv (Cornell University)|Oct 27, 2023

Software Engineering Research | Cited by 8

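One-Sentence Summary

A systematic literature review identifies 67 LM4Code studies and proposes a taxonomy of pitfalls spanning four areas (data, system design, evaluation, and deployment), together with their implications and solutions for improving reliability.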

ABSTRACT

Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research focused on learning-based code intelligence, such as automated bug repair, and test case generation. Despite their great potential, language models for code intelligence (LM4Code) are susceptible to potential pitfalls, which hinder realistic performance and further impact their reliability and applicability in real-world deployment. Such challenges drive the need for a comprehensive understanding - not just identifying these issues but delving into their possible implications and existing solutions to build more reliable language models tailored to code intelligence. Based on a well-defined systematic research approach, we conducted an extensive literature review to uncover the pitfalls inherent in LM4Code. Finally, 67 primary studies from top-tier venues have been identified. After carefully examining these studies, we designed a taxonomy of pitfalls in LM4Code research and conducted a systematic study to summarize the issues, implications, current solutions, and challenges of different pitfalls for LM4Code systems. We developed a comprehensive classification scheme that dissects pitfalls across four crucial aspects: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. Through this study, we aim to provide a roadmap for researchers and practitioners, facilitating their understanding and utilization of LM4Code in reliable and trustworthy ways.

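Research Motivation and Objectives

Identify and classify the pitfalls that affect LM4Code across the data, design, evaluation, and deployment stages of its lifecycle.
Assess the impact of these pitfalls on performance, reliability, and trustworthiness.
Summarize existing solutions and best practices for mitigating LM4Code pitfalls.
Provide a roadmap of open challenges and directions for robust LM4Code research and practice.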

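Proposed Method

Conduct a systematic literature review (SLR) following the guidelines of Kitchenham and Charters.
Collect relevant primary studies using a quasi-gold-standard query and backward/forward snowballing.
Organize the findings into a four-stage LM4Code lifecycle: data collection/labeling, system design/learning, performance evaluation, and deployment/maintenance.
Synthesize qualitative and quantitative insights on pitfalls, implications, and solutions.
Analyze the publication distribution and LM types to reveal trends in LM4Code research.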

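Experimental Results

Research Questions

RQ1: Which types of pitfalls are prevalent in language models for code intelligence?
RQ2: What implications do these pitfalls have for the effectiveness, reliability, and ethics of LM4Code systems?
RQ3: What solutions have been proposed to address these pitfalls?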

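Key Findings

Identified and analyzed 67 primary studies (2018–2023).
Proposed a four-aspect taxonomy: data collection/labeling, system design/learning, performance evaluation, and deployment/maintenance.
Data-related pitfalls include unbalanced distributions, data noise, and labeling errors, which can lead to overestimated performance and impaired model effectiveness.
System design pitfalls include data snooping, spurious correlations, and inappropriate model design, which lead to overly optimistic metrics and unreliable behavior.
Solutions include data cleaning/denoising, real-world benchmarks, cross-project validation, time-based splitting, regularization, and an emphasis on model interpretability.
Research is shifting toward Transformer-based LM4Code and transparency-aware evaluation.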
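
Two of the listed mitigations, time-based splitting and cross-project validation, both aim to prevent training data from leaking information about the test set. The following is a minimal sketch of each idea and is not taken from the surveyed paper. The `Sample` record, its fields, and the toy data are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    project: str     # hypothetical: repository the snippet came from
    committed: date  # hypothetical: commit date of the snippet
    code: str


def time_based_split(samples, cutoff):
    """Train only on samples committed before `cutoff`; test on the rest."""
    train = [s for s in samples if s.committed < cutoff]
    test = [s for s in samples if s.committed >= cutoff]
    return train, test


def cross_project_split(samples, test_projects):
    """Hold out whole projects so no project appears on both sides."""
    train = [s for s in samples if s.project not in test_projects]
    test = [s for s in samples if s.project in test_projects]
    return train, test


# Toy data (made up for this sketch).
samples = [
    Sample("alpha", date(2021, 3, 1), "def f(): ..."),
    Sample("alpha", date(2022, 6, 1), "def g(): ..."),
    Sample("beta", date(2021, 9, 1), "def h(): ..."),
    Sample("gamma", date(2023, 1, 15), "def k(): ..."),
]

train_t, test_t = time_based_split(samples, date(2022, 1, 1))
train_p, test_p = cross_project_split(samples, {"gamma"})
```

A random split over the same toy data could place the two `alpha` snippets on opposite sides, or train on 2023 code while testing on 2021 code. Leakage of this kind is what can produce the overly optimistic metrics described above.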

This review was generated by AI and reviewed by human editors.