QUICK REVIEW

[Paper Review] Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code

Zhou Yang, Zhensu Sun | arXiv (Cornell University) | Mar 12, 2024

Software Engineering Research | Citations: 6

One-Sentence Summary

A systematic literature review of 146 studies that identifies seven non-functional properties beyond accuracy for large language models for code (LLM4Code) and summarizes state-of-the-art methods, trends, and research gaps.

ABSTRACT

Large language models for code (LLM4Code), which demonstrate strong performance (e.g., high accuracy) in processing source code, have significantly transformed software engineering. Many studies separately investigate the non-functional properties of LLM4Code, but there is no systematic review of how these properties are evaluated and enhanced. This paper fills this gap by thoroughly examining 146 relevant studies, thereby presenting the first systematic literature review to identify seven important properties beyond accuracy, including robustness, security, privacy, explainability, efficiency, and usability. We discuss the current state-of-the-art methods and trends, identify gaps in existing research, and present promising directions for future study.

Motivation and Objectives

Identify seven non-functional properties beyond accuracy for LLM4Code, including robustness, security, privacy, explainability, efficiency, and usability.
Assess how these properties are defined, evaluated, and enhanced in current research.
Summarize state-of-the-art techniques, datasets, and measurement criteria for each property.
Highlight gaps, challenges, and opportunities to guide future research in LLM4Code.
Compare attention between LLM4Code and non-LLM4Code studies where relevant.

Proposed Method

Systematic literature review of 146 papers (2019–2024) focused on LLM4Code non-functional properties beyond accuracy.
Two-stage paper identification: keyword queries in DBLP, followed by backward/forward snowballing via Semantic Scholar; eight rounds of snowballing to reach transitive closure.
Definition of seven properties and synthesis of current evaluation/enhancement techniques per property.
Categorization of robustness testing approaches (white-box/black-box) and test-input generation methods (gradient-based, heuristic-driven, search-based, reinforcement learning, style/transferability).
Grounding of discussions in representative studies and presentation of trends and gaps (with reference to Table 1 in the paper).
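
The snowballing step in the method above can be illustrated with a small sketch. This is not the authors' tooling: the citation graph is a hypothetical in-memory dictionary rather than DBLP or Semantic Scholar data, and `snowball`, `references`, `cited_by`, and `is_relevant` are names assumed purely for illustration. It only shows what "repeat backward/forward snowballing until transitive closure" means: keep expanding until a round adds no new relevant paper.

```python
from typing import Callable, Dict, Set, Tuple


def snowball(
    seeds: Set[str],
    references: Dict[str, Set[str]],   # paper -> papers it cites (backward)
    cited_by: Dict[str, Set[str]],     # paper -> papers citing it (forward)
    is_relevant: Callable[[str], bool],
    max_rounds: int = 20,
) -> Tuple[Set[str], int]:
    """Expand a seed set until no new relevant paper is found.

    Returns the selected papers and the number of rounds executed,
    counting the final round that discovers nothing new.
    """
    selected = {p for p in seeds if is_relevant(p)}
    frontier = set(selected)
    rounds = 0
    while frontier and rounds < max_rounds:
        candidates: Set[str] = set()
        for paper in frontier:
            candidates |= references.get(paper, set())  # backward snowballing
            candidates |= cited_by.get(paper, set())    # forward snowballing
        # Only papers that pass the inclusion criteria are kept and expanded.
        new = {p for p in candidates if p not in selected and is_relevant(p)}
        selected |= new
        frontier = new
        rounds += 1
    return selected, rounds


# Toy graph (hypothetical paper IDs): A cites B, C cites B, C cites D, D cites E.
references = {"A": {"B"}, "C": {"B", "D"}, "D": {"E"}}
cited_by = {"B": {"A", "C"}, "D": {"C"}, "E": {"D"}}

# D is treated as out of scope, so E is never reached through it.
selected, rounds = snowball({"A"}, references, cited_by, lambda p: p != "D")
print(sorted(selected), rounds)
```

In this sketch, excluded papers are not expanded further, which is one possible design choice; a review protocol could equally follow citations through out-of-scope papers.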

Experimental Results

Research Questions

RQ1: What non-functional properties beyond accuracy are studied for LLM4Code?
RQ2: How are robustness, security, privacy, explainability, efficiency, and usability evaluated and improved in the literature?
RQ3: What are the major gaps and future directions for these properties in LLM4Code?
RQ4: What threats to validity affect studies of non-functional properties in LLM4Code?

Key Findings

Robustness is the most studied property, accounting for the largest share of the surveyed LLM4Code papers.
Security and privacy concerns for LLM4Code include data poisoning, backdoors, and leakage of sensitive information; membership inference and dataset ownership issues are discussed.
Explainability shows inconsistencies across techniques and tasks, with gaps in meeting end-users’ needs.
Efficiency trends include parameter-efficient fine-tuning and model compression, with mixed impacts on other properties.
Usability findings are mixed: reported productivity effects vary, and few usability interventions have been evaluated in real-world settings.
The literature reveals broad research opportunities and challenges in evaluating and enhancing these non-functional properties beyond accuracy.
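
To make the robustness finding concrete, here is a minimal sketch of a black-box check based on one common perturbation, renaming an identifier without changing program behavior. It is an illustration under stated assumptions, not a method taken from the reviewed paper: `rename_identifier`, `is_robust`, and the deliberately brittle `brittle_model` are hypothetical, and the renamer only handles simple local names and parameters (it ignores attributes, keyword arguments at call sites, and name clashes). It requires Python 3.9+ for `ast.unparse`.

```python
import ast
from typing import Callable


class _Renamer(ast.NodeTransformer):
    """Rename one identifier wherever it appears as a name or parameter."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node: ast.arg) -> ast.AST:
        self.generic_visit(node)  # also visit an annotation, if present
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename_identifier(source: str, old: str, new: str) -> str:
    """Return source with `old` renamed to `new` (simple cases only)."""
    tree = _Renamer(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


def is_robust(model: Callable[[str], str], source: str, old: str, new: str) -> bool:
    """Black-box check: the prediction should not change after the rename."""
    return model(source) == model(rename_identifier(source, old, new))


def brittle_model(code: str) -> str:
    # Hypothetical stand-in for a code model that keys on a surface token.
    return "suspicious" if "tmp" in code else "clean"


source = "def add(a, tmp):\n    return a + tmp\n"
perturbed = rename_identifier(source, "tmp", "total")
print(perturbed)
print(is_robust(brittle_model, source, "tmp", "total"))
```

A real evaluation would replace `brittle_model` with calls to the model under test and would search over many candidate renames rather than trying a single one.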

This review was generated by AI and checked by a human editor.