[Paper Review] Sustainable Code Generation Using Large Language Models: A Systematic Literature Review
This paper conducts a systematic literature review on the sustainability of code generated by large language models, finding limited, fragmented research with no standardized benchmarks or evaluation methods, and highlighting gaps in prompting/fine-tuning approaches and energy-focused assessments.
Large Language Models (LLMs) are widely used in software engineering to generate, complete, translate, and fix code, improving developer productivity. While most research focuses on the energy consumption and carbon emissions of model training and inference, far less attention has been given to the sustainability of the code these models produce. The efficiency of generated code affects the long-term environmental impact of software systems. Inefficient code can increase CPU usage, memory consumption, execution time, and overall energy use during deployment and operation. As LLM-generated code becomes more common in real-world projects, even small inefficiencies can lead to high environmental costs over time. This paper examines existing research on the sustainability of code generated by LLMs. We conduct a systematic literature review to analyze selected primary studies and investigate the extent to which LLMs are capable of producing sustainable code. In addition, we examine how sustainability is defined and measured in this context, including the metrics and evaluation strategies used to assess energy efficiency and resource usage. We also explore whether techniques such as fine-tuning and prompt engineering influence the sustainability of generated code. Through a structured analysis of the selected studies, we categorize research efforts based on their methodological approaches, evaluation practices, and experimental settings. The findings indicate that research in this area remains relatively limited and fragmented, with no widely accepted framework for measuring or benchmarking the sustainability of LLM-generated code. These observations highlight the need for clearer definitions, standardized evaluation methods, and systematic research to support environmentally friendly AI-assisted software engineering.
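The claim that inefficient code raises CPU time and memory use can be illustrated with a small measurement. The Python sketch below assumes two functionally equivalent deduplication routines: one quadratic and one linear. It times each routine and records its peak Python heap allocation. The names `profile`, `dedup_quadratic`, and `dedup_linear` are hypothetical illustrations and are not taken from the reviewed studies. Wall-clock time and `tracemalloc` peak allocation are only proxies for energy, and `tracemalloc` itself adds overhead to the timing.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def profile(fn: Callable[[], Any]) -> Tuple[Any, float, int]:
    """Run fn once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn()
        elapsed = time.perf_counter() - start
        # Peak memory allocated by Python code since tracemalloc.start().
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def dedup_quadratic(items):
    # `x not in out` scans the list, so the whole loop is O(n * unique).
    out = []
    for x in items:
        if x not in out:
            out.append(x)
    return out


def dedup_linear(items):
    # dict keys keep insertion order (Python 3.7+), giving O(n) deduplication.
    return list(dict.fromkeys(items))


data = [i % 200 for i in range(10_000)]
for name, fn in [("quadratic", dedup_quadratic), ("linear", dedup_linear)]:
    # Default argument binds the current fn (avoids late-binding surprises).
    result, secs, peak = profile(lambda f=fn: f(data))
    print(f"{name}: {secs:.4f} s, peak {peak} B, {len(result)} unique items")
```

Both routines return the same list. A generated snippet that "works" can therefore still differ substantially in resource use, which is the gap that the reviewed studies try to quantify.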
Research Motivation and Objectives
- Identify and categorize studies evaluating sustainability of LLM-generated code.
- Examine metrics and evaluation strategies for energy efficiency and resource usage.
- Identify benchmarks, datasets, and tools used for sustainability evaluation.
- Assess how prompting strategies or fine-tuning influence the sustainability of generated code.
Methodology
- Followed an SLR protocol based on Kitchenham's guidelines.
- Searched IEEE Xplore, Compendex, Inspec, Scopus, and the ACM Digital Library using a structured search string.
- Applied explicit inclusion/exclusion criteria focusing on LLMs, code generation, and sustainability with empirical evaluation.
- Used data extraction forms to capture study characteristics and results.
- Applied backward and forward snowballing to augment the primary studies, bringing the final set to 19.
- Reported results with structured analysis of models, languages, evaluation practices, and limitations.
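The search, screening, and snowballing steps above can be sketched in Python under some assumptions. The search terms, the `Candidate` record fields, and the citation graph are all hypothetical; the review does not publish them in this summary. Synonyms are ORed within a concept group and the groups are ANDed together, which is a common way to structure SLR queries. Each database has its own syntax for wildcards and field codes, so a real query would need adapting per database. Snowballing follows both references (backward) and citing papers (forward), and repeats for every newly included paper until no new papers pass screening.

```python
from collections import deque
from dataclasses import dataclass


def build_query(groups):
    """OR synonyms within a concept group, AND across groups."""
    clauses = []
    for terms in groups:
        # Quote multi-word phrases so they are matched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


@dataclass(frozen=True)
class Candidate:  # hypothetical screening record
    key: str
    uses_llm: bool
    generates_code: bool
    addresses_sustainability: bool
    has_empirical_evaluation: bool


# Each inclusion criterion mapped to the exclusion reason logged when it fails.
CRITERIA = {
    "uses_llm": "does not study LLMs",
    "generates_code": "not about code generation",
    "addresses_sustainability": "no sustainability focus",
    "has_empirical_evaluation": "no empirical evaluation",
}


def exclusion_reasons(c):
    """An empty list means the candidate passes screening."""
    return [reason for field, reason in CRITERIA.items() if not getattr(c, field)]


def snowball(seeds, references, citations, passes):
    """Backward (references) and forward (citations) snowballing to a fixpoint."""
    seen, included = set(seeds), set(seeds)
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        for cand in references.get(paper, []) + citations.get(paper, []):
            if cand in seen:
                continue  # screen each paper only once
            seen.add(cand)
            if passes(cand):
                included.add(cand)
                queue.append(cand)  # newly included papers are snowballed too
    return included


# Hypothetical concept groups, not the authors' actual search terms.
print(build_query([
    ["large language model", "LLM"],
    ["code generation", "program synthesis"],
    ["energy", "sustainab*", "green software"],
]))

# Toy graph: S1 cites A and B, A cites C; D cites S1, E cites D.
refs = {"S1": ["A", "B"], "A": ["C"]}
cits = {"S1": ["D"], "D": ["E"]}
relevant = {"A", "C", "D"}  # stand-in for full-text screening decisions
print(sorted(snowball(["S1"], refs, cits, lambda p: p in relevant)))  # ['A', 'C', 'D', 'S1']
```

In practice the `passes` callback would apply the same criteria as `exclusion_reasons` to each candidate's extracted metadata. The logged reasons also support reporting why studies were excluded.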
Results
Research Questions
- RQ1: How effectively do LLMs produce code that aligns with sustainability principles in practice?
- RQ2: What metrics or parameters are used to evaluate sustainability?
- RQ3: What benchmarks, datasets, and tools are utilized to assess the sustainability of LLM-generated code?
- RQ4: How do prompting strategies or fine-tuning methods influence the sustainability of generated code?
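RQ2 and RQ4 both involve comparing measured resource use across conditions. One example is code produced with a baseline prompt versus code produced with a sustainability-oriented prompt. The Python sketch below averages repeated runs, because single energy readings are noisy. It then reports the percent change per metric. The energy-delay product (EDP) included here is a composite metric from the computer-architecture literature, and this summary does not say whether the reviewed studies use it. The function names and the sample numbers are hypothetical placeholders, not study results.

```python
from statistics import mean


def relative_change(baseline: float, variant: float) -> float:
    """Percent change of variant relative to baseline (negative = lower)."""
    return (variant - baseline) / baseline * 100.0


def summarize(runs):
    """runs: list of (energy_joules, time_seconds) from repeated executions."""
    return {
        "energy_j": mean(e for e, _ in runs),
        "time_s": mean(t for _, t in runs),
        # Energy-delay product: a variant that cuts power but runs much
        # longer is not rewarded as much as under energy alone.
        "edp": mean(e * t for e, t in runs),
    }


def compare(baseline_runs, variant_runs):
    """Percent change per metric for variant-prompt code vs baseline code."""
    base, var = summarize(baseline_runs), summarize(variant_runs)
    return {metric: relative_change(base[metric], var[metric]) for metric in base}


# Placeholder measurements for illustration only.
baseline = [(10.0, 2.0), (10.4, 2.1)]
green_prompt = [(8.0, 2.0), (8.2, 2.0)]
print(compare(baseline, green_prompt))
```

Reporting several metrics side by side makes trade-offs visible. For example, a prompt can reduce energy while leaving execution time unchanged.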
Key Findings
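- Research in this area remains relatively limited and fragmented overall.
- The range of evaluated models is narrow: studies focus on large language models, and small language models receive little attention.
- Energy is mostly measured at the software level; few studies use external hardware tools for evaluation.
- Most studies cover a limited set of programming languages and focus on general-purpose software development, with little attention to other domains.
- There is no sustainability-oriented benchmark designed specifically for evaluating LLM-generated code.
- Sustainability-aware fine-tuning remains largely unexplored as a research direction.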
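The finding that energy is mostly measured at the software level often refers to on-chip counters rather than external power meters. One such source on Linux is Intel RAPL, exposed through the powercap sysfs files `energy_uj` and `max_energy_range_uj`. The Python sketch below assumes the package-0 domain at `/sys/class/powercap/intel-rapl:0`. That path may be absent on non-Intel machines or in VMs, and recent kernels typically restrict `energy_uj` to root. The counter covers the whole CPU package, including other processes, so real measurements need repeated runs and idle-baseline subtraction. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Any, Callable, Optional, Tuple

# Package-0 RAPL domain exposed by the Linux powercap framework (Intel CPUs).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name: str) -> Optional[int]:
    """Read a powercap counter in microjoules; None if missing or unreadable."""
    try:
        return int((RAPL_DIR / name).read_text().strip())
    except (OSError, ValueError):
        return None


def energy_delta_uj(start: int, end: int, max_range: int) -> int:
    """Counter difference, accounting for one wraparound past max_range."""
    if end >= start:
        return end - start
    return end + max_range - start


def measure_joules(fn: Callable[[], Any]) -> Tuple[Any, Optional[float]]:
    """Run fn once; return (result, package energy in joules or None)."""
    max_range = read_uj("max_energy_range_uj")
    start = read_uj("energy_uj")
    result = fn()
    end = read_uj("energy_uj")
    if start is None or end is None or max_range is None:
        return result, None  # RAPL unavailable or permission denied
    return result, energy_delta_uj(start, end, max_range) / 1e6


result, joules = measure_joules(lambda: sum(i * i for i in range(10**6)))
print(result, joules)
```

Returning `None` instead of raising keeps the sketch usable on machines without RAPL access. This fragility is one reason the choice between software counters and external hardware meters matters when comparing studies.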
This review was generated by AI and checked by a human editor.