QUICK REVIEW

[Paper Review] Sustainable Code Generation Using Large Language Models: A Systematic Literature Review

Sabiya Banu Masthan Ali, Oussema Kirmani|arXiv (Cornell University)|Mar 1, 2026

Green IT and Sustainability | Cited by 0

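One-Sentence Summary

This study conducts a systematic literature review of the sustainability of code generated by large language models. It finds that research on the topic is limited and fragmented and lacks standardized benchmarks and evaluation methods, and it identifies gaps in prompting and fine-tuning approaches and in energy-aware evaluation.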

ABSTRACT

Large Language Models (LLMs) are widely used in software engineering to generate, complete, translate, and fix code, improving developer productivity. While most research focuses on the energy consumption and carbon emissions of model training and inference, far less attention has been given to the sustainability of the code these models produce. The efficiency of generated code affects the long-term environmental impact of software systems. Inefficient code can increase CPU usage, memory consumption, execution time, and overall energy use during deployment and operation. As LLM-generated code becomes more common in real-world projects, even small inefficiencies can lead to high environmental costs over time. This paper examines existing research on the sustainability of code generated by LLMs. We conduct a systematic literature review to analyze selected primary studies and investigate the extent to which LLMs are capable of producing sustainable code. In addition, we examine how sustainability is defined and measured in this context, including the metrics and evaluation strategies used to assess energy efficiency and resource usage. We also explore whether techniques such as fine-tuning and prompt engineering influence the sustainability of generated code. Through a structured analysis of the selected studies, we categorize research efforts based on their methodological approaches, evaluation practices, and experimental settings. The findings indicate that research in this area remains relatively limited and fragmented, with no widely accepted framework for measuring or benchmarking the sustainability of LLM-generated code. These observations highlight the need for clearer definitions, standardized evaluation methods, and systematic research to support environmentally friendly AI-assisted software engineering.
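To make the abstract's point about inefficient code concrete, here is a minimal, illustrative sketch (not taken from the reviewed paper) that compares two functionally equivalent implementations on execution time and peak memory, two of the resource metrics the abstract lists. The function names `profile`, `dedupe_quadratic`, and `dedupe_linear` are hypothetical; the sketch uses only the standard-library `time.perf_counter` and `tracemalloc`, and it does not measure energy directly.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def profile(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float, int]:
    """Run fn(*args) once; return (result, wall-clock seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return result, elapsed, peak


def dedupe_quadratic(items: list) -> list:
    # 'x not in seen' scans a list: O(n) per check, O(n^2) overall.
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return seen


def dedupe_linear(items: list) -> list:
    # A set gives average O(1) membership checks; the list keeps first-seen order.
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


if __name__ == "__main__":
    data = [i % 2000 for i in range(20_000)]
    for fn in (dedupe_quadratic, dedupe_linear):
        result, secs, peak = profile(fn, data)
        print(f"{fn.__name__}: {len(result)} unique, {secs:.4f} s, peak {peak / 1024:.0f} KiB")
```

Both functions return the same output, so a correctness-only evaluation would rate them equally. Only the resource measurements separate them. Note that `tracemalloc` adds overhead to the timed run, so a careful study would time and trace memory in separate, repeated runs.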

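Motivation and Objectives

Identify and categorize studies that evaluate the sustainability of LLM-generated code.
Examine the metrics and evaluation strategies used to assess energy efficiency and resource usage.
Identify the benchmarks, datasets, and tools used for sustainability evaluation.
Assess how prompting strategies or fine-tuning affect the sustainability of generated code.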

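Methodology

Follows an SLR protocol based on Kitchenham's guidelines.
Searches IEEE Xplore, Compendex, Inspec, Scopus, and the ACM Digital Library using a structured search string.
Applies explicit inclusion/exclusion criteria focused on LLMs, code generation, and concretely validated sustainability.
Uses a data extraction form to capture study characteristics and results.
Applies backward and forward snowballing to supplement the primary studies, bringing the final set to 19 papers.
Reports findings on models, programming languages, evaluation practices, and limitations through a structured analysis.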
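The review does not reproduce its exact search string here, but a structured search string is typically assembled by ORing synonyms within each concept and ANDing the concepts together. The sketch below illustrates that construction only; the term groups in `GROUPS` are hypothetical and are not the authors' actual query, and each database (IEEE Xplore, Scopus, ACM DL, and so on) has its own field codes and wildcard syntax that a real string would need to be adapted to.

```python
from typing import Sequence


def or_group(terms: Sequence[str]) -> str:
    # Quote multi-word phrases so databases treat them as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(groups: Sequence[Sequence[str]]) -> str:
    # Synonyms within one concept are ORed; the concepts themselves are ANDed.
    return " AND ".join(or_group(g) for g in groups)


# Hypothetical term groups for illustration only (not the paper's search string).
GROUPS = [
    ["large language model", "LLM", "code LLM"],
    ["code generation", "program synthesis", "code completion"],
    ["energy efficiency", "sustainability", "green software", "carbon"],
]

if __name__ == "__main__":
    print(build_query(GROUPS))
```

Keeping the groups as data rather than a hand-typed string makes it easier to log exactly which terms were searched, which supports the reproducibility that an SLR protocol asks for.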

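Results

Research Questions

RQ1: How effective are LLMs at generating code that adheres to sustainability principles in real-world settings?
RQ2: Which metrics or parameters are used to evaluate sustainability?
RQ3: Which benchmarks, datasets, and tools are used to evaluate the sustainability of LLM-generated code?
RQ4: How do prompting strategies or fine-tuning approaches affect the sustainability of the generated code?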

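Key Findings

Research in this area is, overall, relatively limited and fragmented.
The range of evaluated models is narrow: studies concentrate on large language models and pay little attention to small language models.
Energy is measured mainly at the software level; few studies use external hardware instruments.
Most studies focus on a small number of programming languages and on general-purpose software development; other domains receive little attention.
There are no benchmarks designed specifically to evaluate the sustainability of LLM-generated code.
Sustainability-oriented fine-tuning remains a largely unexplored research direction.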
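As an illustration of what "software-level" energy measurement usually means, here is a minimal sketch, assuming a Linux machine whose kernel exposes Intel RAPL through the powercap sysfs interface. It reads the cumulative package-0 energy counter before and after a workload and handles one counter wraparound. The helper names `counter_delta`, `read_uj`, and `measure_joules` are hypothetical, and on recent kernels `energy_uj` is typically readable only by root, in which case the function returns `None`.

```python
from pathlib import Path
from typing import Callable, Optional

# Package-0 RAPL domain exposed by the Linux powercap framework (assumed present).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def counter_delta(start_uj: int, end_uj: int, max_range_uj: int) -> int:
    """Microjoules consumed between two readings of a wrapping cumulative counter."""
    if end_uj >= start_uj:
        return end_uj - start_uj
    # The counter overflowed once and restarted from zero.
    return (max_range_uj - start_uj) + end_uj


def read_uj(name: str) -> int:
    return int((RAPL_DIR / name).read_text())


def measure_joules(fn: Callable[[], object]) -> Optional[float]:
    """Package energy in joules while fn() runs, or None if RAPL is unreadable."""
    try:
        max_range = read_uj("max_energy_range_uj")
        start = read_uj("energy_uj")
    except (OSError, ValueError):
        return None  # No RAPL interface, or insufficient permissions.
    fn()
    end = read_uj("energy_uj")
    return counter_delta(start, end, max_range) / 1e6


if __name__ == "__main__":
    joules = measure_joules(lambda: sum(i * i for i in range(5_000_000)))
    print("RAPL unavailable" if joules is None else f"{joules:.2f} J")
```

This counter covers the whole CPU package, including other processes and idle draw, so a single reading is noisy. Repeated runs with an idle baseline subtracted give a more meaningful figure, and it remains an on-chip estimate rather than the external hardware measurement that few of the reviewed studies use.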

This review was generated by AI and checked by human editors.