QUICK REVIEW

[Paper Review] Revisiting the Role of Natural Language Code Comments in Code Translation

Monika Gupta, Ajay Kumar Meena|arXiv (Cornell University)|Jan 23, 2026

Natural Language Processing Techniques | Citations: 0

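One-Line Summary

The paper empirically studies how natural language code comments affect LLM-based code translation across five languages, and proposes COMMENTRA, which selectively adds comments to improve translation.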

ABSTRACT

The advent of large language models (LLMs) has ushered in a new era in automated code translation across programming languages. Since most code-specific LLMs are pretrained on well-commented code from large repositories like GitHub, it is reasonable to hypothesize that natural language code comments could aid in improving translation quality. Despite their potential relevance, comments are largely absent from existing code translation benchmarks, rendering their impact on translation quality inadequately characterised. In this paper, we present a large-scale empirical study evaluating the impact of comments on translation performance. Our analysis involves more than 80,000 translations, with and without comments, of 1100+ code samples from two distinct benchmarks covering pairwise translations between five different programming languages: C, C++, Go, Java, and Python. Our results provide strong evidence that code comments, particularly those that describe the overall purpose of the code rather than line-by-line functionality, significantly enhance translation accuracy. Based on these findings, we propose COMMENTRA, a code translation approach, and demonstrate that it can potentially double the performance of LLM-based code translation. To the best of our knowledge, our study is the first in terms of its comprehensiveness, scale, and language coverage on how to improve code translation accuracy using code comments.

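Motivation and Objectives

Evaluate how natural language code comments affect the quality of LLM-based code translation.
Analyze comment characteristics (intent, density, language, and placement) and their effect on translation performance.
Develop and evaluate COMMENTRA, a comment-based translation framework that selectively inserts comments to improve results.
Provide a cross-language benchmark and insights to guide the use of comments in translation pipelines.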

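Proposed Method

Collect more than 1,100 unique code samples in C, C++, Go, Java, and Python from AVATAR and CodeNet.
Generate comments with multiple commenting LLMs, then translate both the commented and uncommented code with multiple translation LLMs.
Systematically vary comment factors such as intent, density, language, and placement, and measure translation success using compilation and test results.
Introduce COMMENTRA, an iterative translation approach that adds comments only when the initial translation fails, improving efficiency and accuracy.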
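
The fail-then-comment idea behind COMMENTRA can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `translate_with_fallback`, the callable types, and the order in which commenters are tried are all assumptions made for illustration. The translator, the commenters, and the compile-and-test check are passed in as plain callables, so no particular LLM API is implied.

```python
from typing import Callable, Iterable, Optional

# Hypothetical callable types; none of these names come from the paper.
Translator = Callable[[str], str]  # source code -> translated code
Commenter = Callable[[str], str]   # source code -> source code with comments
Checker = Callable[[str], bool]    # translated code -> compiles and passes tests?


def translate_with_fallback(
    source: str,
    translate: Translator,
    commenters: Iterable[Commenter],
    passes: Checker,
) -> Optional[str]:
    """Translate the uncommented source first; add comments only after a failure."""
    candidate = translate(source)
    if passes(candidate):
        return candidate  # the plain translation worked, so no comments are needed

    # Assumed retry policy: try each commenter in turn on the original source.
    for add_comments in commenters:
        candidate = translate(add_comments(source))
        if passes(candidate):
            return candidate

    return None  # every attempt failed
```

Because comments are generated only for samples whose plain translation fails, a loop of this shape avoids commenting calls for samples that already translate correctly, which is one plausible reading of the efficiency gain described above.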

Figure 1 : Experimental Setup; The exact prompts used are also shown here.

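Experimental Results

Research Questions

RQ1 - Usefulness of code comments: Do natural language code comments help improve LLM translation performance?
RQ2 - Intent of code comments: Can comment intent be classified, and how useful is each intent?
RQ3 - Density and language of code comments: How do comment density and language affect translation accuracy?
RQ4 - Comment placement: How does comment placement affect translation results?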

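Key Findings

Code comments can improve or degrade translation performance; the effect varies by model and language pair.
Commenting models such as GPT and DeepSeek often yield larger improvements than the alternatives, though the effect depends on context.
English comments generally tend to produce the largest gains for Java→Python and Python→Java translation, with some exceptions.
Arbitrarily limiting comment density does not consistently improve performance, and effective guidance for selective commenting remains an open problem.
In-code comments improve translation quality more than pseudocode or standalone method specifications.
The proposed COMMENTRA delivers substantial improvements by iteratively inserting comments only when the initial translation fails.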
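
The first finding, that comments help some samples and hurt others, comes down to comparing two sets of successfully translated samples. The sketch below shows one way such a comparison could be computed. The function name `compare_outcomes` and the sample identifiers are hypothetical and are not data from the paper.

```python
from typing import Dict, Set


def compare_outcomes(uncommented_ok: Set[str], commented_ok: Set[str]) -> Dict[str, Set[str]]:
    """Split sample IDs by how adding comments changed the translation outcome."""
    return {
        "gained": commented_ok - uncommented_ok,  # succeed only with comments
        "lost": uncommented_ok - commented_ok,    # succeed only without comments
        "both": uncommented_ok & commented_ok,    # succeed either way
    }


# Hypothetical sample IDs, for illustration only.
outcome = compare_outcomes({"s1", "s2", "s3"}, {"s2", "s3", "s4", "s5"})
net_change = len(outcome["gained"]) - len(outcome["lost"])
```

Reporting the "gained" and "lost" sets separately, rather than only the net change, makes visible the cases where comments turn a previously correct translation into a failing one.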

Figure 2 : Venn diagrams depicting increase and decrease in LLMs performance in the commented code samples. Left and center diagrams show the overlap between uncommented successful and successfully translated model-commented samples; the right diagrams show the overlap between the various successful


This review was generated by AI and checked by a human editor.