QUICK REVIEW

[Paper Review] Revisiting the Role of Natural Language Code Comments in Code Translation

Monika Gupta, Ajay Kumar Meena|arXiv (Cornell University)|2026. 01. 23.

Natural Language Processing Techniques | Citations: 0

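One-Line Summary

This paper empirically studies how natural language code comments affect LLM-based code translation across five programming languages, and proposes COMMENTRA, an approach that selectively adds comments to improve translation.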

ABSTRACT

The advent of large language models (LLMs) has ushered in a new era in automated code translation across programming languages. Since most code-specific LLMs are pretrained on well-commented code from large repositories like GitHub, it is reasonable to hypothesize that natural language code comments could aid in improving translation quality. Despite their potential relevance, comments are largely absent from existing code translation benchmarks, rendering their impact on translation quality inadequately characterised. In this paper, we present a large-scale empirical study evaluating the impact of comments on translation performance. Our analysis involves more than $80,000$ translations, with and without comments, of $1100+$ code samples from two distinct benchmarks covering pairwise translations between five different programming languages: C, C++, Go, Java, and Python. Our results provide strong evidence that code comments, particularly those that describe the overall purpose of the code rather than line-by-line functionality, significantly enhance translation accuracy. Based on these findings, we propose COMMENTRA, a code translation approach, and demonstrate that it can potentially double the performance of LLM-based code translation. To the best of our knowledge, our study is the first in terms of its comprehensiveness, scale, and language coverage on how to improve code translation accuracy using code comments.

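Motivation and Objectives

Evaluate how natural language code comments affect the quality of LLM-based code translation.
Analyze comment characteristics (intent, density, language, placement) and determine how each affects translation performance.
Develop and evaluate COMMENTRA, a comment-based translation framework that selectively inserts comments to improve results.
Provide a cross-language benchmark and insights to guide the use of comments in translation pipelines.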

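Proposed Method

Collect more than 1,100 unique code samples from AVATAR and CodeNet, spanning C, C++, Go, Java, and Python.
Generate comments with several comment-generating LLMs, then translate both the commented and the uncommented code with several translation LLMs.
Systematically vary the comment factors (intent, density, language, placement) and measure translation success through compilation and test results.
Introduce COMMENTRA, an iterative translation approach that adds comments only when the initial translation fails, to improve both efficiency and accuracy.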
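
The last step above, adding comments only after an uncommented translation has failed, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `translate_with_fallback_comments` and the callables `translate`, `commenters`, and `passes_tests` are hypothetical placeholders for a translation LLM, one or more comment-generating LLMs, and a compile-and-test harness. Trying each commenter in turn on the original source is also an assumption; the review does not describe how the iterations are organized.

```python
from typing import Callable, Optional, Sequence, Tuple


def translate_with_fallback_comments(
    source: str,
    translate: Callable[[str], str],              # hypothetical: wraps a translation LLM
    commenters: Sequence[Callable[[str], str]],   # hypothetical: wrap comment-generating LLMs
    passes_tests: Callable[[str], bool],          # hypothetical: compile + run test cases
) -> Tuple[Optional[str], int]:
    """Return (translation, attempt), where attempt 0 means no comments were needed.

    If every attempt fails, return (None, number_of_commenters_tried).
    """
    # First attempt: translate the code exactly as given, with no added comments.
    candidate = translate(source)
    if passes_tests(candidate):
        return candidate, 0

    # Fallback: only now pay for comment generation, one commenter at a time.
    for attempt, add_comments in enumerate(commenters, start=1):
        candidate = translate(add_comments(source))
        if passes_tests(candidate):
            return candidate, attempt

    return None, len(commenters)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model or compiler.
    toy_translate = lambda code: "/* translated */\n" + code
    toy_commenter = lambda code: "# Purpose: sum a list of numbers\n" + code
    # Pretend that only commented input leads to a passing translation.
    toy_tests = lambda out: "Purpose:" in out

    result, attempt = translate_with_fallback_comments(
        "total = sum(xs)", toy_translate, [toy_commenter], toy_tests
    )
    print(attempt)
    print(result)
```

Keeping the uncommented attempt first means samples that already translate correctly never incur the cost of comment generation, and they are not exposed to the risk that added comments make the translation worse.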

Figure 1 : Experimental Setup; The exact prompts used are also shown here.

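Experimental Results

Research Questions

RQ1 - Usefulness of code comments: Do natural language code comments help improve the translation performance of LLMs?
RQ2 - Intent of code comments: Can comment intent be classified, and how useful is each intent category?
RQ3 - Density and language of code comments: How do the density and language of comments affect translation accuracy?
RQ4 - Placement of comments: How does the placement of comments affect translation outcomes?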

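Key Findings

Code comments can either improve or degrade translation performance; the gains and losses vary by model and language pair.
As comment-generating models, GPT and DeepSeek tend to yield larger improvements than some alternatives, although the effect depends on context.
English comments generally give the strongest gains for Java-to-Python and Python-to-Java translation, with some exceptions.
Arbitrary limits on comment density do not consistently improve performance; effective guidelines for selective commenting remain to be established.
In-code comments improve translation quality more than pseudocode or standalone method specifications.
The proposed COMMENTRA framework delivers substantial improvements by iteratively inserting comments only when the initial translation fails.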
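
The first finding above, that comments can both help and hurt, becomes concrete when the set of samples that pass without comments is compared against the set that pass with comments. The sketch below shows that bookkeeping. The helper names `compare_outcomes` and `net_change` and the sample IDs are hypothetical and are not data from the paper.

```python
from typing import Dict, Set


def compare_outcomes(passed_without: Set[str], passed_with: Set[str]) -> Dict[str, Set[str]]:
    """Split sample IDs by how adding comments changed the translation outcome."""
    return {
        "gained": passed_with - passed_without,  # failed before, pass once commented
        "lost": passed_without - passed_with,    # passed before, fail once commented
        "kept": passed_without & passed_with,    # pass either way
    }


def net_change(outcomes: Dict[str, Set[str]]) -> int:
    """Positive when comments fix more samples than they break."""
    return len(outcomes["gained"]) - len(outcomes["lost"])


if __name__ == "__main__":
    # Made-up sample IDs, for illustration only.
    without_comments = {"s1", "s2", "s3"}
    with_comments = {"s2", "s3", "s4", "s5"}

    outcomes = compare_outcomes(without_comments, with_comments)
    print(sorted(outcomes["gained"]))
    print(sorted(outcomes["lost"]))
    print(net_change(outcomes))
```

A non-empty `lost` set is why commenting every sample unconditionally can backfire, and it is consistent with the review's description of COMMENTRA as adding comments only after a failure.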

Figure 2 : Venn diagrams depicting increase and decrease in LLMs performance in the commented code samples. Left and center diagrams show the overlap between uncommented successful and successfully translated model-commented samples; the right diagrams show the overlap between the various successful


This review was generated by AI and checked by a human editor.