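[Paper Summary] Unsupervised Translation of Programming Languages
This paper trains TransCoder, a fully unsupervised neural transcompiler that uses only monolingual source code to translate functions between C++, Java, and Python. It also releases a parallel test set of 852 functions with unit tests, and shows strong performance relative to rule-based baselines.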
A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin.
Motivation and goals
- Enable automated translation of existing codebases across languages without parallel data or handcrafted expert rules.
- Develop a fully unsupervised transcompiler trained on GitHub code using cross-lingual pretraining, denoising auto-encoding, and back-translation.
- Demonstrate that the unsupervised model can outperform rule-based and commercial baselines on function-level translations.
- Provide a validation/test set of parallel functions with unit tests to evaluate translation correctness.
Proposed method
- Use a single Transformer-based encoder-decoder model shared across C++, Java, and Python.
- Pretrain with cross-lingual masked language modeling (XLM) on monolingual code for cross-language representations.
- Apply denoising auto-encoding to make the decoder generate valid sequences and robust representations.
- Leverage back-translation to create pseudo-parallel data between language pairs.
- Evaluate translations using computational accuracy via unit tests, alongside reference match and BLEU metrics.
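The denoising auto-encoding step above trains the model to reconstruct a function from a corrupted copy of itself. The following is a minimal sketch of one plausible corruption routine (random token dropping, masking, and local shuffling). The function name `corrupt`, the `<MASK>` symbol, and all probabilities are assumptions made for illustration, not the authors' exact noise model or tokenizer.

```python
import random

MASK = "<MASK>"  # hypothetical placeholder symbol, not the paper's actual token


def corrupt(tokens, rng, p_drop=0.1, p_mask=0.15, shuffle_window=3):
    """Return a noisy copy of `tokens` for a denoising objective.

    All default values are illustrative assumptions.
    """
    # 1) Randomly drop tokens.
    kept = [tok for tok in tokens if rng.random() >= p_drop]
    # 2) Randomly replace surviving tokens with the mask symbol.
    masked = [MASK if rng.random() < p_mask else tok for tok in kept]
    # 3) Local shuffle: add bounded random offsets to positions and re-sort,
    #    so tokens can only swap with close neighbours.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(masked))]
    order = sorted(range(len(masked)), key=keys.__getitem__)
    return [masked[i] for i in order]


# Whitespace tokenization is a simplification for this example.
tokens = "def add ( a , b ) : return a + b".split()
noisy = corrupt(tokens, random.Random(0))
# Training pair for the auto-encoder: input = noisy, target = tokens.
```

In such a setup the encoder would see `noisy` and the decoder would be trained to emit `tokens`, which pushes the decoder toward always producing well-formed code even when its input is imperfect.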
Experimental results
Research questions
- RQ1: Can a fully unsupervised neural transcompiler learn to translate between C++, Java, and Python using only monolingual code?
- RQ2: How does TransCoder perform compared to rule-based and commercial baselines on function-level translations?
- RQ3: What evaluation metrics best reflect functional correctness beyond BLEU or reference overlap?
- RQ4: What is the impact of beam search and input preprocessing on translation quality?
- RQ5: Can the approach generalize to other programming languages beyond the three studied?
Key findings
| Language pair | Reference Match (%) | BLEU | Computational Accuracy (%) |
|---|---|---|---|
| C++ → Java | 3.12 | 85.41 | 60.91 |
| C++ → Python | 6.7 | 70.11 | 44.49 |
| Java → C++ | 24.68 | 96.99 | 80.9 |
| Java → Python | 3.67 | 68.12 | 34.99 |
| Python → C++ | 4.94 | 65.37 | 32.19 |
| Python → Java | 0.83 | 64.64 | 24.74 |
- TransCoder achieves higher computational accuracy than baselines across language directions (e.g., C++→Java: 60.91%; Java→Python: 34.99%).
- Reference match and BLEU do not correlate well with actual functional correctness: many correct translations neither match the reference exactly nor score a high BLEU.
- Beam search substantially improves computational accuracy (by up to 33.7 percentage points in some directions) when a function counts as correctly translated if any hypothesis in the beam passes the unit tests.
- TransCoder outperforms a Java→Python baseline and a C++→Java commercial baseline in computational accuracy.
- Keeping source code comments increases anchor points and improves cross-language alignment, boosting performance.
- The model learns to map language-specific constructs and standard library usage across languages without supervision.
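The gap between reference match and computational accuracy, and the gain from checking several beam hypotheses, can be illustrated with a toy evaluator. This is a sketch under stated assumptions: the helper names (`passes_tests`, `computational_accuracy`, `reference_match`), the data layout, and the two toy problems are all hypothetical, and the candidates are Python strings run with `exec`, whereas the actual benchmark compiles and runs C++, Java, and Python functions against its own unit tests.

```python
def passes_tests(source, func_name, test_cases):
    """True if `source` defines `func_name` and it passes every test case."""
    namespace = {}
    try:
        exec(source, namespace)  # toy setting only; never exec untrusted code
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        return False  # syntax errors and runtime errors count as failures


def computational_accuracy(problems, top_n=1):
    """Percent of problems where any of the first `top_n` hypotheses passes."""
    solved = sum(
        any(passes_tests(h, p["name"], p["tests"]) for h in p["hypotheses"][:top_n])
        for p in problems
    )
    return 100.0 * solved / len(problems)


def reference_match(problems):
    """Percent of problems whose top hypothesis equals the reference token-wise."""
    exact = sum(p["hypotheses"][0].split() == p["reference"].split() for p in problems)
    return 100.0 * exact / len(problems)


problems = [
    {
        "name": "add",
        "reference": "def add(a, b):\n    return a + b",
        "hypotheses": [
            "def add(a, b):\n    return a - b",  # top hypothesis is wrong
            "def add(a, b):\n    return b + a",  # correct, differs from reference
        ],
        "tests": [((1, 2), 3), ((0, 0), 0)],
    },
    {
        "name": "square",
        "reference": "def square(x):\n    return x * x",
        "hypotheses": ["def square(x):\n    return x ** 2"],  # correct, no match
        "tests": [((3,), 9), ((-2,), 4)],
    },
]

ca_top1 = computational_accuracy(problems, top_n=1)  # 50.0
ca_top2 = computational_accuracy(problems, top_n=2)  # 100.0
ref = reference_match(problems)                      # 0.0
```

On this toy data no hypothesis matches its reference, yet half of the top-1 outputs and all problems under top-2 are functionally correct, which mirrors why unit-test-based scoring is the more informative metric.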
This summary was generated by AI and reviewed by a human editor.