[Paper Review] Unsupervised Neural Machine Translation
This paper trains an NMT system without any parallel data by combining a shared encoder with fixed cross-lingual embeddings, denoising, and on-the-fly backtranslation, reaching 15.56 and 10.21 BLEU on WMT 2014 French-to-English and German-to-English translation.
In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs. There have been several proposals to alleviate this issue with, for instance, triangulation and semi-supervised learning techniques, but they still require a strong cross-lingual signal. In this work, we completely remove the need of parallel data and propose a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora. Our model builds upon the recent work on unsupervised embedding mappings, and consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation. Despite the simplicity of the approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014 French-to-English and German-to-English translation. The model can also profit from small parallel corpora, and attains 21.81 and 15.24 points when combined with 100,000 parallel sentences, respectively. Our implementation is released as an open source project.
Motivation & Objective
- Motivate practical NMT for language pairs with little or no parallel data.
- Propose an unsupervised NMT model that leverages monolingual corpora only.
- Show that denoising and backtranslation enable learning translation without parallel data.
Proposed method
- Employ a dual structure that handles both translation directions at once, with a single encoder shared by the two languages.
- Fix cross-lingual embeddings in the encoder to obtain language-independent representations.
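- Train with a denoising objective: reconstruct each sentence from a corrupted copy of itself, so the model has to learn the compositional structure of each language instead of copying word by word.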
- Incorporate on-the-fly backtranslation to generate pseudo-parallel data during training.
- Optionally combine with small parallel corpora to form a semi-supervised setup.
- Use a standard attention-based encoder-decoder with GRU units and 300-dimensional embeddings; train with cross-entropy loss and the Adam optimizer.
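The two training signals in the list above differ only in how the input side of each training pair is built. The following is a minimal sketch of that pair construction, not the authors' implementation. The function names, the choice of adjacent-word swaps as the noise, the swap count of `len(tokens) // 2`, and the `translate` callable (a stand-in for decoding with the current model) are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

Tokens = List[str]


def add_swap_noise(tokens: Tokens, rng: random.Random) -> Tokens:
    """Return a copy of `tokens` after len(tokens) // 2 random adjacent swaps.

    The noise type and the swap count are assumptions of this sketch.
    """
    noised = list(tokens)
    if len(noised) < 2:
        return noised  # nothing to swap
    for _ in range(len(noised) // 2):
        i = rng.randrange(len(noised) - 1)  # pick a position with a right neighbor
        noised[i], noised[i + 1] = noised[i + 1], noised[i]
    return noised


def denoising_pair(sentence: Tokens, rng: random.Random) -> Tuple[Tokens, Tokens]:
    """Input: corrupted sentence. Target: the original sentence, same language."""
    return add_swap_noise(sentence, rng), list(sentence)


def backtranslation_pair(
    sentence: Tokens, translate: Callable[[Tokens], Tokens]
) -> Tuple[Tokens, Tokens]:
    """Input: the model's own translation. Target: the original sentence.

    `translate` is a hypothetical stand-in for decoding with the model
    as it currently stands, which is what makes the data "on the fly".
    """
    return translate(sentence), list(sentence)


if __name__ == "__main__":
    rng = random.Random(0)
    src = "the cat sat on the mat".split()
    print(denoising_pair(src, rng))
    # Toy "translator" used only to show the shape of the pair.
    print(backtranslation_pair(src, lambda s: [w.upper() for w in s]))
```

In both cases the target is a clean monolingual sentence, which is why no parallel corpus is needed: only the input side is synthetic.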
Experimental results
Research questions
- RQ1: Can an NMT system be effectively trained from monolingual data only, without any parallel corpus?
- RQ2: How do denoising and backtranslation contribute to cross-lingual translation quality in an unsupervised setting?
- RQ3: To what extent do fixed cross-lingual embeddings and a shared encoder enable learning true translation relations across languages?
Key findings
| System | FR-EN | EN-FR | DE-EN | EN-DE |
|---|---|---|---|---|
| 1. Baseline (emb. nearest neighbor) | 9.98 | 6.25 | 7.07 | 4.39 |
| 2. Proposed (denoising) | 7.28 | 5.33 | 3.64 | 2.40 |
| 3. Proposed (+ backtranslation) | 15.56 | 15.13 | 10.21 | 6.55 |
| 4. Proposed (+ BPE) | 15.56 | 14.36 | 10.16 | 6.89 |
| 5. Semi-supervised: Proposed (10k parallel) | 18.57 | 17.34 | 11.47 | 7.86 |
| 6. Semi-supervised: Proposed (100k parallel) | 21.81 | 21.74 | 15.24 | 10.95 |
| 7. Comparable NMT (10k parallel) | 1.88 | 1.66 | 1.33 | 0.82 |
| 8. Comparable NMT (100k parallel) | 10.40 | 9.19 | 8.11 | 5.29 |
| 9. Comparable NMT (full parallel) | 20.48 | 19.89 | 15.04 | 11.05 |
| 10. GNMT (Wu et al., 2016) | - | 38.95 | - | 24.61 |
- Achieves 15.56 BLEU (FR→EN) and 10.21 BLEU (DE→EN) on WMT 2014 with no parallel data.
- Improves to 21.81 BLEU (FR→EN) and 15.24 BLEU (DE→EN) when combined with 100k parallel sentences.
- Backtranslation substantially improves on denoising alone (for example, 7.28 to 15.56 BLEU on FR→EN); denoising alone scores below the nearest-neighbor baseline in every direction.
- Subword units (BPE) give marginal, direction-dependent changes: EN→DE improves (6.55 to 6.89), FR→EN is unchanged, and EN→FR and DE→EN drop slightly.
- Semi-supervised training with small parallel data yields further improvements over fully unsupervised training.
- The approach learns non-trivial translation relations beyond word-by-word substitutions.
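The ablation claims above can be checked directly against the results table. The short sketch below copies rows 2 to 4 from that table and computes per-direction BLEU differences; the dictionary layout and the `delta` helper are illustrative names, not part of the paper.

```python
# BLEU scores copied from rows 2-4 of the results table.
bleu = {
    "denoising":       {"FR-EN": 7.28,  "EN-FR": 5.33,  "DE-EN": 3.64,  "EN-DE": 2.40},
    "backtranslation": {"FR-EN": 15.56, "EN-FR": 15.13, "DE-EN": 10.21, "EN-DE": 6.55},
    "bpe":             {"FR-EN": 15.56, "EN-FR": 14.36, "DE-EN": 10.16, "EN-DE": 6.89},
}


def delta(after: str, before: str) -> dict:
    """Per-direction BLEU change from system `before` to system `after`."""
    return {
        pair: round(bleu[after][pair] - bleu[before][pair], 2)
        for pair in bleu[before]
    }


if __name__ == "__main__":
    print("backtranslation vs. denoising:", delta("backtranslation", "denoising"))
    print("BPE vs. backtranslation:      ", delta("bpe", "backtranslation"))
```

Adding backtranslation raises BLEU in all four directions, while switching to BPE changes each direction by less than one point, with a gain only for EN-DE.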
This review was created by AI and reviewed by human editors.