[Paper Review] Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
This paper proposes a scheduled multi-task learning framework with subword sampling and denoising autoencoders to improve low-resource one-to-many neural machine translation in asymmetric-resource settings. The method combines cross-lingual transfer from a related high-resource target language, monolingual data exploited through back-translation and autoencoding, and subword segmentation with Morfessor EM+Prune, and it reports gains of up to +12.7 BLEU. Scheduled training and subword regularization deliver the most consistent improvements across the English to Estonian/Finnish, Slovak/Czech, and Danish/Swedish tasks and the Norwegian to North Sámi/Finnish task.
There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the two target languages are related, one being very low-resource and the other higher-resource. We test various methods on three artificially restricted translation tasks (English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish) and one real-world task, Norwegian to North Sámi and Finnish. The experiments show positive effects especially for scheduled multi-task learning, denoising autoencoder, and subword sampling.
Motivation & Objective
- Address the challenge of low-resource neural machine translation in asymmetric one-to-many settings where one target language has significantly less parallel data than another.
- Investigate effective transfer learning strategies to improve translation quality for morphologically rich, low-resource languages using related high-resource languages as auxiliary targets.
- Optimize subword segmentation and vocabulary construction to reduce data sparsity and improve generalization in low-resource scenarios.
- Evaluate the impact of monolingual data augmentation via back-translation and denoising autoencoders on low-resource translation performance.
- Determine the relative effectiveness of different training schedules, noise models, and vocabulary construction techniques in low-resource multilingual NMT.
Proposed method
- Proposes scheduled multi-task learning: pretraining on high-resource language tasks before fine-tuning with both high- and low-resource tasks to avoid overfitting.
- Introduces a taboo sampling task to model subword segmentation ambiguity by excluding certain subword units during training.
- Employs denoising autoencoders with multiple noise types—subword regularization, reordering, deletion, and substitution—to improve robustness.
- Uses Morfessor EM+Prune for data-driven subword vocabulary learning, favoring a prior-based segmentation over BPE or SentencePiece.
- Applies back-translation using target-to-source models to generate synthetic parallel data from monolingual corpora.
- Implements a multi-task dataloader capable of sampling noisy minibatches and scheduling task mixing during training.
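To make the scheduling and noise ideas above more concrete, here is a minimal Python sketch. It is not the authors' implementation: the task names (`translate_high`, `translate_low`, `autoencode_low`), the mixing weights, the two-phase schedule, and all function names are assumptions chosen for illustration. It shows two of the noise types listed above (deletion and local reordering) and a sampler that draws only the high-resource task during a pretraining phase and a mix of tasks afterwards.

```python
import random

def delete_tokens(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def local_reorder(tokens, max_dist, rng):
    """Shuffle locally: each token moves at most max_dist positions.

    Each position i gets the sort key i + U(0, max_dist + 1); sorting by
    that key permutes tokens only within a small window.
    """
    keys = [i + rng.uniform(0, max_dist + 1) for i in range(len(tokens))]
    order = sorted(range(len(tokens)), key=lambda i: keys[i])
    return [tokens[i] for i in order]

def task_weights(step, pretrain_steps):
    """Hypothetical schedule: high-resource task only, then a task mix.

    The weights below are illustrative assumptions, not values from the paper.
    """
    if step < pretrain_steps:
        return {"translate_high": 1.0}
    return {"translate_high": 0.4, "translate_low": 0.4, "autoencode_low": 0.2}

def sample_task(step, pretrain_steps, rng):
    """Pick the task for the next minibatch according to the schedule."""
    weights = task_weights(step, pretrain_steps)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def make_autoencoder_example(tokens, rng, p_delete=0.1, max_dist=2):
    """Build a (noisy source, clean target) pair for the denoising task."""
    noisy = local_reorder(delete_tokens(tokens, p_delete, rng), max_dist, rng)
    return noisy, list(tokens)
```

A training loop would call `sample_task(step, pretrain_steps, rng)` once per minibatch and, when the autoencoding task is drawn, build examples with `make_autoencoder_example`. Substitution noise and subword sampling would slot in as further functions applied to the source side.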
Experimental results
Research questions
- RQ1: Is scheduled multi-task learning superior to sequential or fully parallel transfer in asymmetric-resource one-to-many translation?
- RQ2: Does a low-resource target-language denoising autoencoder improve translation quality, especially when combined with back-translation?
- RQ3: How effective is subword regularization, and which noise models (e.g., deletion, reordering) are most beneficial for low-resource NMT?
- RQ4: Does the choice of subword segmentation method (e.g., Morfessor vs. SentencePiece) and vocabulary size significantly affect translation quality?
- RQ5: How do language relatedness and data quantity (especially for low-resource languages) influence the effectiveness of cross-lingual transfer?
Key findings
- Scheduled multi-task learning yielded gains of up to +2.4 BLEU, outperforming both sequential and fully parallel training strategies.
- Cross-lingual transfer via multilingual training achieved the largest improvement of +12.7 BLEU, demonstrating strong benefits from leveraging high-resource target languages.
- Back-translation provided up to +4.46 BLEU improvement, confirming its value as a data augmentation technique in low-resource settings.
- The Morfessor EM+Prune subword segmentation method outperformed SentencePiece by +0.6 BLEU, indicating the benefit of a prior-based segmentation approach.
- Subword regularization and multi-noise denoising autoencoders improved robustness, especially for rare words, though effects varied by language pair.
- Even with only 10k sentence pairs, low-resource parallel data produced substantial gains, with diminishing returns beyond that threshold.
This review was created by AI and reviewed by human editors.