QUICK REVIEW

[Paper Review] Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

Stig-Arne Grönroos, Sami Virpioja|arXiv (Cornell University)|Apr 8, 2020

Natural Language Processing Techniques|100 references|6 citations

TL;DR

This paper proposes a scheduled multi-task learning framework with subword sampling and denoising autoencoders to improve low-resource translation in asymmetric-resource one-to-many settings. The method combines cross-lingual transfer from a related high-resource target language, monolingual data used through back-translation and autoencoding, and subword segmentation with Morfessor EM+Prune. Reported gains reach up to +12.7 BLEU, with scheduled training and subword regularization giving the most consistent improvements across the Estonian/Finnish, Slovak/Czech, Danish/Swedish, and North Sámi/Finnish target pairs.

ABSTRACT

There are several approaches for improving neural machine translation for low-resource languages: Monolingual data can be exploited via pretraining or data augmentation; Parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; Subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the pair of target languages are related, with one being a very low-resource and the other a higher-resource language. We test various methods on three artificially restricted translation tasks -- English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish -- and one real-world task, Norwegian to North Sámi and Finnish. The experiments show positive effects especially for scheduled multi-task learning, denoising autoencoder, and subword sampling.

Motivation & Objective

Address the challenge of low-resource neural machine translation in asymmetric one-to-many settings where one target language has significantly less parallel data than another.
Investigate effective transfer learning strategies to improve translation quality for morphologically rich, low-resource languages using related high-resource languages as auxiliary targets.
Optimize subword segmentation and vocabulary construction to reduce data sparsity and improve generalization in low-resource scenarios.
Evaluate the impact of monolingual data augmentation via back-translation and denoising autoencoders on low-resource translation performance.
Determine the relative effectiveness of different training schedules, noise models, and vocabulary construction techniques in low-resource multilingual NMT.

Proposed method

Proposes scheduled multi-task learning: pretraining on high-resource language tasks before fine-tuning with both high- and low-resource tasks to avoid overfitting.
Introduces a taboo sampling task to model subword segmentation ambiguity by excluding certain subword units during training.
Employs denoising autoencoders with multiple noise types—subword regularization, reordering, deletion, and substitution—to improve robustness.
Uses Morfessor EM+Prune for data-driven subword vocabulary learning, favoring a prior-based segmentation over BPE or SentencePiece.
Applies back-translation using target-to-source models to generate synthetic parallel data from monolingual corpora.
Implements a multi-task dataloader capable of sampling noisy minibatches and scheduling task mixing during training.
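
The scheduling and noise steps listed above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`task_weights`, `sample_task`, `add_noise`), the two-phase schedule with a fixed `low_share`, and the default noise probabilities are all assumptions chosen for illustration. Subword regularization is omitted because it requires a trained segmentation model.

```python
import random


def task_weights(step, switch_step, low_share=0.5):
    """Hypothetical two-phase schedule: high-resource task only before
    switch_step, then a fixed mix of high- and low-resource tasks."""
    if step < switch_step:
        return {"high": 1.0, "low": 0.0}
    return {"high": 1.0 - low_share, "low": low_share}


def sample_task(step, switch_step, rng, low_share=0.5):
    """Pick the task that supplies the next minibatch."""
    weights = task_weights(step, switch_step, low_share)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def add_noise(tokens, rng, p_delete=0.1, p_substitute=0.1,
              max_shift=3, vocab=()):
    """Corrupt a token list for a denoising autoencoder task using
    deletion, substitution, and local reordering (assumed noise model)."""
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_delete:
            continue  # deletion: drop the token
        if vocab and r < p_delete + p_substitute:
            noisy.append(rng.choice(vocab))  # substitution: random token
        else:
            noisy.append(tok)
    # Local reordering: sort by position plus a random offset in
    # [0, max_shift], so tokens only move a short distance.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(noisy))]
    order = sorted(range(len(noisy)), key=lambda i: keys[i])
    return [noisy[i] for i in order]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the cat sat on the mat".split()
    for step in (0, 50, 150):
        task = sample_task(step, switch_step=100, rng=rng)
        print(step, task, add_noise(sentence, rng, vocab=("dog", "ran")))
```

In a training loop, the autoencoder task would use the output of `add_noise` as the source side and the clean sentence as the target. The hard switch at `switch_step` is the simplest possible schedule; a gradual ramp of `low_share` would be an equally plausible variant.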

Experimental results

Research questions

RQ1: Is scheduled multi-task learning superior to sequential or fully parallel transfer in asymmetric-resource one-to-many translation?
RQ2: Does a low-resource target-language denoising autoencoder improve translation quality, especially when combined with back-translation?
RQ3: How effective is subword regularization and what noise models (e.g., deletion, reordering) are most beneficial for low-resource NMT?
RQ4: Does the choice of subword segmentation method (e.g., Morfessor vs. SentencePiece) and vocabulary size significantly affect translation quality?
RQ5: How does language relatedness and data quantity (especially for low-resource languages) influence the effectiveness of cross-lingual transfer?

Key findings

Scheduled multi-task learning yielded a gain of up to +2.4 BLEU, outperforming both sequential and fully parallel training strategies.
Cross-lingual transfer via multilingual training gave the largest improvement overall, up to +12.7 BLEU, demonstrating strong benefits from leveraging high-resource target languages.
Back-translation provided up to +4.46 BLEU improvement, confirming its value as a data augmentation technique in low-resource settings.
The Morfessor EM+Prune subword segmentation method outperformed SentencePiece by +0.6 BLEU, indicating the benefit of a prior-based segmentation approach.
Subword regularization and multi-noise denoising autoencoders improved robustness, especially for rare words, though effects varied by language pair.
Even with only 10k sentence pairs, low-resource parallel data produced substantial gains, with diminishing returns beyond that threshold.

This review was created by AI and reviewed by human editors.