QUICK REVIEW

[Paper Review] Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

Stig-Arne Grönroos, Sami Virpioja | arXiv (Cornell University) | Apr 8, 2020
Natural Language Processing Techniques | 100 references | 6 citations
TL;DR

This paper proposes a scheduled multi-task learning framework with subword sampling and denoising autoencoders to improve low-resource one-to-many neural machine translation in asymmetric-resource settings. The method leverages high-resource target languages via cross-lingual transfer, exploits monolingual data through back-translation and autoencoders, and optimizes subword segmentation with Morfessor EM+Prune. It reports gains of up to +12.7 BLEU, with scheduled training and subword regularization delivering the most consistent improvements across the Estonian, Slovak/Czech, Danish/Swedish, and Norwegian/North Sámi tasks.

ABSTRACT

There are several approaches for improving neural machine translation for low-resource languages: Monolingual data can be exploited via pretraining or data augmentation; Parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; Subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the pair of target languages are related, with one being a very low-resource and the other a higher-resource language. We test various methods on three artificially restricted translation tasks -- English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish -- and one real-world task, Norwegian to North Sámi and Finnish. The experiments show positive effects especially for scheduled multi-task learning, denoising autoencoder, and subword sampling.

Motivation & Objective

  • Address the challenge of low-resource neural machine translation in asymmetric one-to-many settings where one target language has significantly less parallel data than another.
  • Investigate effective transfer learning strategies to improve translation quality for morphologically rich, low-resource languages using related high-resource languages as auxiliary targets.
  • Optimize subword segmentation and vocabulary construction to reduce data sparsity and improve generalization in low-resource scenarios.
  • Evaluate the impact of monolingual data augmentation via back-translation and denoising autoencoders on low-resource translation performance.
  • Determine the relative effectiveness of different training schedules, noise models, and vocabulary construction techniques in low-resource multilingual NMT.

Proposed method

  • Proposes scheduled multi-task learning: pretraining on high-resource language tasks before fine-tuning with both high- and low-resource tasks to avoid overfitting.
  • Introduces a taboo sampling task to model subword segmentation ambiguity by excluding certain subword units during training.
  • Employs denoising autoencoders with multiple noise types—subword regularization, reordering, deletion, and substitution—to improve robustness.
  • Uses Morfessor EM+Prune for data-driven subword vocabulary learning, favoring a prior-based segmentation over BPE or SentencePiece.
  • Applies back-translation using target-to-source models to generate synthetic parallel data from monolingual corpora.
  • Implements a multi-task dataloader capable of sampling noisy minibatches and scheduling task mixing during training.
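
The scheduling and noising ideas in the list above can be illustrated with a small sketch. It is not the authors' dataloader: the task names, the mixing weights, the `pretrain_steps` threshold, and the noise parameters are all hypothetical values chosen for illustration. The first function samples only the high-resource task during pretraining and a weighted mix afterwards; the second applies two of the listed noise types (token deletion and local reordering) to produce a noisy autoencoder input.

```python
import random


def sample_task(step, pretrain_steps, finetune_mix, rng):
    """Pick the task for one minibatch.

    Before `pretrain_steps` only the high-resource translation task is used;
    afterwards tasks are drawn according to the weights in `finetune_mix`.
    (Task names and weights are illustrative assumptions.)
    """
    if step < pretrain_steps:
        return "high_resource_mt"
    names = list(finetune_mix)
    weights = [finetune_mix[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def add_noise(tokens, rng, p_delete=0.1, max_shift=3):
    """Corrupt a token sequence for a denoising autoencoder task.

    Step 1: drop each token independently with probability `p_delete`.
    Step 2: local reordering, by sorting positions after adding a random
            offset in [0, max_shift) to each index, so tokens only move
            a short distance from where they started.
    """
    kept = [tok for tok in tokens if rng.random() >= p_delete]
    if not kept and tokens:
        # Never return an empty input for a non-empty sentence.
        kept = [rng.choice(tokens)]
    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=keys.__getitem__)
    return [kept[i] for i in order]


if __name__ == "__main__":
    rng = random.Random(0)
    mix = {
        "high_resource_mt": 0.5,
        "low_resource_mt": 0.4,
        "low_resource_autoencoder": 0.1,
    }
    print([sample_task(s, 3, mix, rng) for s in range(6)])
    print(add_noise("the quick brown fox jumps over the lazy dog".split(), rng))
```

Keeping the high-resource task in the fine-tuning mix, rather than switching to the low-resource task alone, is what distinguishes this schedule from purely sequential transfer.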

Experimental results

Research questions

  • RQ1: Is scheduled multi-task learning superior to sequential or fully parallel transfer in asymmetric-resource one-to-many translation?
  • RQ2: Does a low-resource target-language denoising autoencoder improve translation quality, especially when combined with back-translation?
  • RQ3: How effective is subword regularization, and which noise models (e.g., deletion, reordering) are most beneficial for low-resource NMT?
  • RQ4: Do the choice of subword segmentation method (e.g., Morfessor vs. SentencePiece) and the vocabulary size significantly affect translation quality?
  • RQ5: How do language relatedness and data quantity (especially for low-resource languages) influence the effectiveness of cross-lingual transfer?

Key findings

  • Scheduled multi-task learning yielded the highest individual gain of +2.4 BLEU, outperforming both sequential and fully parallel training strategies.
  • Cross-lingual transfer via multilingual training achieved the largest improvement of +12.7 BLEU, demonstrating strong benefits from leveraging high-resource target languages.
  • Back-translation provided up to +4.46 BLEU improvement, confirming its value as a data augmentation technique in low-resource settings.
  • The Morfessor EM+Prune subword segmentation method outperformed SentencePiece by +0.6 BLEU, indicating the benefit of a prior-based segmentation approach.
  • Subword regularization and multi-noise denoising autoencoders improved robustness, especially for rare words, though effects varied by language pair.
  • Even with only 10k sentence pairs, low-resource parallel data produced substantial gains, with diminishing returns beyond that threshold.
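
For readers unfamiliar with the subword regularization referred to in these findings, the following is a minimal sketch of sampling a segmentation from a unigram subword model, using forward filtering followed by backward sampling. It is a generic illustration, not the Morfessor EM+Prune implementation or its API: the toy vocabulary, its probabilities, the temperature `alpha`, and the `max_len` limit are all assumptions made up for the example.

```python
import math
import random


def sample_segmentation(word, logp, rng, alpha=1.0, max_len=8):
    """Sample one segmentation of `word` into units from `logp`.

    Each segmentation is drawn with probability proportional to the product
    of its unit probabilities, each raised to the power `alpha`
    (alpha < 1 flattens the distribution, large alpha approaches the
    single most probable segmentation).
    """
    n = len(word)
    # forward[i]: log of the summed scores of all segmentations of word[:i].
    forward = [-math.inf] * (n + 1)
    forward[0] = 0.0
    for end in range(1, n + 1):
        scores = [
            forward[start] + alpha * logp[word[start:end]]
            for start in range(max(0, end - max_len), end)
            if word[start:end] in logp and forward[start] > -math.inf
        ]
        if scores:
            top = max(scores)
            forward[end] = top + math.log(sum(math.exp(s - top) for s in scores))
    if forward[n] == -math.inf:
        raise ValueError(f"no segmentation of {word!r} with this vocabulary")

    # Walk backwards, choosing each final unit in proportion to the total
    # score of the segmentations that end with it.
    pieces, end = [], n
    while end > 0:
        starts = [
            s for s in range(max(0, end - max_len), end)
            if word[s:end] in logp and forward[s] > -math.inf
        ]
        weights = [
            math.exp(forward[s] + alpha * logp[word[s:end]] - forward[end])
            for s in starts
        ]
        start = rng.choices(starts, weights=weights, k=1)[0]
        pieces.append(word[start:end])
        end = start
    return pieces[::-1]


if __name__ == "__main__":
    # Toy unit probabilities, invented for illustration only.
    vocab = {"un": 0.2, "related": 0.1, "relat": 0.1, "ed": 0.2,
             "re": 0.1, "lat": 0.1, "u": 0.05, "n": 0.05}
    logp = {unit: math.log(p) for unit, p in vocab.items()}
    rng = random.Random(1)
    for _ in range(5):
        print(sample_segmentation("unrelated", logp, rng))
```

Drawing a fresh segmentation each time a word is seen exposes the model to several ways of splitting the same word, which is the source of the regularization effect.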


This review was created by AI and reviewed by human editors.