QUICK REVIEW

[Paper Review] Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning

Sathish Reddy Indurthi, HouJeung Han | arXiv (Cornell University) | Nov 11, 2019

Natural Language Processing Techniques | References: 26 | Cited by: 28

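One-Sentence Summary

This paper proposes a modality-agnostic meta-learning approach to improve data efficiency in end-to-end speech-to-text translation (ST), using automatic speech recognition (ASR) and machine translation (MT) as source tasks to learn a robust model initialization. By applying model-agnostic meta-learning (MAML) across the speech and text modalities, the method achieves state-of-the-art results on the MuST-C En-De and En-Fr ST tasks, improving BLEU by 9.18 and 11.76 points, respectively, over previous transfer learning approaches.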

ABSTRACT

End-to-end Speech Translation (ST) models have several advantages such as lower latency, smaller model size, and less error compounding over conventional pipelines that combine Automatic Speech Recognition (ASR) and text Machine Translation (MT) models. However, collecting large amounts of parallel data for ST task is more difficult compared to the ASR and MT tasks. Previous studies have proposed the use of transfer learning approaches to overcome the above difficulty. These approaches benefit from weakly supervised training data, such as ASR speech-to-transcript or MT text-to-text translation pairs. However, the parameters in these models are updated independently of each task, which may lead to sub-optimal solutions. In this work, we adopt a meta-learning algorithm to train a modality agnostic multi-task model that transfers knowledge from source tasks=ASR+MT to target task=ST where ST task severely lacks data. In the meta-learning phase, the parameters of the model are exposed to vast amounts of speech transcripts (e.g., English ASR) and text translations (e.g., English-German MT). During this phase, parameters are updated in such a way to understand speech, text representations, the relation between them, as well as act as a good initialization point for the target ST task. We evaluate the proposed meta-learning approach for ST tasks on English-German (En-De) and English-French (En-Fr) language pairs from the Multilingual Speech Translation Corpus (MuST-C). Our method outperforms the previous transfer learning approaches and sets new state-of-the-art results for En-De and En-Fr ST tasks by obtaining 9.18, and 11.76 BLEU point improvements, respectively.

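Motivation and Objectives

Address the limited availability of parallel speech-to-text data for end-to-end speech translation (ST) systems.
Overcome the sub-optimal performance of transfer learning, in which model parameters are updated independently for each task without accounting for adaptation to the target ST task.
Develop a unified framework that leverages diverse data from the ASR and MT tasks without sharing parameters between the source and target tasks.
Learn a strong initialization through cross-modal meta-learning to improve generalization and fine-tuning efficiency on low-resource ST tasks.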

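Proposed Method

Train a multi-task model on the source tasks, automatic speech recognition (ASR) and machine translation (MT), using model-agnostic meta-learning (MAML).
During meta-learning, use speech-transcript pairs (ASR) and text translation pairs (MT) as the input modalities to learn a shared, modality-agnostic initialization.
Enable fast adaptation to the target ST task with a small number of gradient steps during fine-tuning.
Apply the meta-learned initialization to the ST model, without sharing parameters among the ASR, MT, and ST tasks.
Use WordPiece tokenization and synthetic data augmentation to further improve performance.
Adopt a sequence-to-sequence architecture built from self-attention and feed-forward networks, trained with a log-likelihood objective.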
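
The meta-learning and fast-adaptation steps listed above can be illustrated with a minimal MAML sketch. This is a toy under stated assumptions, not the paper's implementation: the model is a single scalar weight rather than a sequence-to-sequence network, two synthetic regression tasks stand in for the ASR and MT source tasks, and a third stands in for the low-resource ST target. All function names, slopes, and learning rates are hypothetical choices made for illustration.

```python
# Toy MAML sketch (illustrative assumption, not the paper's code).
# Model: y = w * x with squared-error loss, so the second-order term
# of the MAML meta-gradient can be written in closed form.

def mse(w, data):
    """Mean squared error of the scalar model on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    """Derivative of mse with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)

def hessian(data):
    """Second derivative of mse with respect to w (constant in w)."""
    return sum(2 * x * x for x, _ in data) / len(data)

def adapt(w, data, alpha=0.1, steps=1):
    """Inner loop: a few plain gradient steps on one task."""
    for _ in range(steps):
        w = w - alpha * grad(w, data)
    return w

def make_task(slope):
    """Hypothetical task y = slope * x, split into support and query sets."""
    support = [(x, slope * x) for x in (0.5, 1.0, 1.5)]
    query = [(x, slope * x) for x in (0.75, 1.25)]
    return support, query

def maml_train(tasks, w=0.0, alpha=0.1, beta=0.05, iterations=200):
    """Outer loop: move w so that one inner step works well on every task."""
    for _ in range(iterations):
        meta_grad = 0.0
        for support, query in tasks:
            w_adapted = w - alpha * grad(w, support)
            # Chain rule through the inner step: d(w_adapted)/dw = 1 - alpha * H.
            meta_grad += grad(w_adapted, query) * (1 - alpha * hessian(support))
        w = w - beta * meta_grad / len(tasks)
    return w

if __name__ == "__main__":
    # Source tasks: stand-ins for ASR (slope 2.0) and MT (slope 3.0).
    w_meta = maml_train([make_task(2.0), make_task(3.0)])

    # Target task: stand-in for ST, with a single training example.
    target_support = [(1.0, 2.6)]
    target_query = [(0.5, 1.3), (2.0, 5.2)]

    from_meta = adapt(w_meta, target_support)
    from_scratch = adapt(0.0, target_support)
    print("meta-learned init:", w_meta)
    print("query loss after 1 step from meta init:", mse(from_meta, target_query))
    print("query loss after 1 step from zero init:", mse(from_scratch, target_query))
```

In this toy setting the meta-learned weight settles between the two source tasks, so a single gradient step on one target example already gives a much lower query loss than the same step taken from a zero initialization. A real system would replace the scalar model with the sequence-to-sequence network and would typically rely on automatic differentiation for the meta-gradient.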

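Experimental Results

Research Questions

RQ1: Can meta-learning be applied effectively across modalities, with ASR (speech input) and MT (text input) as source tasks, to improve the initialization for the target ST task?
RQ2: Does the modality-agnostic meta-learning approach outperform standard transfer learning in low-resource speech-to-text translation?
RQ3: How does the meta-learned model perform on standard ST benchmarks such as MuST-C compared with existing transfer learning baselines?
RQ4: To what extent do synthetic data and WordPiece tokenization further improve the meta-learned ST system?
RQ5: Does the proposed method generalize to different language pairs without relying on task-specific parameter sharing?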

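Key Findings

The proposed method achieves state-of-the-art performance on the MuST-C English-German (En-De) speech translation task, with a BLEU score of 22.11.
On the English-French (En-Fr) task, the method reaches a BLEU score of 34.05, setting a new state of the art.
Compared with previous transfer learning approaches, the method improves BLEU by 9.18 points on En-De and 11.76 points on En-Fr.
Synthetic data and WordPiece tokenization further improve performance, showing that the framework is compatible with data augmentation techniques.
Ablation experiments show that the meta-learning strategy adapts to the ST task faster and more effectively than standard fine-tuning.
The model achieves strong results without sharing parameters between the source and target tasks, supporting the effectiveness of the modality-agnostic initialization.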

This review was generated by AI and reviewed by human editors.