QUICK REVIEW

[Paper Review] Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation

Ehsan Hosseini-Asl, Yingbo Zhou | arXiv (Cornell University) | Jul 1, 2018

Speech Recognition and Synthesis | References: 46 | Cited by: 35

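One-Sentence Summary

This paper proposes Augmented Cyclic Adversarial Learning (ACAL), a framework that replaces the conventional reconstruction-based cycle-consistency with a task-specific model so that semantic content is preserved during cross-domain translation. By using task-specific supervision as an implicit consistency constraint, ACAL achieves state-of-the-art performance in low-resource settings, improving digit classification accuracy by up to 14% (SVHN to MNIST) and reducing the phoneme error rate of speech recognition on TIMIT by 5%.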

ABSTRACT

Training a model to perform a task typically requires a large amount of data from the domains in which the task will be applied. However, it is often the case that data are abundant in some domains but scarce in others. Domain adaptation deals with the challenge of adapting a model trained from a data-rich source domain to perform well in a data-poor target domain. In general, this requires learning plausible mappings between domains. CycleGAN is a powerful framework that efficiently learns to map inputs from one domain to another using adversarial training and a cycle-consistency constraint. However, the conventional approach of enforcing cycle-consistency via reconstruction may be overly restrictive in cases where one or more domains have limited training data. In this paper, we propose an augmented cyclic adversarial learning model that enforces the cycle-consistency constraint via an external task-specific model, which encourages the preservation of task-relevant content as opposed to exact reconstruction. We explore digit classification in a low-resource setting in supervised, semi-supervised, and unsupervised settings, as well as in a high-resource unsupervised setting. In the low-resource supervised setting, the results show that our approach improves absolute performance by 14% and 4% when adapting SVHN to MNIST and vice versa, respectively, which outperforms unsupervised domain adaptation methods that require a high-resource unlabeled target domain. Moreover, using only a few unsupervised target samples, our approach can still outperform many high-resource unsupervised models. In speech domains, we similarly adopt a speech recognition model from each domain as the task-specific model. Our approach improves absolute performance of speech recognition by 2% for female speakers in the TIMIT dataset, where the majority of training samples are from male voices.

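Research Motivation and Objectives

Address the challenge of domain adaptation to low-resource target domains where labeled data are scarce.
Overcome a limitation of conventional CycleGAN, whose reliance on exact reconstruction can perform poorly when target data are limited.
Enforce the cycle-consistency constraint with a task-specific model instead of reconstruction, so that semantic content is preserved more effectively.
Improve discriminator learning in the low-resource domain by using the task-specific model as an auxiliary signal for distribution modeling.
Demonstrate the method's effectiveness in both vision and speech domains under supervised, semi-supervised, and unsupervised settings.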

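Proposed Method

Replace the standard reconstruction-based cycle-consistency loss in CycleGAN with a task-specific loss, computed from the predictions of a task model trained on each domain.
Use the task-specific model as an additional supervision signal when training the discriminator of the corresponding domain, which improves distribution modeling.
Train the generators adversarially to map from the source domain to the target domain and back, while requiring the task-specific model's output to remain consistent across the cycle.
Integrate the task-specific model into the cycle so that task-relevant content is separated from domain-specific style.
Use multi-discriminator training in the speech experiments to improve the stability and performance of adversarial training.
Represent speech data as spectrograms and evaluate with phoneme error rate (PER).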
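
The difference between the two kinds of cycle constraint described above can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D inputs, the fixed linear `toy_task_model`, and the function names are all hypothetical, and the generators are replaced by a hand-written cycled sample. It only shows why a task-based loss tolerates a change of "style" that a reconstruction loss penalizes.

```python
import math

def l1_reconstruction_loss(x, x_cycled):
    """Conventional cycle-consistency: mean absolute difference to the input."""
    return sum(abs(a - b) for a, b in zip(x, x_cycled)) / len(x)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def task_cycle_loss(task_model, x_cycled, label):
    """Task-based cycle loss: cross-entropy of the task model's prediction
    on the cycled sample against the label of the original sample."""
    probs = softmax(task_model(x_cycled))
    return -math.log(probs[label])

def toy_task_model(x):
    """Hypothetical classifier: reads only feature 0 (the 'content') and
    ignores feature 1 (the 'style'). Returns logits for classes 0 and 1."""
    return [4.0 * x[0], -4.0 * x[0]]

# A source sample with label 0, and an assumed result of the
# source -> target -> source cycle that kept content but changed style.
x, label = [1.0, 0.0], 0
x_cycled = [1.0, 3.0]

recon = l1_reconstruction_loss(x, x_cycled)             # 1.5: style drift is penalized
task = task_cycle_loss(toy_task_model, x_cycled, label)  # close to 0: content is intact
print(recon, task)
```

In this toy case the reconstruction loss would push the generators to undo the style change, while the task-based loss is satisfied as long as the cycled sample is still classified correctly, which is the relaxation the method relies on when target data are scarce.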

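Experimental Results

Research Questions

RQ1: Does replacing reconstruction-based cycle-consistency with a task-specific loss improve domain adaptation performance in low-resource settings?
RQ2: When target data are scarce, does using the task-specific model as an auxiliary signal strengthen discriminator learning?
RQ3: How does ACAL compare with existing unsupervised domain adaptation methods when only a few target samples are available?
RQ4: Does ACAL generalize across domains under low-resource conditions, such as handwritten digit classification and speech recognition?
RQ5: In low-resource adaptation, to what extent does ACAL preserve semantic content while transferring domain style?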

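Key Findings

In the low-resource supervised setting, ACAL improves digit classification accuracy by 14% absolute when adapting SVHN to MNIST and by 4% absolute when adapting MNIST to SVHN.
With only a few unsupervised target samples, ACAL outperforms many high-resource unsupervised domain adaptation models.
When adapting SVHN to MNIST, ACAL reaches 97.98% test accuracy on MNIST, exceeding the previous state of the art.
In speech domain adaptation on TIMIT, ACAL reduces the phoneme error rate by 5% when adapting male speech to female speech.
When additional unlabeled data are included, ACAL further lowers the PER to 18.44, well below the baseline model.
In the male-to-female setting, the ACAL model performs almost on par with a model trained on real female speech, indicating that the translated data closely match the target distribution.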
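
For readers unfamiliar with the PER figures quoted above, the sketch below shows the standard definition: the edit distance between the reference and predicted phoneme sequences, summed over utterances and divided by the total number of reference phonemes. The phoneme strings are made up for illustration, and the paper's exact scoring setup (for example, how the TIMIT phoneme set is folded) is not described in this review.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences; substitutions,
    insertions, and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def phoneme_error_rate(refs, hyps):
    """PER in percent over a list of (reference, hypothesis) utterances."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phonemes = sum(len(r) for r in refs)
    return 100.0 * total_errors / total_phonemes

# Illustrative utterances: one deletion in the first, one substitution in the second.
refs = [["sh", "iy", "hh", "ae", "d"], ["y", "er"]]
hyps = [["sh", "iy", "ae", "d"], ["y", "aa"]]
per = phoneme_error_rate(refs, hyps)  # 2 errors over 7 reference phonemes
print(round(per, 2))
```

Because PER is itself a percentage, a reported reduction can be read either as absolute percentage points or as a relative change, so it helps to check which one a given figure refers to.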

This review was generated by AI and reviewed by human editors.