QUICK REVIEW

[Paper Review] Dual Supervised Learning

Yingce Xia, Tao Qin|arXiv (Cornell University)|Jul 3, 2017

Natural Language Processing Techniques|Cited by 72

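One-Sentence Summary

Dual Supervised Learning (DSL) trains a primal task and its dual task simultaneously while enforcing probabilistic duality between them, improving performance in translation, image processing, and sentiment analysis.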

ABSTRACT

Many supervised learning tasks are emerged in dual forms, e.g., English-to-French translation vs. French-to-English translation, speech recognition vs. text to speech, and image classification vs. image generation. Two dual tasks have intrinsic connections with each other due to the probabilistic correlation between their models. This connection is, however, not effectively utilized today, since people usually train the models of two dual tasks separately and independently. In this work, we propose training the models of two dual tasks simultaneously, and explicitly exploiting the probabilistic correlation between them to regularize the training process. For ease of reference, we call the proposed approach dual supervised learning. We demonstrate that dual supervised learning can improve the practical performances of both tasks, for various applications including machine translation, image processing, and sentiment analysis.

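Motivation and Objectives

Exploit the intrinsic duality between paired tasks (e.g., A→B and B→A) to improve both tasks at once.
Formulate a constrained optimization problem that enforces probabilistic duality between the primal and dual models.
Develop a practical algorithm built on a duality regularization term, obtained via the method of Lagrange multipliers.
Demonstrate the effectiveness of DSL on machine translation, image processing, and sentiment analysis.
Analyze how DSL acts as a data-dependent regularizer and how it affects generalization.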
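
The constrained formulation and its relaxation can be written out as follows. This is a sketch in assumed notation: the loss symbols \ell_1 and \ell_2, the model names f and g, and the sample count n are introduced here for illustration, while P(y|x;θ_xy), P(x|y;θ_yx), and the estimated marginals follow the method description in this review.

```latex
% Assumed notation: \ell_1, \ell_2 are the task losses; f and g are the primal and dual models.
\[
\begin{aligned}
&\min_{\theta_{xy}} \frac{1}{n}\sum_{i=1}^{n} \ell_1\big(f(x_i;\theta_{xy}),\, y_i\big),
\qquad
\min_{\theta_{yx}} \frac{1}{n}\sum_{i=1}^{n} \ell_2\big(g(y_i;\theta_{yx}),\, x_i\big) \\
&\text{s.t.}\quad P(x)\,P(y \mid x;\theta_{xy}) = P(y)\,P(x \mid y;\theta_{yx}) \quad \forall\, x, y.
\end{aligned}
\]

% Relaxation: the equality constraint becomes a squared log-space penalty,
% with the true marginals replaced by estimates \hat{P}(x) and \hat{P}(y).
\[
\ell_{\text{duality}} =
\big(\log \hat{P}(x) + \log P(y \mid x;\theta_{xy})
   - \log \hat{P}(y) - \log P(x \mid y;\theta_{yx})\big)^{2}
\]
```

The penalty is zero exactly when both factorizations of the joint probability agree for the pair (x, y), which is why it can stand in for the hard constraint.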

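Proposed Method

Define the primal and dual tasks by the conditional distributions P(y|x;θ_xy) and P(x|y;θ_yx).
Introduce the probabilistic duality constraint P(x)P(y|x) = P(y)P(x|y), and relax it into a Lagrangian-based regularization term.
Minimize a weighted sum of the standard loss and the duality regularization term over mini-batches.
Estimate the marginal distributions P̂(x) and P̂(y) with language models or class distributions so that the regularizer can be computed.
Train the two models jointly with a standard optimizer (e.g., SGD, Adam), using the hyperparameter λ to control the strength of the duality regularization.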
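
The steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the dict layout of a batch item, the use of negative log-likelihood as the standard loss, and the separate weights `lam_xy` and `lam_yx` (with arbitrary defaults) are all assumptions made for the example. It only evaluates the objectives; a real training loop would backpropagate through the model log-probabilities.

```python
import math
from collections import Counter


def empirical_log_marginal(labels):
    """Estimate log P_hat(y) from label frequencies (a categorical marginal)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: math.log(count / total) for label, count in counts.items()}


def duality_gap(log_px, log_py_given_x, log_py, log_px_given_y):
    """log P_hat(x) + log P(y|x) - log P_hat(y) - log P(x|y); zero when duality holds."""
    return (log_px + log_py_given_x) - (log_py + log_px_given_y)


def dsl_objectives(batch, lam_xy=0.01, lam_yx=0.01):
    """Return (primal, dual) mini-batch objectives: task loss + lambda * regularizer.

    Each batch item is a dict holding the four log-probabilities of one (x, y) pair.
    The lambda defaults are arbitrary placeholders, not values from the paper.
    """
    n = len(batch)
    primal_nll = sum(-item["log_py_given_x"] for item in batch) / n
    dual_nll = sum(-item["log_px_given_y"] for item in batch) / n
    reg = sum(
        duality_gap(item["log_px"], item["log_py_given_x"],
                    item["log_py"], item["log_px_given_y"]) ** 2
        for item in batch
    ) / n
    return primal_nll + lam_xy * reg, dual_nll + lam_yx * reg


# Toy pair whose joint is consistent: 0.5 * 0.4 == 0.25 * 0.8 == 0.2
consistent = {"log_px": math.log(0.5), "log_py_given_x": math.log(0.4),
              "log_py": math.log(0.25), "log_px_given_y": math.log(0.8)}
# Same pair, but the dual model assigns P(x|y) = 0.4, which violates duality
inconsistent = dict(consistent, log_px_given_y=math.log(0.4))

print(dsl_objectives([consistent]))
print(dsl_objectives([inconsistent]))
```

In the consistent case the regularizer vanishes and each objective reduces to its plain task loss; in the inconsistent case both objectives are penalized by the squared gap, which is what couples the two models during joint training.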

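Experimental Results

Research Questions

RQ1: Can exploiting the probabilistic duality between paired tasks improve the performance of both at once?
RQ2: How can the duality between the primal and dual models be incorporated into a practical training objective?
RQ3: How does duality regularization affect translation quality, image classification/generation, and sentiment analysis?
RQ4: How do the marginal distributions inform and stabilize DSL's duality regularization?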

Key Findings

Task	RNNSearch (BLEU)	DSL (BLEU)	Δ
En → Fr	29.92	31.99	2.07
Fr → En	27.49	28.35	0.86
En → De	16.54	17.91	1.37
De → En	20.69	20.81	0.12
En → Zh (MT08)	15.45	15.87	0.42
Zh → En (MT08)	31.67	33.59	1.92
En → Zh (MT12)	15.05	16.10	1.05
Zh → En (MT12)	30.54	32.00	1.46
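
As a quick consistency check, the Δ column can be recomputed from the two BLEU columns. The snippet below is only a sketch: the helper name `mismatched_deltas` and the tolerance are assumptions, and the numbers are copied from the table as transcribed here.

```python
# (task, RNNSearch BLEU, DSL BLEU, reported delta), copied from the table above.
ROWS = [
    ("En → Fr", 29.92, 31.99, 2.07),
    ("Fr → En", 27.49, 28.35, 0.86),
    ("En → De", 16.54, 17.91, 1.37),
    ("De → En", 20.69, 20.81, 0.12),
    ("En → Zh (MT08)", 15.45, 15.87, 0.42),
    ("Zh → En (MT08)", 31.67, 33.59, 1.92),
    ("En → Zh (MT12)", 15.05, 16.10, 1.05),
    ("Zh → En (MT12)", 30.54, 32.00, 1.46),
]


def mismatched_deltas(rows, tol=0.005):
    """Return the tasks whose reported delta differs from DSL minus baseline."""
    return [task for task, base, dsl, delta in rows
            if abs((dsl - base) - delta) > tol]


print(mismatched_deltas(ROWS))  # an empty list means every row is consistent
```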

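DSL improves results in three settings: translation (BLEU gains on En↔Fr, En↔De, and En↔Zh), image classification (lower error rate), and image generation (lower bits per dimension).
On En↔Fr translation, DSL gains +2.07 BLEU (En→Fr) and +0.86 BLEU (Fr→En).
On En↔De translation, DSL gains +1.37 BLEU (En→De) and +0.12 BLEU (De→En).
On En↔Zh translation, DSL gains +0.42 BLEU (En→Zh, MT08), +1.92 BLEU (Zh→En, MT08), +1.05 BLEU (En→Zh, MT12), and +1.46 BLEU (Zh→En, MT12).
On CIFAR-10, DSL lowers the ResNet-110 classification error rate from 6.43% to 5.40% and improves PixelCNN++ generation, reaching a state-of-the-art 2.93 bits per dimension (bpd) in the ResNet-110 setting.
On sentiment analysis (IMDB), DSL reduces the classification error rate by 0.90 points and slightly improves perplexity.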

This review was generated by AI and reviewed by human editors.