QUICK REVIEW

[Paper Review] Central Moment Discrepancy (CMD) for Domain-Invariant Representation Learning

Werner Zellinger, Thomas Grubinger|arXiv (Cornell University)|Feb 28, 2017

Domain Adaptation and Few-Shot Learning | References: 33 | Cited by: 214

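One-Sentence Summary

CMD introduces a new domain regularizer that explicitly matches the higher-order central moments of hidden activations to produce domain-invariant representations, achieving state-of-the-art results on most Office tasks and outperforming networks trained with MMD, VFAE, and DANN on Amazon reviews, without the kernel matrix computations that MMD requires.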

ABSTRACT

The learning of domain-invariant representations in the context of domain adaptation with neural networks is considered. We propose a new regularization method that minimizes the discrepancy between domain-specific latent feature representations directly in the hidden activation space. Although some standard distribution matching approaches exist that can be interpreted as the matching of weighted sums of moments, e.g. Maximum Mean Discrepancy (MMD), an explicit order-wise matching of higher order moments has not been considered before. We propose to match the higher order central moments of probability distributions by means of order-wise moment differences. Our model does not require computationally expensive distance and kernel matrix computations. We utilize the equivalent representation of probability distributions by moment sequences to define a new distance function, called Central Moment Discrepancy (CMD). We prove that CMD is a metric on the set of probability distributions on a compact interval. We further prove that convergence of probability distributions on compact intervals w.r.t. the new metric implies convergence in distribution of the respective random variables. We test our approach on two different benchmark data sets for object recognition (Office) and sentiment analysis of product reviews (Amazon reviews). CMD achieves a new state-of-the-art performance on most domain adaptation tasks of Office and outperforms networks trained with MMD, Variational Fair Autoencoders and Domain Adversarial Neural Networks on Amazon reviews. In addition, a post-hoc parameter sensitivity analysis shows that the new approach is stable w.r.t. parameter changes in a certain interval. The source code of the experiments is publicly available.
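For reference, the empirical distance described in the abstract can be written out as below. This is a restatement under stated assumptions rather than a quotation: X and Y denote samples of hidden activations whose coordinates lie in a compact interval [a, b], E(X) is the empirical mean vector, C_k(X) is the vector of k-th order sample central moments taken coordinate-wise, and K is the highest moment order retained.

```latex
\mathrm{CMD}_K(X, Y)
  = \frac{1}{|b-a|}\,\bigl\lVert \mathbf{E}(X) - \mathbf{E}(Y) \bigr\rVert_2
  + \sum_{k=2}^{K} \frac{1}{|b-a|^{k}}\,\bigl\lVert C_k(X) - C_k(Y) \bigr\rVert_2,
\qquad
C_k(X) = \mathbf{E}\bigl( (x - \mathbf{E}(X))^{k} \bigr)
```

The factor 1/|b-a|^k scales each order by the width of the activation interval, so that higher-order terms do not dominate the sum simply because of their exponent.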

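Motivation and Objectives

Address unsupervised domain adaptation by aligning the domain-specific latent representations of a neural network.
Introduce Central Moment Discrepancy (CMD) as a moment-based distance between distributions of hidden activations.
Provide theoretical guarantees: CMD is a metric on the set of probability distributions on a compact interval, and convergence in CMD implies convergence in distribution.
Demonstrate empirical gains on the Office (object recognition) and Amazon reviews (sentiment analysis) benchmarks.
Show that CMD is robust to hyperparameter choices and cheaper to compute than kernel-based methods.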

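Proposed Method

Define CMD as a distance between distributions based on the differences of their central moments across all orders, and use an empirical, tractable approximation, CMD_K, that truncates the sum at order K.
Prove that CMD is a metric on the set of probability distributions on a compact interval and that convergence in CMD implies convergence in distribution.
Add CMD_K as a regularization term to the domain adaptation objective alongside the standard supervised loss, avoiding kernel matrix computations.
Train the network by gradient descent on this combined loss, using labeled source data for the supervised term and source and target activations for the CMD_K term.
Fix K (the highest moment order) to a small value (e.g., 5) to balance the information captured against computational cost.
Compare CMD with MMD, MKL, VFAE, and DANN on the Amazon reviews and Office datasets.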
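The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code: the function names `cmd_k` and `regularized_loss`, the default interval [0, 1] (which assumes sigmoid-like bounded activations), and the weight `lam` are all assumptions introduced here.

```python
import numpy as np


def cmd_k(x, y, k_max=5, a=0.0, b=1.0):
    """Empirical CMD_K between two batches of hidden activations.

    x: (n, d) source activations, y: (m, d) target activations.
    Both are assumed to lie in [a, b]; the default [0, 1] is an assumption.
    """
    span = abs(b - a)
    mean_x, mean_y = x.mean(axis=0), y.mean(axis=0)
    # First-order term: distance between the mean activation vectors.
    total = np.linalg.norm(mean_x - mean_y) / span
    centered_x, centered_y = x - mean_x, y - mean_y
    for k in range(2, k_max + 1):
        # k-th central moment of each hidden unit (coordinate-wise, no cross terms).
        ck_x = (centered_x ** k).mean(axis=0)
        ck_y = (centered_y ** k).mean(axis=0)
        total += np.linalg.norm(ck_x - ck_y) / span ** k
    return total


def regularized_loss(task_loss, src_hidden, tgt_hidden, lam=1.0, k_max=5):
    """Supervised loss on labeled source data plus the weighted CMD_K penalty.

    `lam` is a hypothetical trade-off weight chosen for illustration.
    """
    return task_loss + lam * cmd_k(src_hidden, tgt_hidden, k_max=k_max)


# Small demo on synthetic activations in [0, 1].
rng = np.random.default_rng(0)
src = rng.uniform(0.0, 1.0, size=(256, 8))
tgt_same = rng.uniform(0.0, 1.0, size=(256, 8))   # same distribution as src
tgt_shift = rng.beta(2.0, 5.0, size=(256, 8))     # different distribution
print("same distribution:   ", cmd_k(src, tgt_same))
print("shifted distribution:", cmd_k(src, tgt_shift))
```

In an actual training loop the same computation would be written in an automatic-differentiation framework so that the penalty contributes gradients to the hidden layer; the NumPy version only shows what is being computed.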

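Experimental Results

Research Questions

RQ1: Does explicit order-wise matching of higher-order central moments of domain-specific activations improve domain-invariant representation learning over matching first-order moments alone?
RQ2: Is CMD a valid metric, and does convergence in CMD guarantee convergence in distribution on compact intervals?
RQ3: Is CMD_K computationally cheaper than kernel-based methods such as MMD while maintaining or improving domain adaptation performance?
RQ4: How does CMD_K perform empirically against existing methods on standard domain adaptation benchmarks (Office and Amazon reviews)?
RQ5: How sensitive is CMD to the choice of the moment-order parameter and other hyperparameters?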

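Key Findings

CMD achieves state-of-the-art performance on most Office domain adaptation tasks.
On most Amazon reviews tasks, CMD outperforms MMD, VFAE, and DANN.
CMD matches or exceeds MMD in accuracy while being simpler to compute: it runs in linear time in the number of samples, whereas MMD takes quadratic time.
CMD is stable with respect to parameter changes within a practical range (K around 5).
The theoretical results establish CMD as a metric and show that convergence in CMD implies convergence in distribution of the marginals on compact intervals.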
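The linear-versus-quadratic cost contrast in the findings above can be made concrete with a small sketch. The Gaussian kernel, the biased estimator, the `bandwidth` parameter, and the helper names `gaussian_mmd2` and `central_moments` are assumptions chosen for illustration; they are not taken from the paper's experiments.

```python
import numpy as np


def gaussian_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD with a Gaussian kernel.

    Builds every pairwise squared distance, so time and memory grow
    with (n + m)^2 for n source and m target samples.
    """
    z = np.concatenate([x, y], axis=0)
    sq_norms = (z ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (z @ z.T)
    # Clip tiny negative values caused by floating-point rounding.
    kernel = np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))
    n = len(x)
    k_xx, k_yy, k_xy = kernel[:n, :n], kernel[n:, n:], kernel[:n, n:]
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


def central_moments(x, k_max=5):
    """Mean plus central moments of orders 2..k_max for each hidden unit.

    One pass over the batch per order, so the cost is O(n * d * k_max)
    and no pairwise matrix is ever formed.
    """
    mean = x.mean(axis=0)
    centered = x - mean
    return [mean] + [(centered ** k).mean(axis=0) for k in range(2, k_max + 1)]


# Compare how much intermediate data each approach touches.
rng = np.random.default_rng(0)
n, d = 512, 16
x = rng.uniform(0.0, 1.0, size=(n, d))
y = rng.uniform(0.0, 1.0, size=(n, d))
kernel_entries = (2 * n) ** 2                                   # pairwise kernel matrix
moment_entries = sum(m.size for m in central_moments(x))        # statistics per domain
print("kernel matrix entries:", kernel_entries)
print("moment statistics per domain:", moment_entries)
print("squared MMD estimate:", gaussian_mmd2(x, y))
```

Doubling the batch size quadruples the kernel matrix, while the number of moment statistics depends only on the layer width and K.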

This review was generated by AI and reviewed by human editors.