QUICK REVIEW

[Paper Review] Unsupervised Domain Adaptation via Structurally Regularized Deep Clustering

Hui Tang, Ke Chen, Kui Jia | arXiv (Cornell University) | Mar 19, 2020

Domain Adaptation and Few-Shot Learning | References: 66 | Cited by: 24

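One-Sentence Summary

This paper proposes Structurally Regularized Deep Clustering (SRDC), an unsupervised domain adaptation method that uses deep clustering to directly uncover the intrinsic discrimination of target data, with source labels serving as structural regularization. Without explicit domain alignment, SRDC jointly trains a network to minimize the KL divergence between its predictive label distribution and an auxiliary distribution, which is replaced by the ground-truth label distribution for source data. It further enhances discrimination by clustering intermediate features and softly selects less divergent source examples, achieving state-of-the-art results on three UDA benchmarks.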

ABSTRACT

Unsupervised domain adaptation (UDA) is to make predictions for unlabeled data on a target domain, given labeled data on a source domain whose distribution shifts from the target one. Mainstream UDA methods learn aligned features between the two domains, such that a classifier trained on the source features can be readily applied to the target ones. However, such a transferring strategy has a potential risk of damaging the intrinsic discrimination of target data. To alleviate this risk, we are motivated by the assumption of structural domain similarity, and propose to directly uncover the intrinsic target discrimination via discriminative clustering of target data. We constrain the clustering solutions using structural source regularization that hinges on our assumed structural domain similarity. Technically, we use a flexible framework of deep network based discriminative clustering that minimizes the KL divergence between predictive label distribution of the network and an introduced auxiliary one; replacing the auxiliary distribution with that formed by ground-truth labels of source data implements the structural source regularization via a simple strategy of joint network training. We term our proposed method as Structurally Regularized Deep Clustering (SRDC), where we also enhance target discrimination with clustering of intermediate network features, and enhance structural regularization with soft selection of less divergent source examples. Careful ablation studies show the efficacy of our proposed SRDC. Notably, with no explicit domain alignment, SRDC outperforms all existing methods on three UDA benchmarks.

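Motivation and Objectives

Address the risk that existing unsupervised domain adaptation (UDA) methods, which rely on explicit domain alignment, may damage the intrinsic discrimination of target data.
Investigate whether, under the assumption that the source and target domains are structurally similar, discriminative clustering can directly uncover the intrinsic structure of target data.
Develop a method that uses source labels as structural regularization to guide target clustering, without feature-level alignment.
Improve target discrimination by clustering intermediate network features, and strengthen the regularization by softly selecting less divergent source examples.
Show that a clustering-based method with no explicit domain alignment can outperform state-of-the-art alignment-based methods.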

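Proposed Method

SRDC builds on a flexible deep discriminative clustering framework that minimizes the KL divergence between the network's predictive label distribution and an introduced auxiliary distribution.
Structural source regularization is implemented by replacing the auxiliary distribution with the one formed by the ground-truth labels of source data, which amounts to jointly training the network on source and target data.
Target discrimination is enhanced by additionally clustering intermediate network features, yielding more discriminative representations.
Structural regularization is further strengthened by softly selecting, based on feature similarity, the source examples that are less divergent from the target domain.
The method jointly optimizes the clustering and classification objectives, preserving the intrinsic structure of the data without explicit domain alignment.
The framework trains a single network end to end, with KL divergence minimization as the core objective.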
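The joint objective described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary gives no formulas, so the auxiliary distribution here uses a form common in deep discriminative clustering (predictions normalized by soft cluster size), and the soft selection weight is assumed to be a rescaled cosine similarity between a source feature and the target cluster center of its class. All function names are hypothetical, and the intermediate-feature clustering term is omitted for brevity.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def auxiliary_distribution(probs):
    """Assumed auxiliary target: q_ik proportional to p_ik / sqrt(sum_i p_ik).

    Dividing by the soft cluster size discourages one cluster from
    absorbing most samples. In training, q would be held fixed while
    the network parameters are updated.
    """
    num_classes = len(probs[0])
    cluster_mass = [sum(row[k] for row in probs) for k in range(num_classes)]
    aux = []
    for row in probs:
        unnorm = [row[k] / math.sqrt(cluster_mass[k]) for k in range(num_classes)]
        z = sum(unnorm)
        aux.append([u / z for u in unnorm])
    return aux

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) for two discrete distributions given as lists."""
    return sum(qk * math.log((qk + eps) / (pk + eps))
               for qk, pk in zip(q, p) if qk > 0)

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def soft_source_weight(feature, target_center):
    """Assumed soft selection weight in [0, 1]: higher means less divergent."""
    return 0.5 * (1.0 + cosine(feature, target_center))

def joint_loss(target_logits, source_logits, source_labels, source_weights):
    """Target clustering term plus weighted source regularization term."""
    target_probs = [softmax(z) for z in target_logits]
    aux = auxiliary_distribution(target_probs)
    target_term = sum(kl_divergence(q, p)
                      for q, p in zip(aux, target_probs)) / len(target_probs)
    # For source data the auxiliary distribution is the one-hot label,
    # so KL(one_hot || p) reduces to the cross-entropy -log p[y].
    source_term = 0.0
    for z, y, w in zip(source_logits, source_labels, source_weights):
        source_term += -w * math.log(softmax(z)[y] + 1e-12)
    source_term /= len(source_logits)
    return target_term + source_term

# Toy data: 3 unlabeled target samples, 2 labeled source samples, 3 classes.
target_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.2, 2.2]]
source_logits = [[3.0, 0.1, 0.2], [0.3, 2.5, 0.1]]
source_labels = [0, 1]
source_features = [[1.0, 0.0], [0.0, 1.0]]
target_centers = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # one center per class

weights = [soft_source_weight(f, target_centers[y])
           for f, y in zip(source_features, source_labels)]
loss = joint_loss(target_logits, source_logits, source_labels, weights)
print(round(loss, 4))
```

Writing the source term as a KL divergence against one-hot labels makes it plain why the source regularization fits the same framework as the target clustering term: only the auxiliary distribution changes.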

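Experimental Results

Research Questions

RQ1: Without explicit domain alignment, can deep clustering effectively uncover the intrinsic discrimination of target data?
RQ2: Does using source labels as structural regularization improve target clustering performance in unsupervised domain adaptation?
RQ3: How does soft selection of less divergent source examples affect the robustness and accuracy of the structural regularization?
RQ4: Does clustering intermediate network features improve target-domain discrimination compared with clustering only the last-layer features?
RQ5: Do clustering-based UDA methods outperform state-of-the-art alignment-based methods in accuracy and generalization?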

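Key Findings

SRDC achieves state-of-the-art performance on the Office-31 benchmark, with an average accuracy of 90.8%.
On ImageCLEF-DA, SRDC reaches an average accuracy of 90.9%, 2.0 percentage points above the previous state of the art (SymNets).
On Office-Home, SRDC reaches an average accuracy of 71.3%, 3.2 percentage points above the previous state of the art (MDD, 68.1%).
Ablation studies show that both intermediate-feature clustering and soft selection of source examples contribute to the performance gains.
SRDC obtains these results without any explicit domain alignment, indicating that intrinsic data structure can be exploited effectively in UDA.
Across all Office-31 transfer tasks (e.g., A→W, D→W, W→D), SRDC shows consistent improvements, suggesting strong generalization.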


This review was generated by AI and reviewed by a human editor.