QUICK REVIEW

[Paper Review] Learning Transferable Features with Deep Adaptation Networks

Mingsheng Long, Yue Cao|arXiv (Cornell University)|Feb 10, 2015

Domain Adaptation and Few-Shot Learning|35 references|2,837 citations

TL;DR

DAN brings multi-layer, multiple-kernel maximum mean discrepancy (MK-MMD) adaptation to deep networks, aligning source and target feature distributions in the higher, task-specific layers to improve transferability and achieve state-of-the-art results on standard benchmarks.

ABSTRACT

Recent studies reveal that a deep neural network can learn transferable features which generalize well to novel tasks for domain adaptation. However, as deep features eventually transition from general to specific along the network, the feature transferability drops significantly in higher layers with increasing domain discrepancy. Hence, it is important to formally reduce the dataset bias and enhance the transferability in task-specific layers. In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural network to the domain adaptation scenario. In DAN, hidden representations of all task-specific layers are embedded in a reproducing kernel Hilbert space where the mean embeddings of different domain distributions can be explicitly matched. The domain discrepancy is further reduced using an optimal multi-kernel selection method for mean embedding matching. DAN can learn transferable features with statistical guarantees, and can scale linearly by unbiased estimate of kernel embedding. Extensive empirical evidence shows that the proposed architecture yields state-of-the-art image classification error rates on standard domain adaptation benchmarks.

Motivation & Objective

Motivate the need to reduce dataset bias and enhance transferability in deep networks for unsupervised and semi-supervised domain adaptation.
Propose a deep adaptation architecture (DAN) that embeds task-specific layer representations into an RKHS and matches mean embeddings across domains.
Develop an MK-MMD strategy that selects the kernel combination optimally for distribution matching.
Enable scalable training with a linear-time unbiased estimator of kernel mean embeddings.
Demonstrate empirical performance gains over state-of-the-art methods on standard domain adaptation benchmarks.
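
For reference, the quantities named in the objectives above can be written out as follows. This is a sketch in the notation commonly used for kernel mean embeddings; the symbols (φ, k_u, β_u, λ, J, n_a, l_1, l_2) are not defined in this review and should be read as assumptions.

```latex
% Squared MK-MMD between source distribution p and target distribution q,
% where \phi is the feature map of the RKHS \mathcal{H}_k induced by kernel k.
\[
d_k^2(p, q) \triangleq
\left\| \mathbb{E}_{p}\left[\phi(x^s)\right] - \mathbb{E}_{q}\left[\phi(x^t)\right] \right\|_{\mathcal{H}_k}^2
\]

% Kernel family: convex combinations of m base kernels k_u.
\[
\mathcal{K} \triangleq \left\{ k = \sum_{u=1}^{m} \beta_u k_u \;:\;
\sum_{u=1}^{m} \beta_u = 1,\ \beta_u \ge 0 \right\}
\]

% Training objective: classification loss J on the n_a labeled examples plus
% an MK-MMD penalty, weighted by \lambda, on the representations of layers l_1..l_2.
\[
\min_{\Theta}\ \frac{1}{n_a} \sum_{i=1}^{n_a} J\left(\theta(x_i^a), y_i^a\right)
+ \lambda \sum_{l=l_1}^{l_2} d_k^2\left(\mathcal{D}_s^{l}, \mathcal{D}_t^{l}\right)
\]
```

Here d_k^2(p, q) is zero exactly when the two mean embeddings coincide, which for a characteristic kernel means p = q.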

Proposed method

Embed hidden representations of task-specific layers into a reproducing kernel Hilbert space (RKHS) to match mean embeddings across domains.
Use MK-MMD, a multiple-kernel variant of maximum mean discrepancy, to measure and minimize the domain discrepancy between source and target representations at each adapted layer.
Apply an unbiased linear-time MK-MMD estimator to enable scalable training with mini-batch SGD.
Fine-tune a pre-trained AlexNet model by freezing early convolutional layers and adapting the higher layers with MK-MMD regularization (layers l1 = 6 through l2 = 8, i.e., fc6–fc8).
Optimize the kernel coefficients via a quadratic program that maximizes test power (equivalently, minimizes Type II error), alternating with updates of the network parameters θ.
Provide a theoretical bound linking target risk to source risk plus domain discrepancy quantified by MK-MMD.
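
The linear-time estimator listed above can be illustrated with a minimal NumPy sketch. It assumes Gaussian base kernels with hand-picked bandwidths and fixed kernel weights (no quadratic program is solved), and the function names `gaussian_kernel` and `linear_mk_mmd` are hypothetical; this is not the authors' implementation. Samples are grouped into quad-tuples of two source and two target points, so the cost grows linearly with the batch size rather than quadratically.

```python
import numpy as np


def gaussian_kernel(a, b, gamma):
    """Row-wise Gaussian kernel: k(a_i, b_i) = exp(-gamma * ||a_i - b_i||^2)."""
    sq_dist = np.sum((a - b) ** 2, axis=1)
    return np.exp(-gamma * sq_dist)


def linear_mk_mmd(source, target, gammas, betas):
    """Linear-time unbiased estimate of the squared MK-MMD.

    source, target: arrays of shape (n, d) with the same even n.
    gammas: one bandwidth parameter per base kernel (assumed Gaussian).
    betas: non-negative kernel weights that sum to 1 (fixed here, not optimized).
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    betas = np.asarray(betas, dtype=float)
    if source.shape != target.shape or source.shape[0] % 2 != 0:
        raise ValueError("source and target need the same shape and an even n")
    if len(gammas) != len(betas):
        raise ValueError("need exactly one weight per base kernel")
    if np.any(betas < 0) or not np.isclose(betas.sum(), 1.0):
        raise ValueError("betas must be non-negative and sum to 1")

    # Split each domain into consecutive pairs: (x_{2i-1}, x_{2i}).
    xs1, xs2 = source[0::2], source[1::2]
    xt1, xt2 = target[0::2], target[1::2]

    estimate = 0.0
    for gamma, beta in zip(gammas, betas):
        # g_k on each quad-tuple: within-domain terms minus cross-domain terms.
        g = (
            gaussian_kernel(xs1, xs2, gamma)
            + gaussian_kernel(xt1, xt2, gamma)
            - gaussian_kernel(xs1, xt2, gamma)
            - gaussian_kernel(xs2, xt1, gamma)
        )
        # Averaging over the n/2 quad-tuples gives (2/n) * sum of g_k.
        estimate += beta * g.mean()
    return estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(2000, 4))
    tgt_same = rng.normal(size=(2000, 4))
    tgt_shift = rng.normal(size=(2000, 4)) + 3.0
    print(linear_mk_mmd(src, tgt_same, [0.1, 1.0], [0.5, 0.5]))
    print(linear_mk_mmd(src, tgt_shift, [0.1, 1.0], [0.5, 0.5]))
```

Identical inputs give exactly zero, samples drawn from the same distribution give a value near zero (possibly slightly negative, because the estimator is unbiased rather than non-negative), and a shifted target gives a clearly positive value. In training, a term like this would be added to the classification loss for each adapted layer and computed on mini-batch activations.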

Experimental results

Research questions

RQ1: How can multiple deep network layers be adapted to reduce domain discrepancy between source and target domains?
RQ2: Can a multi-kernel MK-MMD approach improve the effectiveness of distribution matching in deep representations compared to single-kernel methods?
RQ3: Does integrating MK-MMD as a regularizer in deep networks yield scalable, empirically superior domain adaptation?
RQ4: What are the empirical gains of DAN on standard domain adaptation benchmarks relative to prior methods?

Key findings

DAN achieves state-of-the-art accuracy on Office-31 unsupervised domain adaptation tasks; the figures reported here include 68.5-68.9% on individual tasks and an average of 72.9%.
Multi-layer adaptation (fc6–fc8) outperforms single-layer variants, with DAN (multi-layer MK-MMD) surpassing single-kernel, single-layer baselines such as DDC.
Using multiple kernels (MK-MMD) consistently improves on the single-kernel variant and on other baselines across transfer tasks.
The method demonstrates robust performance on Office-31 and Office-10 + Caltech-10 benchmarks, outperforming TCA, GFK, CNN-based approaches, and prior domain adaptation methods.

This review was created by AI and reviewed by human editors.