[Paper Review] Learning multiple visual domains with residual adapters
Proposes residual adapter modules to enable a single network to perform well across ten diverse visual domains, introducing the Visual Decathlon benchmark to evaluate multi-domain representations.
There is a growing interest in learning data representations that work well for many different types of problems and data. In this paper, we look in particular at the task of learning a single visual representation that can be successfully utilized in the analysis of very different types of images, from dog breeds to stop signs and digits. Inspired by recent work on learning networks that predict the parameters of another, we develop a tunable deep network architecture that, by means of adapter residual modules, can be steered on the fly to diverse visual domains. Our method achieves a high degree of parameter sharing while maintaining or even improving the accuracy of domain-specific representations. We also introduce the Visual Decathlon Challenge, a benchmark that evaluates the ability of representations to capture simultaneously ten very different visual domains and measures their ability to recognize well uniformly.
Motivation & Objective
- Develop neural architectures that share parameters across multiple visual domains while enabling domain-specific adaptations.
- Introduce residual adapter modules that add a small number of domain-specific parameters.
- Enable learning without forgetting when adding new domains.
- Assess the approach on ten diverse visual datasets via the Visual Decathlon benchmark.
Proposed method
- Introduce residual adapter modules as small 1x1 filter banks added to residual blocks, enabling domain-specific adaptation with minimal parameter increase.
- Split parameters into domain-agnostic (shared) and domain-specific (adaptation) components.
- Use a low-rank or 1x1 filter-based parameterization to keep the domain-specific parameter count small: roughly 2(C^2 + 5C) per domain, where C is the number of channels.
- Incorporate batch normalization scaling and bias parameters as additional domain-dependent components.
- Bootstrap multi-domain learning by pretraining on ImageNet and then training adapters per domain, enabling learning without forgetting.
- Optionally enable end-to-end learning by cycling data from all domains during training to refine shared parameters.
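The adapter idea in the list above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the function names, the zero initialisation, the toy feature map, and the placement of the skip connection are all assumptions, and the batch-normalization scale and bias terms are left out for brevity. A 1x1 filter bank is simply a C-by-C channel-mixing matrix applied at every pixel, so the adapter computes y = x + alpha(x), and switching domains means swapping which alpha is used.

```python
# Illustrative sketch only: names and initial values are assumptions, not the paper's code.

def conv1x1(x, weight, bias):
    """Apply a 1x1 filter bank to a feature map x[c][h][w] (channel mixing per pixel)."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[bias[o] for _ in range(width)] for _ in range(height)]
           for o in range(len(weight))]
    for o in range(len(weight)):
        for c in range(channels):
            for i in range(height):
                for j in range(width):
                    out[o][i][j] += weight[o][c] * x[c][i][j]
    return out


def residual_adapter(x, weight, bias):
    """Identity skip plus a 1x1 filter bank: y = x + alpha(x)."""
    delta = conv1x1(x, weight, bias)
    return [[[x[c][i][j] + delta[c][i][j] for j in range(len(x[0][0]))]
             for i in range(len(x[0]))] for c in range(len(x))]


def new_adapter(channels):
    """Zero-initialised adapter (an assumption), so the module starts as the identity."""
    return {"weight": [[0.0] * channels for _ in range(channels)],
            "bias": [0.0] * channels}


def domain_specific_params(channels):
    """Approximate per-domain count quoted in the review: 2(C^2 + 5C)."""
    return 2 * (channels ** 2 + 5 * channels)


# One adapter per domain; shared convolution weights would live elsewhere and stay fixed.
adapters = {"imagenet": new_adapter(2), "digits": new_adapter(2)}
adapters["digits"]["weight"] = [[0.0, 0.5], [0.0, 0.0]]  # pretend this was learned

features = [[[1.0, 2.0]], [[3.0, 4.0]]]  # toy feature map with C=2, H=1, W=2
out_imagenet = residual_adapter(features, **adapters["imagenet"])
out_digits = residual_adapter(features, **adapters["digits"])
```

In this toy setup the untouched "imagenet" adapter returns the input unchanged, while the "digits" adapter mixes channel 1 into channel 0. The count helper just evaluates the quoted formula, for example at C = 64.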
Experimental results
Research questions
- RQ1: Can a compact set of domain-specific adapter parameters enable effective multi-domain learning across very different visual domains?
- RQ2: How does the proposed residual adapter approach compare to standard fine-tuning, feature freezing, and other baselines in single-model multi-domain performance?
- RQ3: Does the method preserve performance on a large source domain (e.g., ImageNet) while adapting to multiple target domains?
- RQ4: What is the impact of adapter size, regularization, and domain-prediction accuracy on overall multi-domain performance?
- RQ5: How does the Visual Decathlon benchmark reveal strengths and weaknesses of multi-domain representations?
Key findings
- Residual adapters enable high parameter sharing with strong domain-specific performance across ten domains.
- The approach achieves competitive mean accuracy and favorable decathlon scores versus baselines that fine-tune all parameters or train separate models.
- Adapter-based methods show no forgetting on the original domain while performing well on target domains.
- Tuning only adapter parameters can outperform full fine-tuning and other baselines on several domains.
- End-to-end learning and domain prediction bring additional benefits, giving strong results with little loss of performance on the original domain.
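The no-forgetting finding follows from how the parameters are split: if the shared weights are frozen and each domain owns its own small parameter set, fitting a new domain cannot change the outputs for an earlier one. The sketch below shows this with a toy scalar model. Everything in it is a hypothetical stand-in (the two-number "shared" weights, the per-domain scale and shift, the data, and the learning rate) and it is not the training procedure from the paper.

```python
# Illustrative sketch; the toy model, names, and update rule are assumptions.

shared = (2.0, -1.0)  # stands in for pretrained, frozen shared weights
domain_params = {"imagenet": {"scale": 1.0, "shift": 0.0}}


def predict(x, domain):
    """Shared feature followed by a domain-specific scale and shift."""
    p = domain_params[domain]
    feature = shared[0] * x + shared[1]
    return p["scale"] * feature + p["shift"]


def fit_new_domain(name, data, steps=200, lr=0.05):
    """Gradient descent on squared error, updating only the new domain's parameters."""
    p = domain_params.setdefault(name, {"scale": 1.0, "shift": 0.0})
    for _ in range(steps):
        for x, target in data:
            feature = shared[0] * x + shared[1]
            error = p["scale"] * feature + p["shift"] - target
            p["scale"] -= lr * error * feature
            p["shift"] -= lr * error


inputs = (0.0, 1.0, 2.0)
before = [predict(x, "imagenet") for x in inputs]
fit_new_domain("digits", [(0.0, 1.0), (1.0, 3.0)])
after = [predict(x, "imagenet") for x in inputs]
```

Here `before` and `after` are identical because `fit_new_domain` never writes to `shared` or to the "imagenet" entry. End-to-end training that also updates the shared weights gives up this structural guarantee.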
This review was created by AI and reviewed by human editors.