QUICK REVIEW

[Paper Review] VisDA: The Visual Domain Adaptation Challenge

Xingchao Peng, Ben Usman|arXiv (Cornell University)|Oct 18, 2017

Domain Adaptation and Few-Shot Learning|References 49|Cited by 575

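One-Sentence Summary

Introduces VisDA2017, a large-scale synthetic-to-real benchmark for unsupervised domain adaptation in image classification and semantic segmentation; baseline results and challenge entries show that domain adaptation yields substantial gains.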

ABSTRACT

We present the 2017 Visual Domain Adaptation (VisDA) dataset and challenge, a large-scale testbed for unsupervised domain adaptation across visual domains. Unsupervised domain adaptation aims to solve the real-world problem of domain shift, where machine learning models trained on one domain must be transferred and adapted to a novel visual domain without additional supervision. The VisDA2017 challenge is focused on the simulation-to-reality shift and has two associated tasks: image classification and image segmentation. The goal in both tracks is to first train a model on simulated, synthetic data in the source domain and then adapt it to perform well on real image data in the unlabeled test domain. Our dataset is the largest one to date for cross-domain object classification, with over 280K images across 12 categories in the combined training, validation and testing domains. The image segmentation dataset is also large-scale with over 30K images across 18 categories in the three domains. We compare VisDA to existing cross-domain adaptation datasets and provide a baseline performance analysis using various domain adaptation models that are currently popular in the field.

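Motivation and Objectives

Address domain shift by evaluating unsupervised domain adaptation (UDA) from synthetic images to real images.
Provide a large-scale, cross-domain benchmark for object classification and semantic segmentation.
Reduce reliance on target-domain labels and supervised pretraining when developing robust UDA methods.
Provide baseline and challenge results to drive progress in cross-domain visual recognition.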

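Proposed Method

Build VisDA-C, a large-scale classification dataset with 152,397 synthetic training images plus real validation/test images drawn from COCO and YouTube-BB, covering 12 categories.
Compare baseline CNNs (AlexNet, ResNet/ResNeXt variants) against UDA methods such as Deep Adaptation Network (DAN) and Deep CORAL.
Implement unsupervised domain adaptation methods based on maximum mean discrepancy (MMD) and on second-order statistics alignment.
Provide two target domains (validation: MS COCO; test: YouTube Bounding Boxes) to prevent hyperparameter tuning on the test set.
Extend the benchmark to VisDA-S semantic segmentation, from GTA5 (synthetic) to CityScapes (real), with Nexar as the test domain.
Provide baseline and challenge results to show the gains from domain adaptation and to encourage the development of more robust UDA methods.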
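
To make the two alignment ideas above concrete, here is a minimal pure-Python sketch of the quantities they minimize. It is an illustration under stated assumptions, not the paper's implementation. The function names are hypothetical, features are plain lists of floats, and the MMD uses a linear kernel for brevity, whereas DAN uses a multi-kernel variant. In practice these terms would be computed on network activations in a deep learning framework and added to the classification loss.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Unbiased d x d covariance matrix of n feature vectors (needs n >= 2)."""
    n = len(rows)
    mu = mean_vector(rows)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    d = len(mu)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Second-order alignment: squared Frobenius distance between the
    source and target covariance matrices, scaled by 1 / (4 * d^2)."""
    cs, ct = covariance(source), covariance(target)
    d = len(cs)
    sq = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return sq / (4 * d * d)


def linear_mmd2(source, target):
    """Squared MMD with a linear kernel (a simplification assumed here):
    the squared Euclidean distance between the two feature means."""
    ms, mt = mean_vector(source), mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


source = [[0.0, 0.0], [2.0, 2.0]]
shifted = [[1.0, 0.0], [3.0, 2.0]]   # same spread as source, different mean
print(coral_loss(source, shifted))   # 0.0
print(linear_mmd2(source, shifted))  # 1.0
```

The toy example shows why the two criteria are complementary: a pure mean shift leaves the covariance-based term at zero while the linear MMD term detects it.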

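Experimental Results

Research Questions

RQ1: How well do models trained on synthetic data adapt to image classification on an unlabeled real target domain?
RQ2: How much do standard UDA methods (e.g., DAN, Deep CORAL) improve over the source-only baseline on VisDA-C?
RQ3: How does unsupervised domain adaptation perform for semantic segmentation under the synthetic-to-real shift (GTA5 to CityScapes, tested on Nexar)?
RQ4: Which design choices (validation split, absence of target labels, reliance on pretraining) affect the difficulty and outcome of cross-domain adaptation?
RQ5: Which directions could increase task difficulty to further test the capability of UDA methods?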

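Key Findings

VisDA-C contains over 280K images across 12 categories, comprising synthetic training data and real validation/test data.
On the synthetic-to-real task, the source-only AlexNet's accuracy drops to 28.12%, highlighting a significant domain shift.
DAN raises validation accuracy to 51.62% and Deep CORAL to 45.53%, both above the source-only baseline.
Top challenge entries show larger gains through semi-supervised and teacher-student strategies (e.g., GFColourLabUEA reaches up to 92.8% on the test set).
In VisDA-S semantic segmentation, adapting from GTA5 to CityScapes raises validation mean IoU from 21.6 (source-only) to 25.5 (adapted), with competitive results on the Nexar test domain as well.
The paper emphasizes reducing reliance on supervised pretraining (e.g., ImageNet) to reflect realistic deployment scenarios where no in-domain pretraining is available.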
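
For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of how a mean IoU score can be computed from per-pixel labels. It assumes flattened lists of integer class ids and averages only over classes that occur in the prediction or the ground truth. The function name is hypothetical, and the benchmark's exact evaluation protocol (for example, how ignored labels are handled) is not reproduced here.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across classes that appear in
    either the predicted or the ground-truth labels."""
    inter = [0] * num_classes  # pixels where prediction and truth agree on class c
    union = [0] * num_classes  # pixels labeled c by prediction or by truth
    for p, t in zip(pred, truth):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        return 0.0  # no labeled pixels at all
    return sum(ious) / len(ious)


pred = [0, 0, 1, 1]
truth = [0, 1, 1, 1]
# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
print(mean_iou(pred, truth, num_classes=2))
```

On this scale, the reported move from 21.6 to 25.5 is a gain of 3.9 mean IoU points.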


This review was generated by AI and reviewed by human editors.