QUICK REVIEW

[Paper Review] DADA: Differentiable Automatic Data Augmentation

Yonggang Li, Guosheng Hu | arXiv (Cornell University) | Mar 8, 2020

Advanced Neural Network Applications | References: 30 | Cited by: 39

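One-Sentence Summary

DADA is a differentiable framework for learning data augmentation policies: it relaxes discrete policy sampling with Gumbel-Softmax, uses the unbiased RELAX gradient estimator, and trains the network and the augmentation parameters jointly through one-pass bi-level optimization, which makes the search at least an order of magnitude faster than earlier Auto-DA methods while keeping accuracy competitive.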

ABSTRACT

Data augmentation (DA) techniques aim to increase data variability, and thus train deep networks with better generalisation. The pioneering AutoAugment automated the search for optimal DA policies with reinforcement learning. However, AutoAugment is extremely computationally expensive, limiting its wide applicability. Followup works such as Population Based Augmentation (PBA) and Fast AutoAugment improved efficiency, but their optimization speed remains a bottleneck. In this paper, we propose Differentiable Automatic Data Augmentation (DADA) which dramatically reduces the cost. DADA relaxes the discrete DA policy selection to a differentiable optimization problem via Gumbel-Softmax. In addition, we introduce an unbiased gradient estimator, RELAX, leading to an efficient and effective one-pass optimization strategy to learn an efficient and accurate DA policy. We conduct extensive experiments on CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets. Furthermore, we demonstrate the value of Auto DA in pre-training for downstream detection problems. Results show our DADA is at least one order of magnitude faster than the state-of-the-art while achieving very comparable accuracy. The code is available at https://github.com/VDIGPKU/DADA.

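Motivation and Objectives

Motivate the automatic learning of data augmentation (DA) policies as a way to improve generalisation when labelled data is limited.
Formulate DA policy search as a differentiable problem so that the policy can be optimized jointly with the network weights.
Reduce the computational cost of DA policy search relative to AutoAugment, PBA, and Fast AutoAugment.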

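Proposed Method

Model the choice of sub-policy with a categorical distribution and the decision to apply each operation with a Bernoulli distribution.
Relax the discrete policy selection with Gumbel-Softmax so that it can be optimized by gradient descent.
Use the RELAX gradient estimator to obtain unbiased gradients for the distribution parameters.
Apply one-pass bi-level optimization to update the network weights and the DA policy parameters jointly.
Estimate gradients for the augmentation magnitudes with a straight-through estimator so that the magnitudes can be updated by backpropagation.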
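
The sampling steps listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the four candidate sub-policies, the zero-initialised logits, and the temperature of 0.5 are all assumptions chosen for illustration. It only shows how a categorical choice and a Bernoulli gate can be replaced by continuous samples; the RELAX estimator and the bi-level update are not covered here.

```python
import numpy as np

def sample_gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample from a categorical distribution."""
    # Gumbel(0, 1) noise via the inverse-CDF trick; keep u away from 0.
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    scores = (logits + gumbel) / temperature
    scores = scores - scores.max()          # stabilise the exponentials
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()    # softmax over perturbed logits

def sample_relaxed_bernoulli(logit, temperature, rng):
    """Draw a relaxed Bernoulli sample in [0, 1] using logistic noise."""
    u = rng.uniform(low=1e-10, high=1.0 - 1e-10)
    noise = np.log(u) - np.log1p(-u)        # Logistic(0, 1) sample
    return 1.0 / (1.0 + np.exp(-(logit + noise) / temperature))

rng = np.random.default_rng(0)

# Hypothetical setup: 4 candidate sub-policies with equal prior weight.
sub_policy_logits = np.zeros(4)
weights = sample_gumbel_softmax(sub_policy_logits, temperature=0.5, rng=rng)

# Hard choice for the forward pass; the soft weights are what carry gradients
# in an autodiff framework.
chosen = int(np.argmax(weights))

# Soft "apply this operation?" gate for one operation (logit 0 means p = 0.5).
apply_gate = sample_relaxed_bernoulli(logit=0.0, temperature=0.5, rng=rng)

print(weights, chosen, apply_gate)
```

Lower temperatures push the samples closer to hard one-hot or 0/1 values, at the cost of higher-variance gradients. In an automatic-differentiation framework the same computation would be differentiable with respect to the logits.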

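Experimental Results

Research Questions

RQ1: Can differentiable optimization with Gumbel-Softmax and RELAX learn data augmentation policies and network weights jointly and efficiently?
RQ2: Does DADA match the accuracy of state-of-the-art Auto-DA methods while substantially reducing search cost?
RQ3: How well does DADA transfer to a large-scale dataset (ImageNet) and to a downstream task (object detection)?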
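
As background for RQ1, the joint learning of policy and weights can be written as a bi-level problem. The notation below is assumed for illustration and follows the usual form for differentiable search rather than being quoted from the paper: $d$ denotes the policy parameters, $\bar{d}$ a policy sampled from them, $w$ the network weights, and $\xi$ a step size.

```latex
\begin{aligned}
\min_{d}\quad & \mathcal{L}_{\text{val}}\bigl(w^{*}(d)\bigr) \\
\text{s.t.}\quad & w^{*}(d) = \arg\min_{w}\ \mathbb{E}_{\bar{d}\sim d}\bigl[\mathcal{L}_{\text{train}}(w, \bar{d})\bigr] \\[4pt]
\text{with the one-step approximation}\quad & w^{*}(d) \approx w - \xi\, \nabla_{w}\, \mathcal{L}_{\text{train}}(w, d)
\end{aligned}
```

Under this approximation the inner problem is never solved to convergence. Weight updates and policy updates alternate within a single training run, which is one plausible reading of "one-pass" optimization.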

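Key Findings

DADA is at least an order of magnitude faster than state-of-the-art Auto-DA methods while keeping accuracy competitive.
On ImageNet, DADA's policy search takes 1.3 GPU hours and reaches a top-1 error of 22.5% with ResNet-50.
On CIFAR-10, CIFAR-100, and SVHN, DADA reaches competitive error rates at a much lower search cost (for example, about 0.1 GPU hours for the search on reduced CIFAR-10).
Compared with using Gumbel-Softmax alone, RELAX reduces the bias in gradient estimation, which improves policy performance on CIFAR-10.
DA policies learned by DADA improve downstream detection models (RetinaNet, Faster R-CNN, Mask R-CNN) on COCO.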

This review was generated by AI and checked by a human editor.