QUICK REVIEW

[Paper Review] Regularizing Deep Networks with Semantic Data Augmentation

Yulin Wang, Gao Huang|arXiv (Cornell University)|Jul 21, 2020

Advanced Neural Network Applications|References: 84|Cited by: 23

One-Sentence Summary

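This paper proposes implicit semantic data augmentation (ISDA), an efficient method that regularizes deep networks by implicitly augmenting the training data through semantic transformations in the deep feature space. By sampling directions from class-conditional covariance matrices and casting the augmentation process as a robust cross-entropy loss, ISDA consistently improves the generalization of ResNets and DenseNets across several datasets (CIFAR-10, CIFAR-100, SVHN, ImageNet, and Cityscapes) without training auxiliary models or explicitly generating augmented samples.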

ABSTRACT

Data augmentation is widely known as a simple yet surprisingly effective technique for regularizing deep networks. Conventional data augmentation schemes, e.g., flipping, translation or rotation, are low-level, data-independent and class-agnostic operations, leading to limited diversity for augmented samples. To this end, we propose a novel semantic data augmentation algorithm to complement traditional approaches. The proposed method is inspired by the intriguing property that deep networks are effective in learning linearized features, i.e., certain directions in the deep feature space correspond to meaningful semantic transformations, e.g., changing the background or view angle of an object. Based on this observation, translating training samples along many such directions in the feature space can effectively augment the dataset for more diversity. To implement this idea, we first introduce a sampling based method to obtain semantically meaningful directions efficiently. Then, an upper bound of the expected cross-entropy (CE) loss on the augmented training set is derived by assuming the number of augmented samples goes to infinity, yielding a highly efficient algorithm. In fact, we show that the proposed implicit semantic data augmentation (ISDA) algorithm amounts to minimizing a novel robust CE loss, which adds minimal extra computational cost to a normal training procedure. In addition to supervised learning, ISDA can be applied to semi-supervised learning tasks under the consistency regularization framework, where ISDA amounts to minimizing the upper bound of the expected KL-divergence between the augmented features and the original features. Although being simple, ISDA consistently improves the generalization performance of popular deep models (e.g., ResNets and DenseNets) on a variety of datasets, i.e., CIFAR-10, CIFAR-100, SVHN, ImageNet, and Cityscapes.
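The upper bound mentioned in the abstract can be written out as a short derivation. This is a sketch under assumed notation that the review itself does not define: a_i is the deep feature of sample i, y_i its label, w_j and b_j the weights and bias of class j in the final linear layer, Σ_{y_i} the covariance matrix of class y_i, λ ≥ 0 the augmentation strength, and C the number of classes. The inequality is Jensen's inequality (log is concave), and the last step uses the moment-generating function of a Gaussian.

```latex
\begin{align}
\mathcal{L}_\infty
 &= \frac{1}{N}\sum_{i=1}^{N}
    \mathbb{E}_{\tilde a_i \sim \mathcal{N}(a_i,\,\lambda\Sigma_{y_i})}
    \Big[\log \sum_{j=1}^{C}
      e^{(w_j - w_{y_i})^\top \tilde a_i + (b_j - b_{y_i})}\Big] \\
 &\le \frac{1}{N}\sum_{i=1}^{N} \log \sum_{j=1}^{C}
    \mathbb{E}_{\tilde a_i}
    \Big[e^{(w_j - w_{y_i})^\top \tilde a_i + (b_j - b_{y_i})}\Big] \\
 &= \frac{1}{N}\sum_{i=1}^{N} \log \sum_{j=1}^{C}
    e^{(w_j - w_{y_i})^\top a_i + (b_j - b_{y_i})
       + \frac{\lambda}{2}(w_j - w_{y_i})^\top \Sigma_{y_i} (w_j - w_{y_i})}
\end{align}
```

The final line has the form of an ordinary cross-entropy loss with an extra non-negative term added to every logit except that of the true class, which is why no augmented sample ever has to be generated.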

Motivation and Objectives

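Address the limitations of conventional data augmentation, which applies only low-level, class-agnostic transformations (e.g., rotation or flipping) and therefore yields limited sample diversity.
Avoid the high computational cost and complexity of existing semantic augmentation methods, which rely on training a generative model (e.g., a GAN) for each class.
Develop a method that performs meaningful semantic data augmentation implicitly by exploiting linearized semantic directions in the deep feature space.
Integrate efficiently into the standard training pipeline, with no changes to the network architecture and no extra inference steps.
Extend the method to semi-supervised learning under the consistency regularization framework, improving robustness and performance at minimal computational overhead.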

Proposed Method

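Sample random vectors from a zero-mean normal distribution whose covariance is the class-conditional covariance matrix, estimated online during training, to obtain semantically meaningful directions in the deep feature space.
Derive an upper bound on the expected cross-entropy loss over the augmented dataset and minimize that bound during training, which avoids generating augmented data explicitly.
Obtain from this bound a novel robust cross-entropy loss that implicitly regularizes the model.
Use the per-class feature covariance matrix to guide the sampling of semantic directions, capturing class-specific semantic variations such as changes in object texture or background.
Integrate ISDA into both supervised and semi-supervised learning by modifying only the loss function, with no auxiliary network or explicit data generation.
Apply in a plug-and-play manner: compatible with any deep network trained with a softmax cross-entropy loss, with only a few hyperparameters to tune.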
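The robust cross-entropy loss described in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `quad_form` and `isda_loss`, the argument layout, and the toy data are all assumptions made for this example, and the class covariances are passed in as fixed matrices rather than estimated online.

```python
import math

def quad_form(v, cov):
    """Return v^T cov v for a vector v and a square matrix cov (nested lists)."""
    n = len(v)
    return sum(v[r] * cov[r][c] * v[c] for r in range(n) for c in range(n))

def isda_loss(features, labels, weights, biases, class_cov, lam):
    """Average robust CE: logit j of a sample with label y gains
    lam/2 * v^T Sigma_y v, where v = w_j - w_y (hypothetical helper)."""
    total = 0.0
    for a, y in zip(features, labels):
        logits = []
        for w_j, b_j in zip(weights, biases):
            v = [wj - wy for wj, wy in zip(w_j, weights[y])]
            logit = sum(wk * ak for wk, ak in zip(w_j, a)) + b_j
            logits.append(logit + 0.5 * lam * quad_form(v, class_cov[y]))
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[y]  # the added term is zero for j == y
    return total / len(features)

# Toy example: two classes, 2-D features, identity covariance for both classes.
identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]

plain = isda_loss(features, labels, weights, biases, [identity, identity], lam=0.0)
robust = isda_loss(features, labels, weights, biases, [identity, identity], lam=0.5)
print(plain, robust)
```

With `lam=0.0` the function reduces to the standard softmax cross-entropy; a positive `lam` raises the logits of the wrong classes in proportion to the class variance along `w_j - w_y`, so the loss can only increase.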

Experimental Results

Research Questions

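RQ1: Can effective semantic data augmentation be achieved without training or running inference on an auxiliary generative model?
RQ2: Can semantic transformations in the deep feature space be simulated implicitly, using only feature statistics and without explicitly generating data?
RQ3: Does a class-conditional covariance matrix yield more meaningful and effective semantic directions than random sampling or sampling from a global covariance?
RQ4: With a small sample count, how does implicit semantic augmentation compare with explicit augmentation in generalization and robustness?
RQ5: Can ISDA be extended effectively to semi-supervised learning under the consistency regularization framework, and does it improve performance at minimal computational overhead?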

Key Findings

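ISDA achieves state-of-the-art performance on CIFAR-10, CIFAR-100, SVHN, ImageNet, and Cityscapes, consistently improving the generalization of ResNets and DenseNets.
On CIFAR-100 with Wide-ResNet-28-10, ISDA lowers the test error to 16.95% ± 0.11%, a reduction of 1.63 percentage points relative to the baseline.
Ablation experiments show that replacing the covariance with a diagonal or identity matrix reduces performance, and that a single global covariance matrix hurts generalization, confirming the importance of class-conditional statistics.
With λ₀ = 0.5, ISDA performs robustly across datasets and settings; the best results fall in the range 0.25 ≤ λ₀ ≤ 1.
With small M (e.g., M = 1, 2, 5), explicit semantic data augmentation performs poorly, which is attributed to poor estimation in the feature space; performance improves as M grows and approaches that of ISDA as M → ∞.
In semi-supervised learning, combining ISDA with VAT substantially reduces the error rate on CIFAR-10 with only 4,000 labels, demonstrating its effectiveness in low-resource settings.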
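The relationship between explicit augmentation with M samples and the implicit bound can be checked numerically on a toy problem. The sketch below is an illustration under assumptions, not a reproduction of the paper's experiment: it compares loss values rather than test error, the helper names (`ce_loss`, `explicit_loss`, `implicit_bound`) and all numbers are made up for this example, and the class covariance is supplied as Σ = F Fᵀ through a hypothetical factor matrix `factor`.

```python
import math
import random

def ce_loss(a, y, weights, biases):
    """Softmax cross-entropy of feature a with label y under a linear classifier."""
    logits = [sum(wk * ak for wk, ak in zip(w, a)) + b for w, b in zip(weights, biases)]
    top = max(logits)
    return top + math.log(sum(math.exp(z - top) for z in logits)) - logits[y]

def explicit_loss(a, y, weights, biases, factor, lam, num_samples, rng):
    """Mean CE over num_samples features drawn from N(a, lam * factor factor^T)."""
    total = 0.0
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in factor[0]]
        shift = [math.sqrt(lam) * sum(f * zk for f, zk in zip(row, z)) for row in factor]
        total += ce_loss([ai + si for ai, si in zip(a, shift)], y, weights, biases)
    return total / num_samples

def implicit_bound(a, y, weights, biases, factor, lam):
    """Closed-form bound: add lam/2 * v^T Sigma v to logit j, with v = w_j - w_y."""
    cov = [[sum(p * q for p, q in zip(r1, r2)) for r2 in factor] for r1 in factor]
    shifted = []
    for w_j, b_j in zip(weights, biases):
        v = [wj - wy for wj, wy in zip(w_j, weights[y])]
        n = len(v)
        quad = sum(v[r] * cov[r][c] * v[c] for r in range(n) for c in range(n))
        shifted.append(b_j + 0.5 * lam * quad)  # zero extra term when j == y
    return ce_loss(a, y, weights, shifted)

rng = random.Random(0)  # fixed seed so the run is repeatable
a, y = [1.0, 0.0], 0
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
factor = [[1.0, 0.0], [0.5, 1.0]]  # Sigma = [[1.0, 0.5], [0.5, 1.25]]
lam = 0.5

plain = ce_loss(a, y, weights, biases)
bound = implicit_bound(a, y, weights, biases, factor, lam)
for m in (1, 5, 100):
    print(m, explicit_loss(a, y, weights, biases, factor, lam, m, rng))
estimate = explicit_loss(a, y, weights, biases, factor, lam, 20000, rng)
print("plain:", plain, "explicit (M=20000):", estimate, "bound:", bound)
```

For small M the explicit average fluctuates from run to run, while for large M it settles between the plain cross-entropy and the closed-form bound. The bound is computed once, with no sampling, which is the efficiency argument made for the implicit formulation.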

This review was generated by AI and checked by a human editor.