QUICK REVIEW

[Paper Review] Understanding and Enhancing Mixed Sample Data Augmentation

Ethan Harris, Antonia Marcu|arXiv (Cornell University)|Feb 27, 2020

Advanced Neural Network Applications | References: 15 | Cited by: 21

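One-Sentence Summary

This paper proposes FMix, a novel mixed sample data augmentation method that generates random binary masks by thresholding low-frequency images sampled from Fourier space, producing diverse, non-square mask shapes. Unlike MixUp, which distorts learned functions and acts like a form of adversarial training, FMix prevents memorisation while preserving the data distribution. It obtains a new single-model state-of-the-art result on CIFAR-10 without external data, with no increase in training time, and outperforms MixUp and CutMix.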

ABSTRACT

Mixed Sample Data Augmentation (MSDA) has received increasing attention in recent years, with many successful variants such as MixUp and CutMix. By studying the mutual information between the function learned by a VAE on the original data and on the augmented data we show that MixUp distorts learned functions in a way that CutMix does not. We further demonstrate this by showing that MixUp acts as a form of adversarial training, increasing robustness to attacks such as Deep Fool and Uniform Noise which produce examples similar to those generated by MixUp. We argue that this distortion prevents models from learning about sample specific features in the data, aiding generalisation performance. In contrast, we suggest that CutMix works more like a traditional augmentation, improving performance by preventing memorisation without distorting the data distribution. However, we argue that an MSDA which builds on CutMix to include masks of arbitrary shape, rather than just square, could further prevent memorisation whilst preserving the data distribution in the same way. To this end, we propose FMix, an MSDA that uses random binary masks obtained by applying a threshold to low frequency images sampled from Fourier space. These random masks can take on a wide range of shapes and can be generated for use with one, two, and three dimensional data. FMix improves performance over MixUp and CutMix, without an increase in training time, for a number of models across a range of data sets and problem settings, obtaining a new single model state-of-the-art result on CIFAR-10 without external data. Finally, we show that a consequence of the difference between interpolating MSDA such as MixUp and masking MSDA such as FMix is that the two can be combined to improve performance even further. Code for all experiments is provided at this https URL .

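Research Motivation and Objectives

Study how mixed sample data augmentation (MSDA) methods such as MixUp and CutMix affect what deep models learn.
Identify why MixUp distorts learned functions and prevents models from learning sample-specific features, whereas CutMix avoids this distortion.
Develop a new MSDA method that keeps the strengths of CutMix (preserving the data distribution and preventing memorisation) while allowing masks of arbitrary shape to improve generalisation.
Propose and evaluate FMix, which generates random binary masks by thresholding low-frequency images sampled from Fourier space and applies to 1D, 2D, and 3D data.
Show that FMix outperforms MixUp and CutMix across several data sets and models, and that combining interpolation-based MSDA with mask-based MSDA improves performance further.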

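Proposed Method

FMix samples low-frequency images from Fourier space and applies a threshold to them, producing random binary masks with irregular, non-square shapes.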
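Mask sampling is a random step that does not depend on the model, so FMix fits into standard backpropagation-based training unchanged and, according to the abstract, does not increase training time.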
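The mask is applied to the inputs by element-wise multiplication and the labels are interpolated accordingly, as in CutMix, but the Fourier-based generation gives a much wider variety of mask shapes.
Because masks are sampled in the frequency domain, the method extends to 1D, 2D, and 3D data, such as audio, images, and video.
FMix is designed to preserve the data distribution better than MixUp, avoiding the adversarial-like distortion of learned representations.
FMix can be combined with interpolation-based MSDA such as MixUp, which indicates that the two approaches are complementary.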
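
The mask-generation and mixing steps described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`sample_low_freq_image`, `sample_fmix_mask`, `fmix_pair`), the `decay` and `alpha` parameters, and the specific low-pass filter (dividing each frequency component by its magnitude raised to a power) are choices made here for illustration.

```python
import numpy as np

def sample_low_freq_image(shape, decay=3.0, rng=None):
    """Sample a smooth grey-scale image by damping high frequencies.

    Assumption: the low-pass filter divides each Fourier coefficient by
    (frequency magnitude ** decay); this review does not specify the filter.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]            # vertical frequencies, shape (h, 1)
    fx = np.fft.fftfreq(w)[None, :]            # horizontal frequencies, shape (1, w)
    freq = np.sqrt(fx ** 2 + fy ** 2)          # magnitude per FFT bin, shape (h, w)
    freq[0, 0] = 1.0 / max(h, w)               # avoid dividing by zero at the DC bin
    spectrum = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
    spectrum = spectrum / freq ** decay        # damp high-frequency components
    return np.real(np.fft.ifft2(spectrum))     # back to image space

def sample_fmix_mask(shape, lam, decay=3.0, rng=None):
    """Threshold the smooth image so a fraction `lam` of pixels is 1."""
    grey = sample_low_freq_image(shape, decay, rng)
    k = int(round(lam * grey.size))            # number of pixels to keep
    order = np.argsort(grey, axis=None)[::-1]  # pixel indices, brightest first
    mask = np.zeros(grey.size, dtype=np.float32)
    mask[order[:k]] = 1.0
    return mask.reshape(shape)

def fmix_pair(x1, x2, y1, y2, alpha=1.0, decay=3.0, rng=None):
    """Mix two (H, W, C) samples and their label vectors with one mask.

    Assumption: the mixing ratio is drawn from Beta(alpha, alpha), as is
    common for MSDA methods.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    mask = sample_fmix_mask(x1.shape[:2], lam, decay, rng)
    m = mask[..., None]                        # broadcast over channels
    x = m * x1 + (1.0 - m) * x2                # element-wise masking
    lam_eff = mask.mean()                      # actual fraction taken from x1
    y = lam_eff * y1 + (1.0 - lam_eff) * y2    # labels mixed in proportion
    return x, y, mask
```

Thresholding by rank rather than by a fixed value keeps the masked area equal to the requested fraction, so the label weights match the proportion of pixels each sample contributes.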

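Experimental Results

Research Questions

RQ1: How does MixUp affect the function learned by a variational autoencoder (VAE)? Does it distort the underlying data distribution?
RQ2: Why does CutMix improve generalisation without distorting learned representations, while MixUp does not?
RQ3: Can mask-based MSDA with arbitrarily shaped masks further reduce memorisation while preserving the data distribution?
RQ4: Does FMix, with its Fourier-sampled masks, outperform existing MSDA methods such as MixUp and CutMix across a range of data sets and models?
RQ5: Does combining interpolation-based MSDA (MixUp) with mask-based MSDA (FMix) give a further performance gain?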

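Key Findings

FMix obtains a new single-model state-of-the-art result on CIFAR-10 without external data, outperforming MixUp and CutMix.
FMix improves generalisation performance for a number of models and data sets without increasing training time.
MixUp distorts learned functions and acts as a form of adversarial training, increasing robustness to attacks such as Deep Fool and Uniform Noise.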
CutMix works more like a traditional augmentation: it improves performance by preventing memorisation without distorting the data distribution.
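Combining MixUp with FMix improves performance further, indicating that interpolation-based and mask-based MSDA are complementary.
The Fourier-sampled masks used by FMix take diverse, non-square shapes, which improves generalisation while preserving the data distribution.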
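
The difference between interpolating and masking MSDA noted in the findings can be made concrete with a small NumPy sketch. It is an illustration only, assuming 2D grey-scale inputs; the function names (`mixup`, `cutmix_mask`, `masked_mix`) and the way the box is sized and placed are choices made here, following the usual descriptions of MixUp and CutMix rather than anything specified in this review.

```python
import numpy as np

def mixup(x1, x2, lam):
    """Interpolating MSDA: every pixel becomes a blend of both samples."""
    return lam * x1 + (1.0 - lam) * x2

def cutmix_mask(shape, lam, rng):
    """Masking MSDA with a rectangular box covering roughly (1 - lam) of the image.

    Assumption: the box centre is uniform over the image and the box is
    clipped at the borders, so the covered area can be smaller than requested.
    """
    h, w = shape
    cut_h = int(round(h * np.sqrt(1.0 - lam)))
    cut_w = int(round(w * np.sqrt(1.0 - lam)))
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mask = np.ones((h, w), dtype=np.float32)
    mask[top:bottom, left:right] = 0.0         # box region comes from x2
    return mask

def masked_mix(x1, x2, mask):
    """Each pixel is copied unchanged from exactly one of the two samples."""
    return mask * x1 + (1.0 - mask) * x2
```

With an all-zero image and an all-one image as inputs, `mixup` yields a single intermediate value everywhere, a pixel value that appears in neither input, while `masked_mix` yields only the original values 0 and 1. FMix keeps this second property and replaces the rectangle with an arbitrarily shaped mask.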


This review was generated by AI and checked by human editors.