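[Paper Digest] FMix: Enhancing Mixed Sample Data Augmentation
FMix is a masking mixed sample data augmentation whose masks are obtained by thresholding low-frequency images sampled from Fourier space; it outperforms MixUp and CutMix across several datasets and modalities.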
Mixed Sample Data Augmentation (MSDA) has received increasing attention in recent years, with many successful variants such as MixUp and CutMix. By studying the mutual information between the function learned by a VAE on the original data and on the augmented data we show that MixUp distorts learned functions in a way that CutMix does not. We further demonstrate this by showing that MixUp acts as a form of adversarial training, increasing robustness to attacks such as Deep Fool and Uniform Noise which produce examples similar to those generated by MixUp. We argue that this distortion prevents models from learning about sample specific features in the data, aiding generalisation performance. In contrast, we suggest that CutMix works more like a traditional augmentation, improving performance by preventing memorisation without distorting the data distribution. However, we argue that an MSDA which builds on CutMix to include masks of arbitrary shape, rather than just square, could further prevent memorisation whilst preserving the data distribution in the same way. To this end, we propose FMix, an MSDA that uses random binary masks obtained by applying a threshold to low frequency images sampled from Fourier space. These random masks can take on a wide range of shapes and can be generated for use with one, two, and three dimensional data. FMix improves performance over MixUp and CutMix, without an increase in training time, for a number of models across a range of data sets and problem settings, obtaining a new single model state-of-the-art result on CIFAR-10 without external data. Finally, we show that a consequence of the difference between interpolating MSDA such as MixUp and masking MSDA such as FMix is that the two can be combined to improve performance even further. Code for all experiments is provided at https://github.com/ecs-vlc/FMix .
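Research Motivation and Goals
- Study how the distortion introduced by MSDA affects learned representations and generalisation.
- Compare interpolating MSDA (MixUp) with masking MSDA (CutMix) through information-theoretic and robustness analyses.
- Propose FMix, a flexible masking MSDA whose varied mask shapes better preserve the data distribution.
- Demonstrate the effectiveness of FMix on image, audio, and 3D point cloud tasks.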
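Proposed Method
- Use a mutual information measure based on variational autoencoders (VAEs) to compare the representations learned from real data with those learned from augmented data.
- Show that MixUp distorts learned functions and acts as a form of adversarial training, whereas CutMix better preserves the information in the data.
- Introduce FMix, which creates diverse yet locally consistent binary masks by sampling low-frequency images from Fourier space and thresholding them.
- The FMix mixing function is x_A = M ⊙ x_1 + (1−M) ⊙ x_2, where ⊙ denotes element-wise multiplication and M is the binary mask obtained by thresholding a low-frequency image.
- Evaluate FMix against baselines on CIFAR-10/100, Fashion MNIST, Tiny-ImageNet, and ImageNet, as well as on other modalities (speech, grapheme, and 3D point clouds).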
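The mask construction and the mixing function described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation (that lives in the repository linked in the abstract). The function names `sample_low_freq_image`, `fmix_mask`, and `fmix`, the `decay` parameter, and the 1/f^decay shape of the spectrum are all assumptions made for this example.

```python
import numpy as np


def sample_low_freq_image(shape, decay=3.0, rng=None):
    """Sample a smooth grey-scale image dominated by low frequencies.

    Assumption: random complex Fourier coefficients are attenuated by
    1 / frequency**decay before the inverse FFT.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    freq = np.sqrt(fx ** 2 + fy ** 2)
    freq[0, 0] = 1.0 / max(h, w)  # avoid dividing by zero at the DC bin
    spectrum = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
    return np.real(np.fft.ifft2(spectrum / freq ** decay))


def fmix_mask(shape, lam, decay=3.0, rng=None):
    """Binary mask M: the top `lam` fraction of pixels is 1, the rest 0."""
    grey = sample_low_freq_image(shape, decay, rng)
    k = int(round(lam * grey.size))
    order = np.argsort(grey.ravel())[::-1]  # brightest pixels first
    mask = np.zeros(grey.size, dtype=np.float32)
    mask[order[:k]] = 1.0
    return mask.reshape(shape)


def fmix(x1, x2, mask):
    """x_A = M * x_1 + (1 - M) * x_2, broadcasting M over leading axes."""
    return mask * x1 + (1.0 - mask) * x2


# Usage: mix two hypothetical 3x32x32 images with 30% of pixels from x1.
rng = np.random.default_rng(0)
x1 = np.ones((3, 32, 32), dtype=np.float32)
x2 = np.zeros((3, 32, 32), dtype=np.float32)
mask = fmix_mask((32, 32), lam=0.3, rng=rng)
mixed = fmix(x1, x2, mask)
```

Thresholding by rank rather than by a fixed grey level keeps the fraction of pixels taken from x_1 equal to `lam`, up to rounding, whatever shape the sampled mask happens to have.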
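Experimental Results
Research Questions
- RQ1: Does masking MSDA preserve the data distribution in CNN representations better than interpolating MSDA?
- RQ2: Do Fourier-based random masks provide a larger and more diverse augmentation space than square masks such as those used by CutMix?
- RQ3: How does FMix perform relative to MixUp and CutMix across data modalities (image, audio, 3D)?
- RQ4: Are masking MSDA and interpolating MSDA complementary when combined in a training strategy?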
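Key Findings
- In the VAE-based analysis, the mutual information between representations of augmented and real data is higher for FMix than for MixUp and CutMix.
- The augmented data produced by FMix better preserves the data distribution, and Grad-CAM analysis of CNNs indicates that a broader range of features is used.
- FMix improves classification accuracy over the baseline and several MSDA methods on CIFAR-10/100, Fashion MNIST, Tiny-ImageNet, and other settings, reaching strong or state-of-the-art results without external data (e.g. CIFAR-10 with PyramidNet).
- FMix also extends to one- and three-dimensional data and to other modalities (speech, grapheme, and 3D point clouds), where it generally outperforms MixUp and CutMix.
- When training data is limited, a hybrid strategy that alternates between MixUp and FMix can outperform either method used alone.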
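To make the last finding concrete, the sketch below contrasts interpolating and masking MSDA and alternates between them. The digest does not specify the alternation schedule, so switching on the parity of the batch index is an assumption, as are the helper names `mixup`, `masked_mix`, and `alternating_msda`. The mask is passed in by the caller and could come from any mask generator, FMix included.

```python
import numpy as np


def mixup(x1, x2, lam):
    """Interpolating MSDA: every pixel is a convex blend of both inputs."""
    return lam * x1 + (1.0 - lam) * x2


def masked_mix(x1, x2, mask):
    """Masking MSDA: each pixel is copied unchanged from x1 or from x2."""
    return mask * x1 + (1.0 - mask) * x2


def alternating_msda(x1, x2, lam, mask, step):
    """Assumed schedule: MixUp on even batch indices, masking on odd ones."""
    if step % 2 == 0:
        return mixup(x1, x2, lam)
    return masked_mix(x1, x2, mask)


# Usage with hypothetical 2x2 inputs and a hand-written binary mask.
x1 = np.ones((2, 2))
x2 = np.zeros((2, 2))
mask = np.array([[1.0, 0.0], [0.0, 1.0]])
blended = alternating_msda(x1, x2, lam=0.25, mask=mask, step=0)
masked = alternating_msda(x1, x2, lam=0.25, mask=mask, step=1)
```

The interpolated output contains pixel values that appear in neither input, while the masked output only rearranges existing pixels. This matches the distinction the paper draws between MixUp distorting the data and masking methods preserving its distribution.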
This digest was generated by AI and reviewed by a human editor.