QUICK REVIEW

[Paper Review] FMix: Enhancing Mixed Sample Data Augmentation

Ethan Harris, Antonia Marcu|arXiv (Cornell University)|Feb 27, 2020

Domain Adaptation and Few-Shot Learning | References: 62 | Citations: 104

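One-Line Summary

FMix introduces a masking mixed sample data augmentation (MSDA) that uses low-frequency Fourier-based masks, and it outperforms MixUp and CutMix across several data sets and modalities.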

ABSTRACT

Mixed Sample Data Augmentation (MSDA) has received increasing attention in recent years, with many successful variants such as MixUp and CutMix. By studying the mutual information between the function learned by a VAE on the original data and on the augmented data we show that MixUp distorts learned functions in a way that CutMix does not. We further demonstrate this by showing that MixUp acts as a form of adversarial training, increasing robustness to attacks such as Deep Fool and Uniform Noise which produce examples similar to those generated by MixUp. We argue that this distortion prevents models from learning about sample specific features in the data, aiding generalisation performance. In contrast, we suggest that CutMix works more like a traditional augmentation, improving performance by preventing memorisation without distorting the data distribution. However, we argue that an MSDA which builds on CutMix to include masks of arbitrary shape, rather than just square, could further prevent memorisation whilst preserving the data distribution in the same way. To this end, we propose FMix, an MSDA that uses random binary masks obtained by applying a threshold to low frequency images sampled from Fourier space. These random masks can take on a wide range of shapes and can be generated for use with one, two, and three dimensional data. FMix improves performance over MixUp and CutMix, without an increase in training time, for a number of models across a range of data sets and problem settings, obtaining a new single model state-of-the-art result on CIFAR-10 without external data. Finally, we show that a consequence of the difference between interpolating MSDA such as MixUp and masking MSDA such as FMix is that the two can be combined to improve performance even further. Code for all experiments is provided at https://github.com/ecs-vlc/FMix .

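Motivation and Objectives

Investigate how the distortion introduced by MSDA affects learned representations and generalisation.
Compare interpolating MSDA (MixUp) with masking MSDA (CutMix) using information-theoretic and robustness analyses.
Propose FMix, a flexible masking MSDA with a wide variety of mask shapes, to better preserve the data distribution.
Demonstrate the effectiveness of FMix across image, audio, and 3D point cloud tasks.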

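Proposed Method

Define a mutual-information-based measure, using VAEs, to compare the representations learned from real data and from augmented data.
Show that MixUp distorts learned functions and acts like adversarial training, whereas CutMix preserves more of the information in the data.
Introduce FMix, which samples low-frequency images from Fourier space and thresholds them to produce diverse, locally consistent binary masks.
The FMix mixing function is x_A = M ⊙ x_1 + (1−M) ⊙ x_2, where M is the binary mask obtained by thresholding a low-frequency image.
Evaluate FMix against baselines on CIFAR-10/100, Fashion MNIST, Tiny-ImageNet, and ImageNet, and on additional modalities (audio, graphemes, 3D point clouds).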
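
The mask construction and the mixing function x_A = M ⊙ x_1 + (1−M) ⊙ x_2 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is at the repository linked in the abstract). The function names, the `decay` exponent used to attenuate high frequencies, and drawing the mixing ratio from Beta(alpha, alpha) are assumptions made for illustration.

```python
import numpy as np

def sample_low_freq_image(shape, decay=3.0, rng=None):
    """Sample a smooth grey-scale image by damping the high frequencies
    of complex Gaussian noise (assumed scheme: divide by freq**decay)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    # Frequency magnitude of every FFT bin.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    freq = np.sqrt(fx ** 2 + fy ** 2)
    # Clamp so the DC bin does not cause a division by zero.
    freq = np.maximum(freq, 1.0 / max(h, w))
    spectrum = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
    return np.real(np.fft.ifft2(spectrum / freq ** decay))

def binarise(grey, lam):
    """Threshold so that the top `lam` fraction of pixels become 1."""
    k = int(round(lam * grey.size))
    mask = np.zeros(grey.size)
    order = np.argsort(grey.ravel())[::-1]  # brightest pixels first
    mask[order[:k]] = 1.0
    return mask.reshape(grey.shape)

def fmix_pair(x1, x2, alpha=1.0, decay=3.0, rng=None):
    """Mix two (C, H, W) samples: x_A = M * x1 + (1 - M) * x2."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # assumed prior on the mixing ratio
    mask = binarise(sample_low_freq_image(x1.shape[-2:], decay, rng), lam)
    return mask * x1 + (1.0 - mask) * x2, mask, lam

rng = np.random.default_rng(0)
x1, x2 = np.ones((3, 32, 32)), np.zeros((3, 32, 32))
mixed, mask, lam = fmix_pair(x1, x2, rng=rng)
```

Because the mask is binary, every pixel of the mixed sample comes unchanged from exactly one of the two inputs, which is the property the review contrasts with MixUp's interpolation. The fraction of ones in `mask` matches `lam` up to rounding, so either value could serve as the label-mixing weight.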

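Experimental Results

Research Questions

RQ1 Does masking MSDA preserve the data distribution better than interpolating MSDA in CNN representations?
RQ2 Can Fourier-based random masks provide a larger and more diverse augmentation space than square masks such as those of CutMix?
RQ3 How well does FMix perform relative to MixUp and CutMix across diverse data modalities, including images, audio, and 3D?
RQ4 Do masking MSDAs complement interpolating MSDAs in a training policy?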

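Key Findings

In the VAE-based analysis, FMix yields higher mutual information between representations of augmented and real data than MixUp or CutMix.
FMix-augmented data better preserves the data distribution for CNNs, and Grad-CAM analysis suggests that a broader range of features is used.
FMix improves classification accuracy over the baseline and several MSDA methods on CIFAR-10/100, Fashion MNIST, Tiny-ImageNet, and others, achieving strong or state-of-the-art results without external data (e.g. CIFAR-10 with PyramidNet).
FMix extends to one- and three-dimensional data and to other modalities (audio, graphemes, 3D point clouds), where it often outperforms MixUp and CutMix.
When data is limited, a hybrid policy that alternates between MixUp and FMix outperforms either method alone.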
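
A hybrid policy of this kind might be sketched as follows. The review does not say how the alternation is scheduled, so switching on every batch is an assumption, as are all function names. `placeholder_mask` is a deliberately simple stand-in and not FMix; any sampler that returns a binary (H, W) mask could be passed instead.

```python
import numpy as np

def mixup(x, perm, lam):
    """Interpolating MSDA: convex combination of each sample and its partner."""
    return lam * x + (1.0 - lam) * x[perm]

def masked_mix(x, perm, mask):
    """Masking MSDA: a binary mask picks each pixel from one of the two samples."""
    return mask * x + (1.0 - mask) * x[perm]

def placeholder_mask(shape, lam, rng):
    """Stand-in sampler (NOT FMix): the top rows, covering about `lam` of the image."""
    h, w = shape
    mask = np.zeros((h, w))
    mask[: int(round(lam * h)), :] = 1.0
    return mask

def alternating_policy(batches, sample_mask, alpha=1.0, rng=None):
    """Yield (mixed, perm, lam, name), switching method on every batch (assumed schedule)."""
    rng = np.random.default_rng() if rng is None else rng
    for step, x in enumerate(batches):
        perm = rng.permutation(len(x))  # partner index for each sample
        lam = rng.beta(alpha, alpha)
        if step % 2 == 0:
            yield mixup(x, perm, lam), perm, lam, "mixup"
        else:
            mask = sample_mask(x.shape[-2:], lam, rng)
            # Report the mask's actual coverage as the mixing weight.
            yield masked_mix(x, perm, mask), perm, float(mask.mean()), "masking"

batches = [np.random.default_rng(i).random((4, 3, 8, 8)) for i in range(4)]
out = list(alternating_policy(batches, placeholder_mask, rng=np.random.default_rng(0)))
```

In a training loop, the returned `perm` and `lam` would typically weight the loss as `lam * loss(pred, y) + (1 - lam) * loss(pred, y[perm])`, the usual convention for MixUp-style methods.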

This review was generated by AI and checked by a human editor.