QUICK REVIEW

[Paper Review] Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, Zihang Dai | arXiv (Cornell University) | Apr 29, 2019

Domain Adaptation and Few-Shot Learning | References: 74 | Citations: 1,618

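One-Line Summary

UDA shows that applying state-of-the-art supervised data augmentation as the noise source in consistency training substantially improves semi-supervised learning across language and vision tasks. It achieves strong results with very few labeled examples, combines with transfer learning, and scales to large datasets.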

ABSTRACT

Semi-supervised learning lately has shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. By substituting simple noising operations with advanced data augmentation methods such as RandAugment and back-translation, our method brings substantial improvements across six language and three vision tasks under the same consistency training framework. On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 5.43 with only 250 examples. Our method also combines well with transfer learning, e.g., when finetuning from BERT, and yields improvements in high-data regime, such as ImageNet, whether when there is only 10% labeled data or when a full labeled set with 1.3M extra unlabeled examples is used. Code is available at https://github.com/google-research/uda.

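Research Motivation and Objectives

Show that high-quality data augmentation serves as effective noise for consistency training in semi-supervised learning (SSL).
Demonstrate SSL performance gains on multiple language and vision tasks using augmented unlabeled data.
Analyze how augmentation quality relates to sample efficiency and labeling requirements.
Show that UDA is compatible with transfer-learning settings such as BERT.
Provide theoretical insight into why augmentation-driven consistency improves learning.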

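Proposed Method

Formulate a semi-supervised objective that combines supervised cross-entropy on labeled data with a consistency loss on unlabeled data augmented via q(x̂ | x).
As the noise source, use strong data augmentation in place of simple noise: RandAugment for images and back-translation for text.
Compute the target distribution with a fixed copy of the parameters to stabilize the consistency loss.
Apply confidence-based masking and prediction sharpening to make better use of unlabeled data.
Incorporate domain-relevance data filtering to select out-of-domain unlabeled data that is useful for training.
Optionally combine UDA with transfer-learning approaches such as BERT fine-tuning.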
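
The objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `uda_loss`, its arguments, the default values for `lam`, `temperature`, and `confidence_threshold`, and the choice to average the masked consistency term over the whole unlabeled batch are all assumptions made for illustration. The logits are taken as given, so the augmentation q(x̂ | x) and the model itself are outside the sketch.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def uda_loss(sup_logits, sup_labels, unsup_logits_orig, unsup_logits_aug,
             lam=1.0, temperature=0.4, confidence_threshold=0.8):
    """Illustrative UDA-style loss (hypothetical helper, not the official code).

    sup_logits:        (n_sup, n_classes) logits on labeled examples
    sup_labels:        (n_sup,) integer class labels
    unsup_logits_orig: (n_unsup, n_classes) logits on original unlabeled inputs
    unsup_logits_aug:  (n_unsup, n_classes) logits on their augmented versions
    """
    # Supervised cross-entropy on the labeled batch.
    sup_logp = log_softmax(sup_logits)
    sup_loss = -sup_logp[np.arange(len(sup_labels)), sup_labels].mean()

    # Target distribution from the original unlabeled inputs. In an autodiff
    # framework this branch would be wrapped in a stop-gradient so it behaves
    # like a fixed copy of the parameters; NumPy has no gradients to stop.
    orig_logp = log_softmax(unsup_logits_orig)

    # Confidence-based masking: keep examples whose top probability on the
    # original input reaches the threshold.
    mask = (np.exp(orig_logp).max(axis=-1) >= confidence_threshold).astype(float)

    # Prediction sharpening: a temperature below 1 sharpens the target.
    target_logp = log_softmax(unsup_logits_orig / temperature)
    target = np.exp(target_logp)

    # Consistency term: KL(target || prediction on the augmented input).
    aug_logp = log_softmax(unsup_logits_aug)
    kl = (target * (target_logp - aug_logp)).sum(axis=-1)

    # Assumption: masked-out examples contribute zero and the mean is taken
    # over the whole unlabeled batch.
    consistency_loss = (kl * mask).mean()

    total = sup_loss + lam * consistency_loss
    return total, sup_loss, consistency_loss
```

With `temperature=1.0`, no masking, and identical original and augmented logits, the consistency term is zero. It grows as the prediction on the augmented input drifts away from the sharpened target.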

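Experimental Results

Research Questions

RQ1: Does replacing conventional noise with state-of-the-art data augmentation improve SSL performance?
RQ2: How does augmentation quality affect the effectiveness and sample efficiency of consistency training?
RQ3: Does UDA scale to large datasets and benefit from transfer learning in NLP and CV?
RQ4: Do the training techniques (masking, sharpening, domain filtering) further improve UDA's performance?
RQ5: How does UDA perform on standard benchmarks compared with leading SSL methods?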

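Key Findings

UDA consistently outperforms VAT and MixMatch on CIFAR-10 and SVHN regardless of the amount of labeled data.
On vision tasks, UDA with RandAugment matches or exceeds fully supervised baselines trained on far more labeled data.
On text tasks, UDA delivers strong gains and combines effectively with BERT pre-training and fine-tuning.
UDA scales to ImageNet, improving top-1 accuracy both with 10% labeled data and when the full labeled set is supplemented with external unlabeled data.
Augmentation quality correlates with SSL gains, supporting the hypothesis that better noise yields better consistency training.
The theoretical analysis links the need for fewer labeled examples to better connectivity in the augmentation graph.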


This review was generated by AI and checked by a human editor.