QUICK REVIEW

[Paper Review] Self-training with Noisy Student improves ImageNet classification

Qizhe Xie, Minh-Thang Luong|arXiv (Cornell University)|Nov 11, 2019

Advanced Neural Network Applications | References: 99 | Citations: 240

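One-line summary

Noisy Student Training uses pseudo labels from a teacher to train an equal-or-larger, noised student model, exploiting unlabeled data to substantially improve ImageNet accuracy and robustness.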

ABSTRACT

We present Noisy Student Training, a semi-supervised learning approach that works well even when labeled data is abundant. Noisy Student Training achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2. Noisy Student Training extends the idea of self-training and distillation with the use of equal-or-larger student models and noise added to the student during learning. On ImageNet, we first train an EfficientNet model on labeled images and use it as a teacher to generate pseudo labels for 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher. During the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment to the student so that the student generalizes better than the teacher. Models are available at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet. Code is available at https://github.com/google-research/noisystudent.

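Motivation and objectives

Leverage unlabeled images to reach higher ImageNet accuracy than labeled data alone can provide.
Develop a semi-supervised framework that outperforms previous self-training and distillation approaches by using an equal-or-larger student model and injecting noise.
Demonstrate robustness gains on ImageNet-A, ImageNet-C, and ImageNet-P, beyond the standard ImageNet metrics.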

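Proposed method

Train a teacher on labeled data and use it to generate pseudo labels for unlabeled data.
Train a noised student on the combination of labeled and pseudo-labeled data (input noise via RandAugment; model noise via dropout and stochastic depth).
Iteratively replace the teacher with the best student, generate new pseudo labels, and train a new student.
Apply data filtering and balancing so that the per-class distribution of the unlabeled data matches that of the labeled data.
Compare soft and hard pseudo labels, and ablate the noise components to show their effect.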
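The steps above can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: the function names (`noisy_student`, `select_balanced`, `train_centroids`, `centroid_proba`), the confidence threshold, and the per-class quota are all hypothetical, and a 1-D nearest-centroid classifier with Gaussian input jitter stands in for EfficientNet with RandAugment. Dropout, stochastic depth, and a larger student are not modelled in this toy.

```python
import math
import random
from collections import defaultdict


def select_balanced(scored, min_confidence, per_class):
    """Filter pseudo-labeled items by confidence, then balance the classes.

    scored: list of (confidence, x, label). Keeps the `per_class` most
    confident items of each class and repeats items of rarer classes until
    the quota is met (both rules are assumptions of this sketch).
    """
    by_class = defaultdict(list)
    for conf, x, label in scored:
        if conf >= min_confidence:
            by_class[label].append((conf, x))
    selected = []
    for label, items in by_class.items():
        items.sort(key=lambda item: item[0], reverse=True)
        kept = items[:per_class]
        for i in range(per_class):
            selected.append((kept[i % len(kept)][1], label))
    return selected


def noisy_student(labeled, unlabeled, train, predict_proba,
                  iterations=3, min_confidence=0.9, per_class=4):
    """train(examples, noisy) -> model; predict_proba(model, x) -> class probs."""
    teacher = train(labeled, noisy=False)  # teacher sees labeled data only
    for _ in range(iterations):
        scored = []
        for x in unlabeled:  # the teacher predicts without noise
            probs = predict_proba(teacher, x)
            label = max(range(len(probs)), key=probs.__getitem__)
            scored.append((probs[label], x, label))
        pseudo = select_balanced(scored, min_confidence, per_class)
        # The student is trained with noise, then becomes the next teacher.
        teacher = train(labeled + pseudo, noisy=True)
    return teacher


def train_centroids(examples, noisy, sigma=0.1):
    """Toy model: one centroid per class; noise = Gaussian input jitter."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in examples:
        if noisy:
            x += random.gauss(0.0, sigma)
        sums[y] += x
        counts[y] += 1
    return [sums[c] / counts[c] for c in sorted(counts)]


def centroid_proba(model, x):
    """Softmax over negative squared distances to each centroid."""
    logits = [-(x - c) ** 2 for c in model]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
labeled = [(-2.0, 0), (-1.5, 0), (1.5, 1), (2.0, 1)]
unlabeled = [-3.0, -2.2, -1.0, -0.1, 0.2, 0.9, 2.4, 3.1]
model = noisy_student(labeled, unlabeled, train_centroids, centroid_proba)
print(model)  # two centroids, one per class
```

In this toy run the two points near zero fall below the confidence threshold and are dropped, which is the role the filtering step plays; the balancing step then repeats the most confident item so that each class contributes the same number of pseudo-labeled examples.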

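Experimental results

Research questions

RQ1: Does training on data labeled by a strong teacher yield ImageNet accuracy beyond state-of-the-art supervised training?
RQ2: Does injecting noise and using a student equal to or larger than the teacher improve learning from pseudo labels?
RQ3: How does Noisy Student Training affect robustness on ImageNet-A, ImageNet-C, and ImageNet-P?
RQ4: How does iterative training affect final performance?
RQ5: How do soft and hard pseudo labels compare in this framework?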
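To make the soft-versus-hard distinction in RQ5 concrete, the sketch below computes the student's cross-entropy loss against both kinds of target. A soft pseudo label is the teacher's full probability distribution; a hard pseudo label is the one-hot vector of its most likely class. The logits and helper names are made up for illustration and do not come from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_hard(probs):
    """One-hot vector for the most probable class."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


def cross_entropy(target, predicted):
    """Cross-entropy of a predicted distribution against a target distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


teacher_probs = softmax([2.0, 1.0, 0.1])   # hypothetical teacher output
student_probs = softmax([1.5, 1.2, 0.3])   # hypothetical student output

soft_loss = cross_entropy(teacher_probs, student_probs)
hard_loss = cross_entropy(to_hard(teacher_probs), student_probs)
print(soft_loss, hard_loss)
```

With a hard target the loss depends only on the probability the student assigns to the teacher's top class, while a soft target also penalizes disagreement on the remaining classes.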

Key findings

Model	Parameters	Extra data	Top-1 accuracy	Top-5 accuracy
Noisy Student Training (EfficientNet-L2)	480M	300M unlabeled images from JFT	88.4%	98.7%
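For readers unfamiliar with the two accuracy columns, here is a small sketch of how top-k accuracy is computed: a prediction counts as correct when the true class is among the k highest-scoring classes. The scores and labels are invented for illustration, and `top_k_accuracy` is a hypothetical helper, not part of the released code.

```python
def top_k_accuracy(scores, labels, k):
    """scores: per-example lists of class scores; labels: true class indices."""
    hits = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


scores = [
    [0.10, 0.70, 0.20],  # highest score: class 1
    [0.50, 0.30, 0.20],  # highest score: class 0
    [0.20, 0.30, 0.50],  # highest score: class 2
    [0.40, 0.35, 0.25],  # highest score: class 0
]
labels = [1, 1, 0, 0]

print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 0.75
```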

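Using 300M unlabeled images, Noisy Student Training reaches 88.4% top-1 accuracy on ImageNet, 2.0% better than the previous state-of-the-art model, which required 3.5B weakly labeled Instagram images.
Robustness: ImageNet-A top-1 accuracy improves from 61.0% to 83.7%; ImageNet-C mean corruption error drops from 45.7 to 28.3; ImageNet-P mean flip rate drops from 27.8 to 12.2.
EfficientNet-L2 with Noisy Student Training reaches 88.4% top-1 and 98.7% top-5 accuracy on ImageNet (Table 2).
Iterative training (teacher -> student -> new teacher), with an increased unlabeled batch ratio, yields top-1 accuracy of 87.6% -> 88.1% -> 88.4%.
Noise is critical: removing data augmentation, stochastic depth, or dropout degrades performance; a large amount of unlabeled data is beneficial.
Noisy Student Training improves adversarial robustness against FGSM/PGD attacks even though it is not optimized for adversarial robustness.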
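As background for the adversarial-robustness finding, the following sketch shows what a single FGSM step does: it moves each input feature by epsilon in the direction that increases the loss. To stay self-contained it attacks a tiny binary logistic-regression model, whose input gradient has the closed form (p - y) * w, rather than an ImageNet network. The weights, input, and epsilon are arbitrary illustrative values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm(weights, bias, x, y, epsilon):
    """One FGSM step: x + epsilon * sign(gradient of the loss w.r.t. x)."""
    p = predict(weights, bias, x)
    grad = [(p - y) * w for w in weights]  # gradient of cross-entropy loss
    return [xi + epsilon * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


weights, bias = [2.0, -1.0], 0.0
x, y = [0.5, -0.5], 1
x_adv = fgsm(weights, bias, x, y, epsilon=0.25)

# The model's confidence in the true class drops on the perturbed input.
print(predict(weights, bias, x), predict(weights, bias, x_adv))
```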


This review was generated by AI and checked by a human editor.