QUICK REVIEW

[Paper Review] Improving robustness against common corruptions by covariate shift adaptation

Steffen Schneider, Evgenia Rusak|arXiv (Cornell University)|Jun 30, 2020

Domain Adaptation and Few-Shot Learning | References: 61 | Citations: 161

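One-sentence summary

This paper shows that adapting batch normalization statistics to unlabeled corrupted images substantially improves robustness to common corruptions across a wide range of models, and it proposes evaluation variants together with a simple Wasserstein-distance-based analysis.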

ABSTRACT

Today's state-of-the-art machine vision models are vulnerable to image corruptions like blurring or compression artefacts, limiting their performance in many real-world applications. We here argue that popular benchmarks to measure model robustness against common corruptions (like ImageNet-C) underestimate model robustness in many (but not all) application scenarios. The key insight is that in many scenarios, multiple unlabeled examples of the corruptions are available and can be used for unsupervised online adaptation. Replacing the activation statistics estimated by batch normalization on the training set with the statistics of the corrupted images consistently improves the robustness across 25 different popular computer vision models. Using the corrected statistics, ResNet-50 reaches 62.2% mCE on ImageNet-C compared to 76.7% without adaptation. With the more robust DeepAugment+AugMix model, we improve the state of the art achieved by a ResNet50 model up to date from 53.6% mCE to 45.4% mCE. Even adapting to a single sample improves robustness for the ResNet-50 and AugMix models, and 32 samples are sufficient to improve the current state of the art for a ResNet-50 architecture. We argue that results with adapted statistics should be included whenever reporting scores in corruption benchmarks and other out-of-distribution generalization settings.

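Motivation and objectives

Argue that robustness benchmarks can underestimate real-world performance when unlabeled corrupted data is available.
Adapt batch normalization statistics using unlabeled corrupted samples, without any labels, to reduce covariate shift.
Demonstrate robustness gains across a wide range of architectures and datasets.
Provide practical guidance on how many samples adaptation needs and when to apply it.
Introduce a metric and theoretical insight that tie covariate shift, and the resulting performance degradation, to the Wasserstein distance.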

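Proposed method

Compute target statistics from unlabeled corrupted samples and combine them with the training statistics, weighted by a pseudo sample size parameter N, to adapt the BN statistics.
Evaluate robustness with mean corruption error (mCE) in ad hoc (n=1), partial (n=8), and full (n=50,000) adaptation scenarios.
Evaluate 25 architectures on ImageNet-C and compare against state-of-the-art robustness methods.
Analyze the relationship between covariate shift and performance using the Wasserstein distance between source and target statistics.
Examine cases where BN adaptation is not effective (e.g., ImageNet-A (IN-A) and ObjectNet (ON)) and compare with alternatives such as GroupNorm (GN) and Fixup.
Provide a simple upper-bound model linking the adaptation parameters to the degradation caused by covariate shift.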
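The first step above, combining source and target statistics through the pseudo sample size N, can be sketched as follows. This is a minimal NumPy illustration, not the authors' code. It assumes the combination is a weighted average with weight N/(N+n) on the training (source) statistics and n/(N+n) on the statistics of the n unlabeled corrupted samples. The function name `blend_bn_stats` and the toy numbers are hypothetical.

```python
import numpy as np

def blend_bn_stats(mu_s, var_s, mu_t, var_t, N, n):
    """Blend per-channel source and target BN statistics.

    Assumption: source weight N/(N+n), target weight n/(N+n), where N is the
    pseudo sample size and n is the number of unlabeled target samples.
    """
    w_s = N / (N + n)
    w_t = n / (N + n)
    mu = w_s * np.asarray(mu_s, dtype=float) + w_t * np.asarray(mu_t, dtype=float)
    var = w_s * np.asarray(var_s, dtype=float) + w_t * np.asarray(var_t, dtype=float)
    return mu, var

# Toy per-channel statistics for a two-channel layer (hypothetical values).
mu_s, var_s = np.array([0.0, 1.0]), np.array([1.0, 4.0])
mu_t, var_t = np.array([2.0, 3.0]), np.array([3.0, 8.0])

# N = 0 relies on the target statistics alone.
mu_full, var_full = blend_bn_stats(mu_s, var_s, mu_t, var_t, N=0, n=8)
# N = n gives source and target equal weight.
mu_half, var_half = blend_bn_stats(mu_s, var_s, mu_t, var_t, N=8, n=8)
```

Under this weighting, a larger N keeps the model closer to its training statistics, which is the intuitive choice when only a few corrupted samples are available and their statistics are noisy.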

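Experimental results

Research questions

RQ1: Does adapting BN statistics on unlabeled corrupted data improve robustness to common corruptions across architectures?
RQ2: How does the amount of adaptation data (pseudo sample size N and number of samples n) affect the robustness gain?
RQ3: Are the improvements consistent across corruption types and datasets beyond ImageNet-C?
RQ4: Can the covariate shift captured by BN statistics be characterized and predicted by the Wasserstein distance between the source and target distributions?
RQ5: Where does BN adaptation fail or fall behind, compared with robustness approaches other than conventional normalization methods?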

Key findings

Model	IN-C mCE w/o adapt	IN-C mCE partial adapt	IN-C mCE full adapt	∆ adapt (mCE)	Top-1 w/o adapt	Top-1 partial adapt	Top-1 full adapt	∆ adapt (Top-1)
Vanilla ResNet-50	76.7	65.0	62.2	-14.5	39.2	48.6	50.7	+11.5
SIN	69.3	61.5	59.5	-9.8	45.2	51.6	53.1	+7.9
ANT	63.4	56.1	53.6	-9.8	50.4	56.1	58.0	+7.6
DeepAug+AM	53.6	48.4	45.4	-8.2	58.1	62.2	64.5	+6.4
DeepAug+AM+RNXt101	44.5	40.7	38.0	-6.6	65.2	68.2	70.3	+5.1
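For readers unfamiliar with the metric in the table, the sketch below shows how mCE is conventionally computed on ImageNet-C: each corruption's top-1 error, summed over severity levels, is divided by the AlexNet baseline error on the same corruption, and the ratios are averaged over corruption types. The function name and all error rates here are made up for illustration and are not taken from the paper.

```python
def mean_corruption_error(model_err, baseline_err):
    """Return mCE in percent.

    model_err, baseline_err: dict mapping corruption name to a list of
    top-1 error rates, one per severity level. The baseline is AlexNet
    in the usual ImageNet-C convention.
    """
    ratios = []
    for corruption, errs in model_err.items():
        # Normalize the summed error by the baseline's summed error.
        ratios.append(sum(errs) / sum(baseline_err[corruption]))
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical error rates for two corruptions at two severities each.
model_err = {"blur": [0.2, 0.4], "noise": [0.3, 0.6]}
baseline_err = {"blur": [0.4, 0.8], "noise": [0.5, 1.0]}

mce = mean_corruption_error(model_err, baseline_err)
```

Because of the baseline normalization, mCE is a relative score where lower is better, so a negative ∆ in the table is an improvement.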

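Adapting BN statistics yields a marked robustness gain: across the 25 architectures, mCE drops by roughly 10 points.
For the vanilla ResNet-50, adaptation lowers mCE from 76.7% to 62.2% (full adaptation) or 65.0% (partial adaptation).
Adapting to a single sample can already help (e.g., with N≈0 and n=1, mCE improves from 76.7% to 71.4%).
BN adaptation can surpass the previous state of the art for ResNet-50 on IN-C: with DeepAugment+AugMix, mCE improves from 53.6% without adaptation to 45.4%.
The mCE improvement holds consistently across the 25 models, typically at about 10 points, while large-scale pre-training (e.g., IG-3.5B) can reduce the need for adaptation.
The Wasserstein distance between source and target BN statistics correlates with top-1 error before and after adaptation, which enables unsupervised performance estimation.
The benefit of adaptation depends on the dataset: on IN-A and ObjectNet, the shift in learned features is of a different kind, or robustness that does not depend on BN has the advantage.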
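The Wasserstein-distance finding above can be illustrated with a small sketch. It makes a simplifying assumption that is not confirmed by this review: the activations of a layer are modeled as Gaussians with diagonal covariance, so the 2-Wasserstein distance has the closed form sqrt(sum((mu_s - mu_t)^2) + sum((sigma_s - sigma_t)^2)). The function name `w2_diag_gaussians` and the inputs are hypothetical.

```python
import numpy as np

def w2_diag_gaussians(mu_s, var_s, mu_t, var_t):
    """2-Wasserstein distance between two Gaussians with diagonal covariance.

    Assumption: per-channel BN means and variances fully describe each
    distribution (no cross-channel covariance).
    """
    mu_s = np.asarray(mu_s, dtype=float)
    mu_t = np.asarray(mu_t, dtype=float)
    sd_s = np.sqrt(np.asarray(var_s, dtype=float))
    sd_t = np.sqrt(np.asarray(var_t, dtype=float))
    squared = np.sum((mu_s - mu_t) ** 2) + np.sum((sd_s - sd_t) ** 2)
    return float(np.sqrt(squared))

# Identical statistics give zero distance.
d_same = w2_diag_gaussians([0.0, 1.0], [1.0, 4.0], [0.0, 1.0], [1.0, 4.0])
# A mean shift in channel 0 and a variance change in channel 1.
d_shift = w2_diag_gaussians([0.0, 0.0], [1.0, 1.0], [3.0, 0.0], [1.0, 25.0])
```

A distance like this needs only BN statistics, not labels, which is why it could serve as an unsupervised proxy for how much a corruption is likely to hurt accuracy.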

This review was generated by AI and checked by a human editor.