QUICK REVIEW

[Paper Review] Learning From Noisy Large-Scale Datasets With Minimal Supervision

Andreas Veit, Neil Alldrin|arXiv (Cornell University)|Jan 6, 2017

Domain Adaptation and Few-Shot Learning | References: 27 | Citations: 54

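One-Sentence Summary

This paper presents a semi-supervised multi-task model that uses a small verified subset to clean noisy large-scale image annotations while jointly training a robust multi-label classifier; on Open Images it outperforms direct fine-tuning.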

ABSTRACT

We present an approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations. One common approach to combine clean and noisy data is to first pre-train a network using the large noisy dataset and then fine-tune with the clean dataset. We show this approach does not fully leverage the information contained in the clean set. Thus, we demonstrate how to use the clean annotations to reduce the noise in the large dataset before fine-tuning the network using both the clean set and the full set with reduced noise. The approach comprises a multi-task network that jointly learns to clean noisy annotations and to accurately classify images. We evaluate our approach on the recently released Open Images dataset, containing ~9 million images, multiple annotations per image and over 6000 unique classes. For the small clean set of annotations we use a quarter of the validation set with ~40k images. Our results demonstrate that the proposed approach clearly outperforms direct fine-tuning across all major categories of classes in the Open Image dataset. Further, our approach is particularly effective for a large number of classes with wide range of noise in annotations (20-80% false positive annotations).

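Motivation and Objectives

Learn a robust multi-label classifier when most annotations are noisy or only weakly supervised.
Propose a label cleaning network that maps noisy labels to clean labels, conditioned on image features.
Jointly optimize label cleaning and image classification so that both noisy and clean annotations are exploited.
Demonstrate improved performance over conventional fine-tuning on a large, noisy dataset.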

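Proposed Method

Introduce a multi-task architecture with a label cleaning network g and an image classifier h that share visual features.
Model g as a residual mapping from the noisy labels y to the cleaned labels c_hat, conditioned on the image features f(I).
Train g to predict clean labels using a small set V with verified labels v, with loss L_clean = sum_i |c_hat_i − v_i|.
Train h to predict image labels using c_hat (for samples from the noisy training set T) or v (for samples from V) as targets, with a cross-entropy loss L_classify.
Balance the losses with weight 0.1 for L_clean and 1.0 for L_classify, and compose each batch from T and V at a ratio of 9:1.
Use an Inception-v3 backbone with a 6012-way sigmoid output layer for multi-label classification.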
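The loss structure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`clean_labels`, `l_clean`, `l_classify`, `pick_target`, `total_loss`) are hypothetical, the residual passed to `clean_labels` stands in for the output of the cleaning network g, clipping the cleaned labels to [0, 1] is an assumption of this sketch, and the cross-entropy is written as a per-class binary cross-entropy summed over classes.

```python
import math


def clip01(x):
    """Clamp a value to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def clean_labels(noisy, residual):
    """Residual cleaning: c_hat = clip(y + residual, 0, 1).

    `residual` is a stand-in for what the cleaning network g would
    predict from the noisy labels y and the image features f(I).
    """
    return [clip01(y + r) for y, r in zip(noisy, residual)]


def l_clean(c_hat, verified):
    """L_clean = sum_i |c_hat_i - v_i|, computed on samples from V."""
    return sum(abs(c - v) for c, v in zip(c_hat, verified))


def l_classify(pred, target, eps=1e-7):
    """Binary cross-entropy summed over classes for sigmoid outputs."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(1.0 - eps, max(eps, p))  # avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total


def pick_target(c_hat, verified=None):
    """Classifier target: v for samples from V, c_hat for samples from T."""
    return verified if verified is not None else c_hat


def total_loss(c_hat, verified, pred, w_clean=0.1, w_classify=1.0):
    """Weighted sum of both losses for one verified sample (0.1 and 1.0)."""
    target = pick_target(c_hat, verified)
    return w_clean * l_clean(c_hat, verified) + w_classify * l_classify(pred, target)


# Toy example with three classes; all numbers are made up for illustration.
noisy = [1.0, 1.0, 0.0]          # noisy labels y
residual = [-0.9, 0.1, 0.3]      # hypothetical residual predicted by g
verified = [0.0, 1.0, 0.0]       # verified labels v
pred = [0.2, 0.9, 0.1]           # hypothetical sigmoid outputs of h

c_hat = clean_labels(noisy, residual)
print("cleaned labels:", c_hat)
print("total loss:", total_loss(c_hat, verified, pred))
```

For a sample from T, no verified labels exist, so only the classification term applies and `pick_target(c_hat)` returns the cleaned labels as the target.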

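Experimental Results

Research Questions

RQ1: Can a mapping to clean labels, learned from a small verified set, reduce the effect of label noise on multi-label classification over a large noisy dataset?
RQ2: Does jointly training label cleaning and image classification outperform direct fine-tuning on clean labels or on mixed labels?
RQ3: How does the proposed approach perform across the differing label frequencies and annotation quality found in a large-scale dataset?
RQ4: How do pre-training the cleaning network and joint training compare in practicality and performance?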

Key Findings

Model	AP_all	MAP
Baseline	83.82	61.82
Misra et al. (visual classifier)	83.55	61.85
Misra et al. (relevance classifier)	83.79	61.89
Fine-Tuning with mixed labels	84.80	61.90
Fine-Tuning with clean labels	85.88	61.53
Our Approach with pre-training	87.68	62.36
Our Approach trained jointly	87.67	62.38

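The proposed approach outperforms direct fine-tuning on the major Open Images categories and on the overall metrics.
Mean average precision (MAP) reaches 62.38 with joint training and 62.36 with pre-training, compared with 61.82 for the baseline.
Fine-tuning on clean labels alone may overfit and lower MAP (61.53), whereas the proposed approach retains its gains on both common and rare classes.
Gains are larger for classes with 20%–80% false positive annotations, indicating robustness to noisy labels.
Improvements are consistent across the high-level categories (vehicles, products, art, person, sports, food, animal, plant).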
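To make the size of these gains concrete, the differences against the baseline can be computed directly from the table above. The short sketch below only restates the reported numbers; the names `RESULTS` and `gains_over_baseline` are hypothetical helpers introduced for illustration.

```python
# (AP_all, MAP) per model, copied from the results table above.
RESULTS = {
    "Baseline": (83.82, 61.82),
    "Misra et al. (visual classifier)": (83.55, 61.85),
    "Misra et al. (relevance classifier)": (83.79, 61.89),
    "Fine-Tuning with mixed labels": (84.80, 61.90),
    "Fine-Tuning with clean labels": (85.88, 61.53),
    "Our Approach with pre-training": (87.68, 62.36),
    "Our Approach trained jointly": (87.67, 62.38),
}


def gains_over_baseline(results, baseline="Baseline"):
    """Return (delta AP_all, delta MAP) for each model relative to the baseline."""
    base_ap, base_map = results[baseline]
    return {
        name: (round(ap - base_ap, 2), round(m - base_map, 2))
        for name, (ap, m) in results.items()
        if name != baseline
    }


for name, (d_ap, d_map) in gains_over_baseline(RESULTS).items():
    print(f"{name}: AP_all {d_ap:+.2f}, MAP {d_map:+.2f}")
```

Joint training comes out at +3.85 AP_all and +0.56 MAP over the baseline, while fine-tuning with clean labels gains 2.06 AP_all but loses 0.29 MAP.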

This review was generated by AI and checked by a human editor.