QUICK REVIEW

[Paper Review] Big Self-Supervised Models are Strong Semi-Supervised Learners

Ting Chen, Simon Kornblith|arXiv (Cornell University)|Jun 17, 2020

Domain Adaptation and Few-Shot Learning | References: 66 | Citations: 476

One-line summary

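This paper proposes SimCLRv2. Using a three-stage semi-supervised framework (unsupervised pretraining with a large model, supervised fine-tuning on a small number of labels, and distillation with unlabeled data), it achieves state-of-the-art performance on ImageNet even when labels are very scarce. For example, after distillation, ResNet-50 reaches 73.9% top-1 accuracy with 1% of the labels and 77.5% with 10%.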

ABSTRACT

One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to common approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning. We find that, the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels ($\le$13 labeled images per class) using ResNet-50, a $10\times$ improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.
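The "$\le$13 labeled images per class" figure can be sanity-checked with simple arithmetic. The sketch below assumes the standard ILSVRC-2012 training split (1,281,167 images over 1,000 classes) and a uniform 1% sample; neither number is stated in the abstract, and the paper's actual label subsets may be drawn differently.

```python
# Assumed dataset constants (ILSVRC-2012 training split; not given in the abstract).
NUM_TRAIN_IMAGES = 1_281_167
NUM_CLASSES = 1_000


def avg_labels_per_class(label_fraction: float) -> float:
    """Average number of labeled images per class for a given label fraction."""
    return NUM_TRAIN_IMAGES * label_fraction / NUM_CLASSES


one_percent = avg_labels_per_class(0.01)
ten_percent = avg_labels_per_class(0.10)
print(f"1% labels:  ~{one_percent:.1f} images per class")
print(f"10% labels: ~{ten_percent:.1f} images per class")
```

Under these assumptions the 1% setting averages just under 13 labeled images per class, which is consistent with the bound quoted in the abstract.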

Motivation and objectives

Motivate and evaluate the task-agnostic use of unlabeled data during pretraining for semi-supervised learning in computer vision.
Investigate the impact of model size, depth, and projection head design on semi-supervised performance.
Demonstrate how distillation using unlabeled data transfers task-specific knowledge to smaller models.
Show that a bigger, self-supervised pretraining model improves label efficiency during fine-tuning.

Proposed method

Adopt SimCLRv2, an improved contrastive learning framework for unsupervised pretraining on a big ResNet backbone.
Fine-tune the pretrained model on limited labeled data (1% or 10% of the labels), starting from a middle layer of the projection head rather than discarding the head entirely, to boost performance.
Apply distillation using unlabeled data where a teacher (fine-tuned model) imputes labels for a student, enabling task-specific knowledge transfer.
Experiment with larger/deeper networks, selective kernels (SK), and a deeper projection head to optimize both linear evaluation and fine-tuning performance.
During pretraining, use a 3-layer MLP projection head together with the memory mechanism from MoCo; fine-tune from a middle layer of the projection head; distill with a temperature-scaled loss that does not rely on ground-truth labels.
Report results on ImageNet with 1%, 10%, and full-label settings; compare against prior SOTA semi-supervised methods.
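The contrastive pretraining step in the list above can be illustrated with a minimal NumPy sketch of an NT-Xent style loss, the objective used in the SimCLR family. This is a sketch under stated assumptions, not the authors' implementation: the function name `nt_xent_loss`, the default temperature, and the random inputs are hypothetical, and the MoCo-style memory mechanism is omitted.

```python
import numpy as np


def nt_xent_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """NT-Xent loss for N positive pairs (z_a[i], z_b[i]); inputs have shape (N, d)."""
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)              # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit norm, so dot = cosine similarity
    sim = z @ z.T / temperature                         # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                      # exclude each example's self-similarity
    # Row i's positive is the other augmented view of the same image: i <-> i + N.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())


# Hypothetical embeddings: 8 images, 16-dimensional projection outputs.
rng = np.random.default_rng(0)
views_a = rng.normal(size=(8, 16))
aligned = nt_xent_loss(views_a, views_a + 0.01 * rng.normal(size=(8, 16)))
unrelated = nt_xent_loss(views_a, rng.normal(size=(8, 16)))
print(aligned, unrelated)
```

The loss is low when the two views of each image map to nearby embeddings and high when they are unrelated, which is the signal the encoder is trained on before any labels are used.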

Experimental results

Research questions

RQ1: Does unsupervised pretraining with bigger, wider models yield improved semi-supervised performance on ImageNet when labeled data is scarce?
RQ2: How do projection head depth and the point from which fine-tuning starts affect semi-supervised learning performance?
RQ3: Can distillation with unlabeled data improve task-specific performance and transfer to smaller models without labeled data?
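To make RQ2 concrete, the sketch below shows what "the point from which fine-tuning starts" can mean for a 3-layer MLP projection head. It is an illustration only: the layer sizes, the helper names `init_head` and `head_forward`, and the omission of biases and normalization layers are all assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
FEATURE_DIM, HIDDEN_DIM, OUT_DIM = 32, 32, 8  # hypothetical sizes


def init_head(num_layers: int = 3) -> list:
    """Random weights for an MLP projection head with `num_layers` linear layers."""
    dims = [FEATURE_DIM] + [HIDDEN_DIM] * (num_layers - 1) + [OUT_DIM]
    return [rng.normal(scale=0.1, size=(dims[i], dims[i + 1])) for i in range(num_layers)]


def head_forward(h: np.ndarray, weights: list, stop_after=None) -> np.ndarray:
    """Pass encoder features through the head, optionally stopping after `stop_after` layers."""
    n_layers = len(weights) if stop_after is None else stop_after
    x = h
    for i, w in enumerate(weights[:n_layers]):
        x = x @ w
        if i < len(weights) - 1:  # ReLU on hidden layers only, not on the final output
            x = np.maximum(x, 0.0)
    return x


weights = init_head()
h = rng.normal(size=(4, FEATURE_DIM))                 # stand-in for encoder outputs
from_encoder = head_forward(h, weights, stop_after=0)  # head discarded entirely
from_middle = head_forward(h, weights, stop_after=1)   # keep the first head layer
contrastive_z = head_forward(h, weights)               # full head, used for the contrastive loss
print(from_encoder.shape, from_middle.shape, contrastive_z.shape)
```

In this framing, a classifier for fine-tuning would be attached to `from_middle` instead of `from_encoder`, so part of the pretrained head is reused rather than thrown away.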

Key findings

Method	Architecture	Top-1 (1%)	Top-1 (10%)	Top-5 (1%)	Top-5 (10%)
Supervised baseline [30]	ResNet-50	25.4	56.4	48.4	80.4
SimCLRv2 distilled (ours)	ResNet-50	73.9	77.5	91.5	93.4
SimCLRv2 distilled (ours)	ResNet-50 (2x + SK)	75.9	80.2	93.0	95.0
SimCLRv2 self-distilled (ours)	ResNet-152 (3x + SK)	76.6	80.9	93.4	95.5

Bigger self-supervised models yield larger gains when fine-tuned with fewer labels, improving label efficiency significantly.
Projection head depth and fine-tuning from middle layers can substantially boost performance, especially with limited labels.
Distillation using unlabeled data improves semi-supervised learning; big-to-small distillation transfers task knowledge to compact models.
SimCLRv2 reaches 79.8% top-1 accuracy under linear evaluation. After fine-tuning and distillation, the largest model (ResNet-152, 3x + SK) reaches 76.6% top-1 with 1% of the labels and 80.9% with 10%; the distilled ResNet-50 reaches 73.9% and 77.5%, respectively.
With only 10% of the labels, the distilled ResNet-50 (77.5% top-1) outperforms a supervised ResNet-50 trained on all labels (76.6% top-1).
Distillation with unlabeled data yields strong performance even when the student has a similar architecture to the teacher (self-distillation), and distilling into smaller students enables efficient deployment.
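The distillation step referred to in these findings can be sketched as a cross-entropy between the teacher's temperature-scaled class probabilities and the student's, computed on unlabeled images only. The code below is a minimal NumPy illustration under assumptions: the function names, the batch averaging, the default temperature, and the random logits are hypothetical and not taken from the paper's code.

```python
import numpy as np


def log_softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Row-wise log-softmax of temperature-scaled logits (numerically stable)."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    return scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))


def distillation_loss(teacher_logits: np.ndarray, student_logits: np.ndarray,
                      temperature: float = 1.0) -> float:
    """Cross-entropy from teacher soft labels to student predictions, averaged over the batch."""
    p_teacher = np.exp(log_softmax(teacher_logits, temperature))
    log_p_student = log_softmax(student_logits, temperature)
    return float(-(p_teacher * log_p_student).sum(axis=1).mean())


# Hypothetical logits for a batch of 4 unlabeled images and 5 classes.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 5))
student_logits = rng.normal(size=(4, 5))
loss_untrained = distillation_loss(teacher_logits, student_logits)
loss_matched = distillation_loss(teacher_logits, teacher_logits)
print(loss_untrained, loss_matched)
```

No ground-truth label appears anywhere in the loss: the teacher's predictions are the only supervision, and the loss is smallest when the student reproduces the teacher's distribution.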


This review was generated by AI and checked by a human editor.