QUICK REVIEW

[Paper Review] Big Self-Supervised Models are Strong Semi-Supervised Learners

Ting Chen, Simon Kornblith|arXiv (Cornell University)|Jun 17, 2020
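Domain Adaptation and Few-Shot Learning|References: 66|Cited by: 476
One-Sentence Summary

This paper proposes SimCLRv2. Using a three-stage semi-supervised framework (unsupervised pretraining with a large model, supervised fine-tuning on a small number of labels, and distillation with unlabeled data), it achieves state-of-the-art performance on ImageNet even when labels are very scarce. For example, after distillation, ResNet-50 reaches 73.9% top-1 accuracy with 1% of the labels and 77.5% with 10% of the labels.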

ABSTRACT

One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to common approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning. We find that, the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels ($\le$13 labeled images per class) using ResNet-50, a $10\times$ improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.
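
As a rough check on the "$\le$13 labeled images per class" figure in the abstract, the arithmetic can be worked out as follows. This assumes the standard ILSVRC-2012 training split of 1,281,167 images over 1,000 classes, which the abstract itself does not state, and it gives only the per-class average:

$$
\frac{0.01 \times 1{,}281{,}167}{1{,}000} \approx 12.8 \le 13
$$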

Motivation and Objectives

  • Motivate and evaluate the task-agnostic use of unlabeled data during pretraining for semi-supervised learning in computer vision.
  • Investigate the impact of model size, depth, and projection head design on semi-supervised performance.
  • Demonstrate how distillation using unlabeled data transfers task-specific knowledge to smaller models.
  • Show that a bigger self-supervised pretrained model improves label efficiency during fine-tuning.

Proposed Method

  • Adopt SimCLRv2, an improved contrastive learning framework for unsupervised pretraining on a big ResNet backbone.
  • Fine-tune the pretrained model on limited labeled data (1% or 10% of the labels), starting from a middle layer of the projection head rather than discarding the head, to boost performance.
  • Apply distillation using unlabeled data where a teacher (fine-tuned model) imputes labels for a student, enabling task-specific knowledge transfer.
  • Experiment with larger/deeper networks, selective kernels (SK), and a deeper projection head to optimize both linear evaluation and fine-tuning performance.
  • Use a memory mechanism (from MoCo) and a 3-layer MLP projection head during pretraining, fine-tune from the middle layer of the projection head, and distill with a temperature-scaled loss that does not rely on ground-truth labels.
  • Report results on ImageNet with 1%, 10%, and full-label settings; compare against prior SOTA semi-supervised methods.
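
The contrastive pretraining step in the list above builds on the SimCLR family, whose objective is commonly the normalized temperature-scaled cross-entropy (NT-Xent) loss. The following is a minimal NumPy sketch of that loss, not the authors' implementation. The function name `nt_xent_loss`, the default temperature, and the toy embeddings are assumptions for illustration, and the memory mechanism and projection head mentioned above are left out.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent loss for two batches of embeddings of shape (n, d).

    Row i of z1 and row i of z2 are assumed to come from two augmented
    views of the same image (a positive pair). The default temperature
    is an illustrative choice, not a value taken from the review.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm -> cosine similarity
    sim = (z @ z.T) / temperature                      # (2n, 2n) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # an example is never its own negative

    # The positive for row i is the other view of the same image.
    pos_idx = (np.arange(2 * n) + n) % (2 * n)

    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    return -log_prob[np.arange(2 * n), pos_idx].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 16))
    view_b = rng.normal(size=(8, 16))
    print("aligned views:  ", nt_xent_loss(view_a, view_a))
    print("unrelated views:", nt_xent_loss(view_a, view_b))
```

In this toy run the loss should be lower when both views carry the same embedding than when they are unrelated, which is the behavior the pretraining step optimizes for.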

Experimental Results

Research Questions

  • RQ1: Does unsupervised pretraining with bigger, wider models yield improved semi-supervised performance on ImageNet when labeled data is scarce?
  • RQ2: How do projection head depth and the point from which fine-tuning starts affect semi-supervised learning performance?
  • RQ3: Can distillation with unlabeled data improve task-specific performance and transfer to smaller models without labeled data?

Key Findings

ImageNet accuracy (%) by label fraction:

Method | Architecture | Top-1 (1%) | Top-1 (10%) | Top-5 (1%) | Top-5 (10%)
Supervised baseline [30] | ResNet-50 | 25.4 | 56.4 | 48.4 | 80.4
SimCLRv2 distilled (ours) | ResNet-50 | 73.9 | 77.5 | 91.5 | 93.4
SimCLRv2 distilled (ours) | ResNet-50 (2x + SK) | 75.9 | 80.2 | 93.0 | 95.0
SimCLRv2 self-distilled (ours) | ResNet-152 (3x + SK) | 76.6 | 80.9 | 93.4 | 95.5

  • Bigger self-supervised models yield larger gains when fine-tuned with fewer labels, improving label efficiency significantly.
  • Projection head depth and fine-tuning from middle layers can substantially boost performance, especially with limited labels.
  • Distillation using unlabeled data improves semi-supervised learning; big-to-small distillation transfers task knowledge to compact models.
  • SimCLRv2 reaches 79.8% top-1 accuracy under linear evaluation. With distillation, the self-distilled ResNet-152 (3x + SK) achieves 76.6% top-1 with 1% of the labels and 80.9% with 10%, and the distilled ResNet-50 attains 73.9% (1%) and 77.5% (10%).
  • The distilled ResNet-50 trained with 10% of the labels (77.5% top-1) outperforms a supervised ResNet-50 trained on all labels (76.6% top-1).
  • Distillation with unlabeled data can yield strong performance even when the student has the same architecture as the teacher (self-distillation), and distilling into a smaller student enables efficient deployment.
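
The distillation step referred to in these findings can be sketched as a cross-entropy between temperature-scaled teacher and student predictions on unlabeled inputs, so that no ground-truth labels are needed. This is a minimal NumPy illustration under that reading, not the authors' code. The names `softmax_with_temperature` and `distillation_loss`, the default temperature, and the example logits are hypothetical.

```python
import numpy as np


def softmax_with_temperature(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature, computed stably."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy between teacher and student distributions.

    Both inputs have shape (batch, num_classes) and are computed on the
    same unlabeled images. The teacher's soft predictions act as targets.
    """
    p_teacher = softmax_with_temperature(teacher_logits, temperature)

    # Stable log-softmax for the student.
    scaled = student_logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    log_p_student = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))

    return -(p_teacher * log_p_student).sum(axis=1).mean()


if __name__ == "__main__":
    teacher = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
    agreeing_student = teacher.copy()
    disagreeing_student = -teacher
    print("student matches teacher:  ", distillation_loss(teacher, agreeing_student))
    print("student opposes teacher:  ", distillation_loss(teacher, disagreeing_student))
```

The loss is smallest when the student reproduces the teacher's distribution, where it equals the teacher's entropy, so minimizing it pushes the student toward the teacher's predictions on unlabeled data.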


This review was generated by AI and checked by a human editor.