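[Paper Summary] Colorization as a Proxy Task for Visual Understanding
This paper shows that self-supervised colorization can serve as a drop-in replacement for ImageNet pretraining. It achieves state-of-the-art VOC results among methods that do not use ImageNet labels, and it provides a thorough analysis of loss, architecture, and training choices.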
We investigate and improve self-supervision as a drop-in replacement for ImageNet pretraining, focusing on automatic colorization as the proxy task. Self-supervised training has been shown to be more promising for utilizing unlabeled data than other, traditional unsupervised learning methods. We build on this success and evaluate the ability of our self-supervised network in several contexts. On VOC segmentation and classification tasks, we present results that are state-of-the-art among methods not using ImageNet labels for pretraining representations. Moreover, we present the first in-depth analysis of self-supervision via colorization, concluding that formulation of the loss, training details and network architecture play important roles in its effectiveness. This investigation is further expanded by revisiting the ImageNet pretraining paradigm, asking questions such as: How much training data is needed? How many labels are needed? How much do features change when fine-tuned? We relate these questions back to self-supervision by showing that colorization provides a similarly powerful supervisory signal as various flavors of ImageNet pretraining.
Motivation and Goals
- Motivate the use of self-supervised learning to leverage unlabeled data for visual understanding.
- Investigate colorization as a proxy task for learning transferable visual representations.
- Evaluate colorization-based pretraining on VOC classification and segmentation benchmarks.
- Analyze how loss formulation, architecture, and training details affect learned representations.
Proposed Method
- Train a colorization network that takes the lightness (L*) channel of an L*a*b image as input and predicts per-pixel color histograms in hue/chroma space, trained with a histogram-based loss.
- Use hypercolumns with sparse training to learn representations efficiently.
- Pretrain on 3.7M unlabeled images (ImageNet + Places205) and transfer to downstream tasks.
- Systematically compare colorization pretraining to ImageNet pretraining across architectures and data regimes.
- Explore training details such as learning rate schedules, receptive field enlargement, and batch normalization handling.
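To make the histogram-based hue/chroma loss more concrete, here is a minimal NumPy sketch under stated assumptions. It converts the a* and b* channels of each pixel to chroma (distance from the gray axis) and hue (angle in the a*b* plane), bins both into one-hot targets, and scores predicted per-bin logits with cross-entropy. The bin counts (`N_HUE_BINS`, `N_CHROMA_BINS`), the chroma cap `MAX_CHROMA`, the one-hot (rather than soft) targets, and weighting the hue term by chroma (because hue is ill-defined for near-gray pixels) are all illustrative assumptions, and the function names are hypothetical. This is not the authors' implementation.

```python
import numpy as np

# Assumed hyperparameters (not taken from the paper summary).
N_HUE_BINS = 32
N_CHROMA_BINS = 32
MAX_CHROMA = 100.0  # assumed cap on chroma, in L*a*b units


def hue_chroma_targets(a, b):
    """Map per-pixel (a*, b*) values to one-hot hue and chroma bin targets (hypothetical helper)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    chroma = np.hypot(a, b)                    # distance from the gray axis
    hue = np.mod(np.arctan2(b, a), 2 * np.pi)  # angle in [0, 2*pi)
    hue_idx = np.minimum((hue / (2 * np.pi) * N_HUE_BINS).astype(int), N_HUE_BINS - 1)
    chroma_idx = np.minimum((chroma / MAX_CHROMA * N_CHROMA_BINS).astype(int), N_CHROMA_BINS - 1)
    hue_t = np.eye(N_HUE_BINS)[hue_idx]        # shape (N, N_HUE_BINS)
    chroma_t = np.eye(N_CHROMA_BINS)[chroma_idx]
    return hue_t, chroma_t, chroma


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def hue_chroma_loss(hue_logits, chroma_logits, a, b, eps=1e-12):
    """Mean per-pixel cross-entropy over hue and chroma histograms (illustrative sketch)."""
    hue_t, chroma_t, chroma = hue_chroma_targets(a, b)
    log_p_hue = np.log(softmax(np.asarray(hue_logits, dtype=np.float64)) + eps)
    log_p_chroma = np.log(softmax(np.asarray(chroma_logits, dtype=np.float64)) + eps)
    ce_hue = -(hue_t * log_p_hue).sum(axis=-1)
    ce_chroma = -(chroma_t * log_p_chroma).sum(axis=-1)
    # Assumption: down-weight hue for low-chroma (near-gray) pixels.
    w = np.minimum(chroma / MAX_CHROMA, 1.0)
    return float(np.mean(ce_chroma + w * ce_hue))
```

For a gray pixel (a* = b* = 0), the hue term gets zero weight. With uniform logits, the loss is therefore log(32), the chance-level cross-entropy over the chroma bins. Predicting a distribution over bins, rather than a single color value, lets the network express several plausible colors for the same gray input.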
Experimental Results
Research Questions
- RQ1: Can self-supervised colorization match or approach supervised ImageNet pretraining on VOC classification and segmentation?
- RQ2: How do loss formulation and architectural choices influence the quality of learned representations?
- RQ3: What is the impact of pretraining data size and label diversity on downstream performance?
- RQ4: How does colorization-derived representation shift during fine-tuning compared to purely supervised pretraining?
Key Findings
- Colorization-based pretraining achieves 60.0% mIU on VOC 2012 Segmentation with ResNet-152 and extended field of view, the highest reported without ImageNet labels.
- For VOC 2007 Classification, colorization pretraining reaches 77.3% mAP, state-of-the-art among non-ImageNet methods.
- Predicting color histograms in hue/chroma space yields better downstream results (52.9% mIU) than regression on color values (48.0% mIU).
- Increasing model complexity (AlexNet → VGG-16 → ResNet-152) yields larger gains with colorization pretraining, especially in small-sample regimes.
- Colorization features show substantial feature shift during fine-tuning, indicating learned representations are not merely a good initialization but are repurposed for downstream tasks.
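The segmentation numbers above are reported in mean intersection over union (mIU). Below is a short sketch of the standard definition. For each class, IoU = TP / (TP + FP + FN), taken from a confusion matrix with ground-truth rows and predicted columns, and mIU is the mean over classes. The function name `mean_iu` is hypothetical. Treating label 255 as "void" follows the PASCAL VOC convention. Skipping classes that never appear in either the prediction or the ground truth is a choice made here, not something stated in the summary. For VOC-style evaluation, the confusion matrix is accumulated over the whole dataset before the mean is taken, rather than averaged per image.

```python
import numpy as np


def mean_iu(pred, gt, num_classes, ignore_index=255):
    """Mean intersection-over-union from integer label maps (illustrative sketch)."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    gt = np.asarray(gt, dtype=np.int64).ravel()
    valid = gt != ignore_index           # drop void / boundary pixels
    pred, gt = pred[valid], gt[valid]
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    conf = np.bincount(gt * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp  # TP + FP + FN
    present = union > 0                  # skip classes absent from both maps
    return float(np.mean(tp[present] / union[present]))
```

For example, with ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so mIU is about 0.583.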
This summary was generated by AI and reviewed by human editors.