QUICK REVIEW

[Paper Digest] Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training

Utku Özbulak, Hyun‐Jung Lee|arXiv (Cornell University)|May 23, 2023

Domain Adaptation and Few-Shot Learning | Cited by 13

One-Sentence Summary

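This survey reviews image-based self-supervised learning (SSL), covering generative and discriminative methods, pretext tasks, core concepts, frameworks, evaluation, libraries, and future directions.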

ABSTRACT

Although supervised learning has been highly successful in improving the state-of-the-art in the domain of image-based computer vision in the past, the margin of improvement has diminished significantly in recent years, indicating that a plateau is in sight. Meanwhile, the use of self-supervised learning (SSL) for the purpose of natural language processing (NLP) has seen tremendous successes during the past couple of years, with this new learning paradigm yielding powerful language models. Inspired by the excellent results obtained in the field of NLP, self-supervised methods that rely on clustering, contrastive learning, distillation, and information-maximization, which all fall under the banner of discriminative SSL, have experienced a swift uptake in the area of computer vision. Shortly afterwards, generative SSL frameworks that are mostly based on masked image modeling, complemented and surpassed the results obtained with discriminative SSL. Consequently, within a span of three years, over $100$ unique general-purpose frameworks for generative and discriminative SSL, with a focus on imaging, were proposed. In this survey, we review a plethora of research efforts conducted on image-oriented SSL, providing a historic view and paying attention to best practices as well as useful software packages. While doing so, we discuss pretext tasks for image-based SSL, as well as techniques that are commonly used in image-based SSL. Lastly, to aid researchers who aim at contributing to image-focused SSL, we outline a number of promising research directions.

Motivation and Objectives

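Provide a historical and technical overview of image-based self-supervised learning that spans generative and discriminative methods.
Summarize the pretext tasks and common technical concepts that are popular in image-based SSL.
Chronicle recent SSL frameworks and the methods used to evaluate them.
Highlight libraries, datasets, and practical considerations for implementing SSL.
Identify shortcomings and open problems to guide future research on image-based SSL.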

Proposed Approach

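Divide SSL into generative and discriminative frameworks and discuss the objectives of each.
Describe popular image-based pretext tasks (colorization, inpainting, geometric transformations, jigsaw puzzle solving, instance discrimination, masked image modeling) and how they relate to SSL objectives.
Present the key architectural patterns used across SSL methods (Siamese networks, stop-gradient, lagged weight updates, projector/predictor MLPs) and loss functions (InfoNCE, cosine similarity, MSE, MAE, VICReg, information maximization).
Explain the training and evaluation paradigms in SSL, including backbone pretraining followed by linear evaluation, and the roles of memory banks, pseudo-labels, and distillation.
Outline vision transformers (ViT) in SSL and how generative tasks such as masked image modeling (MIM) integrate with Transformer-based backbones.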
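
InfoNCE, one of the losses listed above, can be illustrated with a minimal sketch. This is not code from the survey: the function names `l2_normalize` and `info_nce`, the default temperature of 0.1, and the use of plain Python lists as embeddings are all assumptions made for illustration. The sketch treats `view_a[i]` and `view_b[i]` as two augmented views of the same image (the positive pair) and every other entry of `view_b` as a negative.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss; view_a[i] and view_b[i] form the positive pair.

    Hypothetical helper for illustration, not an API of any library.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity to every candidate in the other view,
        # divided by the temperature.
        logits = [
            sum(x * y for x, y in zip(anchor, cand)) / temperature
            for cand in b
        ]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of picking the positive candidate.
        total += log_denominator - logits[i]
    return total / len(a)


aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = info_nce(aligned, aligned)     # positives are the most similar
loss_mismatched = info_nce(aligned, swapped)  # positives are the least similar
```

In this toy setup the loss is close to zero when each positive pair is the most similar pair in the batch and grows when a negative is more similar than the positive. Practical frameworks typically compute the same quantity on GPU tensors, and many make it symmetric over both views.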

Results

Research Questions

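RQ1: Which pretext tasks are most effective for learning useful image representations in SSL?
RQ2: How do generative SSL (e.g., masked image modeling) and discriminative SSL (e.g., contrastive learning, clustering, distillation) differ in objectives, losses, and architectures?
RQ3: Which common losses, architectures, and training techniques enable robust SSL on images?
RQ4: Which frameworks, libraries, and implementations support research on and application of image-based SSL?
RQ5: What are the current shortcomings and open problems of image-based SSL, and what are promising directions for future work?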

Key Findings

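More than 100 general-purpose, image-focused SSL frameworks covering generative and discriminative methods have been proposed in recent years.
Generative SSL, especially masked image modeling, has become a powerful paradigm that can surpass traditional discriminative methods in representation learning.
Discriminative SSL typically relies on instance discrimination, contrastive losses, and clustering- or distillation-based strategies to learn robust features.
A wide range of pretext tasks (colorization, inpainting, geometric transformations, jigsaw puzzle solving, and others) underpin SSL, and some of them (such as MIM) have driven progress in generative SSL.
Training practices such as Siamese architectures, stop-gradient, momentum/teacher updates, projector/predictor MLPs, memory banks, and pseudo-labels play a key role across SSL frameworks.
The survey also discusses evaluation protocols, existing libraries and codebases, and highlights open problems and future research directions in image-based SSL.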
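
The momentum/teacher update mentioned among the training practices above is commonly realized as an exponential moving average (EMA) of the student's weights. The following is a minimal sketch under stated assumptions, not the survey's own code: `ema_update` is a hypothetical helper, the weights are flattened into plain lists of floats, and the momentum of 0.5 is chosen only to make the arithmetic easy to follow (values close to 1 are more typical in practice).

```python
def ema_update(teacher, student, momentum=0.99):
    """Return new teacher weights: momentum * teacher + (1 - momentum) * student.

    Hypothetical helper for illustration; no gradient flows into the teacher,
    which only tracks a slowly moving average of the student.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


teacher = [0.0, 0.0]
student = [1.0, -2.0]  # held fixed here; in training it changes every step
for _ in range(3):
    teacher = ema_update(teacher, student, momentum=0.5)
# teacher is now [0.875, -1.75], drifting toward the student's weights
```

Because the teacher changes more slowly than the student, it provides comparatively stable targets for the student to match.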

This digest was generated by AI and reviewed by a human editor.