QUICK REVIEW

[Paper Review] Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training

Utku Özbulak, Hyun‐Jung Lee|arXiv (Cornell University)|May 23, 2023

Domain Adaptation and Few-Shot Learning | Cited by 13

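One-Line Summary

This survey reviews image-based self-supervised learning (SSL), covering generative and discriminative approaches, pretext tasks, core concepts, frameworks, evaluation, libraries, and future directions.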

ABSTRACT

Although supervised learning has been highly successful in improving the state-of-the-art in the domain of image-based computer vision in the past, the margin of improvement has diminished significantly in recent years, indicating that a plateau is in sight. Meanwhile, the use of self-supervised learning (SSL) for the purpose of natural language processing (NLP) has seen tremendous successes during the past couple of years, with this new learning paradigm yielding powerful language models. Inspired by the excellent results obtained in the field of NLP, self-supervised methods that rely on clustering, contrastive learning, distillation, and information-maximization, which all fall under the banner of discriminative SSL, have experienced a swift uptake in the area of computer vision. Shortly afterwards, generative SSL frameworks that are mostly based on masked image modeling, complemented and surpassed the results obtained with discriminative SSL. Consequently, within a span of three years, over 100 unique general-purpose frameworks for generative and discriminative SSL, with a focus on imaging, were proposed. In this survey, we review a plethora of research efforts conducted on image-oriented SSL, providing a historic view and paying attention to best practices as well as useful software packages. While doing so, we discuss pretext tasks for image-based SSL, as well as techniques that are commonly used in image-based SSL. Lastly, to aid researchers who aim at contributing to image-focused SSL, we outline a number of promising research directions.

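Motivation and Objectives

Provide a historical and technical overview of image-based SSL that spans both generative and discriminative methods.
Summarize the popular pretext tasks and common technical concepts used in image SSL.
Chronicle recent SSL frameworks and how they are evaluated.
Highlight libraries, datasets, and practical considerations for implementing SSL.
Identify shortcomings and open problems in image-based SSL to guide future research.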

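Proposed Approach

Categorizes SSL into generative and discriminative frameworks and discusses the goals of each.
Describes popular image-based pretext tasks (colorization, inpainting, geometric transformations, puzzle solvers, instance discrimination, masked image modeling) and how they relate to SSL objectives.
Presents key architectural patterns (Siamese networks, stop-grad, delayed weight updates, projection/predictor MLPs) and the loss functions used across SSL methods (InfoNCE, cosine similarity, MSE, MAE, VICReg, information-maximization).
Explains SSL training and evaluation paradigms, including backbone pre-training followed by linear evaluation and the roles of memory banks, pseudo-labeling, and distillation.
Gives an overview of vision transformers (ViT) in SSL and explains how masked image modeling (MIM) and other generative tasks integrate with Transformer-based backbones.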
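
To make the contrastive side of the list above concrete, here is a minimal sketch of an InfoNCE-style loss in plain Python. It is an illustration under stated assumptions, not code from the survey: the function names (cosine_similarity, info_nce), the temperature value, and the toy two-dimensional embeddings are all hypothetical, and real frameworks compute this on batched tensors. Each query is paired with the key at the same index as its positive, and every other key in the batch serves as a negative.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE-style loss (hypothetical helper for illustration).

    keys[i] is treated as the positive for queries[i]; all other keys
    in the batch act as negatives.
    """
    losses = []
    for i, q in enumerate(queries):
        logits = [cosine_similarity(q, k) / temperature for k in keys]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive key as the target class.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings of two images under two augmented views (made-up values).
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(view_a, view_b)         # positives match their queries
shuffled = info_nce(view_a, view_b[::-1])  # positives deliberately mismatched
print(aligned < shuffled)
```

The loss is small when each query is closest to its own positive and grows when the pairing is wrong, which is the behavior instance-discrimination objectives rely on.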

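Experimental Results

Research Questions

RQ1: Which pretext tasks are most effective for learning useful image representations with SSL?
RQ2: How do generative SSL (e.g., masked image modeling) and discriminative SSL (e.g., contrastive learning, clustering, distillation) differ in their goals, losses, and architectures?
RQ3: Which common losses, architectures, and training tricks enable robust SSL for images?
RQ4: Which frameworks, libraries, and implementations support image-based SSL research and applications?
RQ5: What are the current shortcomings and open problems of image-based SSL, and which future directions are promising?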

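Key Findings

More than 100 general-purpose, image-focused SSL frameworks have been proposed in recent years, covering both generative and discriminative approaches.
Generative SSL, centered in particular on masked image modeling, has emerged as a strong paradigm whose representation-learning results surpass those of earlier discriminative methods.
Discriminative SSL often relies on instance discrimination, contrastive losses, and clustering-based or distillation-based strategies to learn robust features.
A variety of pretext tasks (colorization, inpainting, geometric transformations, puzzle solving, and others) form the foundation of SSL, with specific tasks such as MIM driving progress in generative SSL.
Training practices such as Siamese architectures, stop-gradient, momentum/teacher updates, projection/predictor MLPs, memory banks, and pseudo-labeling play an important role across SSL frameworks.
The survey also discusses evaluation protocols, existing libraries and repositories, and the challenges and future research directions of image-based SSL.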
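
The momentum/teacher update mentioned in the findings above can be sketched as an exponential moving average (EMA) of the student weights. This is a minimal illustration under assumptions, not the implementation of any specific framework: the helper name ema_update, the flat list of weights, and the momentum values are hypothetical. In practice the teacher is usually kept out of the gradient path (stop-gradient) and only changes through this update.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return new teacher weights as an EMA of the student weights.

    new_teacher = momentum * teacher + (1 - momentum) * student
    (hypothetical helper operating on flat lists of floats).
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Toy example: the student weights are held fixed so the drift is easy to see.
teacher = [0.0, 0.0]
student = [1.0, -2.0]
for _ in range(3):
    teacher = ema_update(teacher, student, momentum=0.5)

print(teacher)
```

A momentum close to 1 makes the teacher change slowly, which is the "delayed weight update" idea: the teacher provides targets that are more stable than the rapidly changing student.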

This review was generated by AI and checked by a human editor.