QUICK REVIEW

[Paper Review] VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

Adrien Bardes, Jean Ponce, Yann LeCun|arXiv (Cornell University)|May 11, 2021

Domain Adaptation and Few-Shot Learning|Cited by 285

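One-Sentence Summary

VICReg introduces a simple, modular self-supervised loss with three terms (variance preservation, invariance, and covariance decorrelation) that prevents collapse without requiring shared weights, batch normalization, or a memory bank.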

ABSTRACT

Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, that often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements.

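Motivation and Objectives

Motivate and address representation collapse in joint-embedding self-supervised learning.
Propose a lightweight, non-contrastive loss whose three regularization terms preserve the information content of the embeddings.
Show that VICReg works across heterogeneous architectures and inputs, including multimodal settings.
Show that the variance term stabilizes training and improves downstream performance.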

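Proposed Method

Define a three-term loss on the embeddings: invariance (the distance between the embeddings of the two views), variance (a hinge that keeps the batch standard deviation of each embedding dimension above a threshold, which prevents collapse), and covariance (a penalty on the off-diagonal covariance entries, which decorrelates the embedding dimensions).
Apply the variance and covariance regularization independently to each branch of the joint-embedding network; the branches may be asymmetric.
No weight sharing, batch normalization, memory bank, or contrastive negatives are required; the method uses a Siamese-like architecture with a flexible expander on top of the encoder.
Train with random data augmentations that create two views of each image, optimizing the encoder and expander parameters.
Provide implementation details, including the loss coefficients, the network architecture (ResNet-50 encoder, 3-layer expander with 8192 hidden units), and the optimization schedule (LARS, cosine decay).
Demonstrate applicability to multimodal (image-text) pretraining and transfer to downstream tasks (ImageNet linear and semi-supervised evaluation, detection, segmentation, and retrieval).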
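
The three-term loss described in this section can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the default coefficients (lam=25, mu=25, nu=1), the threshold gamma=1, the stabilizer eps=1e-4, and the use of the unbiased variance are values this review does not state and that should be checked against the paper.

```python
import numpy as np


def invariance_term(z_a, z_b):
    """Mean over the batch of the squared distance between paired embeddings."""
    return np.mean(np.sum((z_a - z_b) ** 2, axis=1))


def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge that penalizes embedding dimensions whose batch std falls below gamma.

    gamma and eps are assumed defaults, not values given in this review.
    """
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    return np.mean(np.maximum(0.0, gamma - std))


def covariance_term(z):
    """Sum of squared off-diagonal covariance entries, divided by the dimension."""
    n, d = z.shape
    centered = z - z.mean(axis=0)
    cov = centered.T @ centered / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return np.sum(off_diag ** 2) / d


def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0):
    """Weighted sum of the three terms; variance and covariance act on each branch separately."""
    inv = invariance_term(z_a, z_b)
    var = variance_term(z_a) + variance_term(z_b)
    cov = covariance_term(z_a) + covariance_term(z_b)
    return lam * inv + mu * var + nu * cov


# Toy batch: 256 samples, 32-dimensional embeddings, second view is a noisy copy.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(256, 32))
z_b = z_a + 0.1 * rng.normal(size=(256, 32))
print(float(vicreg_loss(z_a, z_b)))
```

Because the variance and covariance terms take a single branch as input, nothing in the sketch requires the two branches to share weights or even an architecture; only the invariance term needs the two embeddings to have the same shape.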

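Experimental Results

Research Questions

RQ1: Can a non-contrastive joint-embedding objective prevent collapse without relying on a memory bank or large batches?
RQ2: Are explicit variance preservation and covariance decorrelation sufficient to match state-of-the-art self-supervised representations across diverse downstream tasks?
RQ3: Are asymmetric or multimodal embedding setups feasible under VICReg without weight sharing or identical architectures?
RQ4: Does variance regularization improve the training stability of VICReg and of other self-supervised methods?
RQ5: How does VICReg compare with contrastive and clustering-based self-supervised methods on ImageNet and transfer tasks?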

Key Findings

Method	Linear Top-1	Linear Top-5	1% Top-1	10% Top-1	1% Top-5	10% Top-5
Supervised	76.5	-	25.4	56.4	48.4	80.4
MoCo (He et al. 2020)	60.6	-	-	-	-	-
PIRL (Misra & Maaten 2020)	63.6	-	-	-	57.2	83.8
CPC v2 (Hénaff et al. 2019)	63.8	-	-	-	-	-
CMC (Tian et al. 2019)	66.2	-	-	-	-	-
SimCLR (Chen et al. 2020a)	69.3	89.0	48.3	65.6	75.5	87.8
MoCo v2 (Chen et al. 2020c)	71.1	-	-	-	-	-
SimSiam (Chen & He 2020)	71.3	-	-	-	-	-
SwAV (Caron et al. 2020)	71.8	-	-	-	-	-
InfoMin (Tian et al. 2020)	73.0	91.1	-	-	-	-
Barlow Twins (Zbontar et al. 2021)	73.2	91.0	55.0	69.7	79.2	89.3
VICReg (ours)	73.2	91.1	54.8	69.5	79.4	89.5

VICReg is competitive on ImageNet linear and semi-supervised accuracy without using negative samples, a memory bank, or any normalization requirement.
On ImageNet, VICReg reaches 73.2% Top-1 and 91.1% Top-5 under linear evaluation; in the semi-supervised setting it reaches 54.8% Top-1 and 79.4% Top-5 with 1% of the labels, and 69.5% Top-1 and 89.5% Top-5 with 10% of the labels.
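VICReg matches or exceeds several state-of-the-art self-supervised methods on downstream tasks and transfers well to Places205, VOC07, iNaturalist, and COCO detection and segmentation.
The variance term explicitly prevents norm collapse and stabilizes training; the covariance term decorrelates the embedding dimensions; the invariance term aligns the two views.
VICReg's modular loss works well with asymmetric branches and multimodal data (e.g., image-text), and its variance term can improve training stability when added to other self-supervised methods.
VICReg does not require weight sharing; its branches can be fully independent, which supports cross-modal or modality-agnostic self-supervised learning.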
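
The role of the variance term in the findings above can be illustrated with a small NumPy check. It is a sketch under assumptions rather than an experiment from the paper: the helper names, the threshold gamma=1, and the stabilizer eps=1e-4 are hypothetical choices made for illustration. A collapsed encoder, one that maps every input to the same vector, makes the invariance term exactly zero, so invariance alone cannot rule collapse out; the variance hinge is what penalizes it.

```python
import numpy as np


def invariance(z_a, z_b):
    """Mean squared distance between the paired embeddings of two views."""
    return float(np.mean(np.sum((z_a - z_b) ** 2, axis=1)))


def variance_hinge(z, gamma=1.0, eps=1e-4):
    """Penalty that is positive when a dimension's batch std is below gamma (assumed value)."""
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    return float(np.mean(np.maximum(0.0, gamma - std)))


rng = np.random.default_rng(0)

# Collapsed case: all 64 samples share one 16-dimensional embedding.
collapsed = np.tile(rng.normal(size=(1, 16)), (64, 1))

# Healthy case: each dimension varies across the batch with std around 2.
spread = 2.0 * rng.normal(size=(64, 16))

print("collapsed: invariance =", invariance(collapsed, collapsed),
      "variance hinge =", variance_hinge(collapsed))
print("spread:    variance hinge =", variance_hinge(spread))
```

In the collapsed case the hinge sits close to its maximum of gamma, while in the spread case every dimension clears the threshold and the hinge contributes nothing, so the term only acts when the embeddings start to lose variance.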

This review was generated by AI and reviewed by human editors.