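[Paper Summary] Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning
SNCLR introduces soft neighbor positives into contrastive self-supervised learning by applying cross-attention-based positiveness scores to a candidate neighbor set, improving the representations of CNN and ViT encoders on classification, detection, and segmentation tasks.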
Contrastive learning methods train visual encoders by comparing views from one instance to others. Typically, the views created from one instance are set as positive, while views from other instances are negative. This binary instance discrimination is studied extensively to improve feature representations in self-supervised learning. In this paper, we rethink the instance discrimination framework and find the binary instance labeling insufficient to measure correlations between different samples. For an intuitive example, given a random image instance, there may exist other images in a mini-batch whose content meanings are the same (i.e., belonging to the same category) or partially related (i.e., belonging to a similar category). How to treat the images that correlate similarly to the current image instance leaves an unexplored problem. We thus propose to support the current image by exploring other correlated instances (i.e., soft neighbors). We first carefully cultivate a candidate neighbor set, which will be further utilized to explore the highly-correlated instances. A cross-attention module is then introduced to predict the correlation score (denoted as positiveness) of other correlated instances with respect to the current one. The positiveness score quantitatively measures the positive support from each correlated instance, and is encoded into the objective for pretext training. To this end, our proposed method benefits in discriminating uncorrelated instances while absorbing correlated instances for SSL. We evaluate our soft neighbor contrastive learning method (SNCLR) on standard visual recognition benchmarks, including image classification, object detection, and instance segmentation. The state-of-the-art recognition performance shows that SNCLR is effective in improving feature representations from both ViT and CNN encoders.
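Research Motivation and Objectives
- Rethink binary instance discrimination in contrastive learning by accounting for correlations between different images, beyond exact instance matches.
- Develop a mechanism that identifies soft, highly correlated neighbor instances and uses them to support the current sample.
- Integrate a cross-attention-based positiveness score into the contrastive loss so that neighbors are softly weighted.
- Show that introducing soft neighbors improves how well the learned representations transfer to downstream tasks.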
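Proposed Method
- Build a candidate neighbor set from the nearest neighbors among other images.
- Compute a cross-attention-based positiveness score w_i between the current view and each candidate neighbor to obtain soft weights.
- Incorporate these weights into the contrastive loss as a weighted sum of positive contributions (Eq. 2).
- Use a memory queue C that stores features from the momentum branch to identify neighbors (Eq. 3).
- Train standard SSL backbones (ResNet and ViT) with an optimizer suited to each (LARS for ResNet, AdamW for ViT) and with data augmentation that follows established SSL practice.
- Provide visualizations and ablation studies showing how positiveness, the number of neighbors, and the candidate set size affect performance.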
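The steps listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the function names (`select_neighbors`, `positiveness`, `soft_neighbor_loss`) are hypothetical, the positiveness score is a simple softmax over scaled dot products that stands in for the paper's cross-attention module, and the loss is one plausible weighted-positive InfoNCE form rather than a reproduction of Eq. 2.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each feature vector to unit length so dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def select_neighbors(keys, queue, k):
    # For each key, return the indices of the k most cosine-similar queue entries.
    sims = l2_normalize(keys) @ l2_normalize(queue).T   # (B, M)
    return np.argsort(-sims, axis=1)[:, :k]             # (B, k)

def positiveness(anchors, neighbors):
    # ASSUMPTION: stand-in for the cross-attention module. Weights are a
    # softmax over scaled dot products, so each row sums to 1.
    d = anchors.shape[-1]
    scores = np.einsum("bd,bkd->bk", anchors, neighbors) / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def soft_neighbor_loss(online, target, queue, k=4, tau=0.1):
    # online: (B, D) features from the online branch
    # target: (B, D) features of the other view from the momentum branch
    # queue:  (M, D) stored momentum-branch features (the candidate set)
    q, t, bank = l2_normalize(online), l2_normalize(target), l2_normalize(queue)
    idx = select_neighbors(t, bank, k)                   # (B, k)
    nbrs = bank[idx]                                     # (B, k, D)
    w = positiveness(t, nbrs)                            # (B, k)

    # Candidates for each anchor: all batch targets, then its own k neighbors.
    logits_batch = q @ t.T / tau                         # diagonal = true positive
    logits_nbrs = np.einsum("bd,bkd->bk", q, nbrs) / tau
    logits = np.concatenate([logits_batch, logits_nbrs], axis=1)
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))

    b = q.shape[0]
    pos_term = log_prob[np.arange(b), np.arange(b)]      # the other view, weight 1
    nbr_term = (w * log_prob[:, b:]).sum(axis=1)         # soft neighbors, weight w_i
    loss = -(pos_term + nbr_term) / (1.0 + w.sum(axis=1))
    return loss.mean(), w, idx

rng = np.random.default_rng(0)
online = rng.normal(size=(8, 16))
target = online + 0.05 * rng.normal(size=(8, 16))       # a slightly perturbed "view"
queue = rng.normal(size=(64, 16))
loss, w, idx = soft_neighbor_loss(online, target, queue, k=4)
print(loss, w.shape, idx.shape)
```

In this sketch the other view of the same image always contributes with weight 1, while each neighbor contributes in proportion to its weight, so a weakly related neighbor pulls on the anchor less than a strongly related one. Other images in the batch act as negatives only through the softmax denominator.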
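Experimental Results
Research Questions
- RQ1: Can soft, graded correlations between different images outperform binary instance discrimination in contrastive learning?
- RQ2: In SSL, how should neighbor instances be selected and weighted to best support a given view?
- RQ3: Do the improvements from soft neighbors generalize across CNN and ViT architectures and across tasks such as classification, detection, and segmentation?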
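Key Findings
- With ResNet-50 on ImageNet, SNCLR consistently achieves higher top-1 accuracy than multiple self-supervised baselines (for example, it surpasses prior methods at several training-epoch budgets).
- ViT-based encoders (ViT-S, ViT-B) also benefit from soft neighbors, reaching higher accuracy than several competing self-supervised methods.
- In the ablations, the best performance comes from using 30 soft neighbors, a larger candidate set, and positiveness weighting, which indicates that both the number of neighbors and the soft weighting matter.
- When used as the pre-training signal, SNCLR improves transfer performance on COCO object detection and instance segmentation, with AP scores above several self-supervised baselines.
- In the semi-supervised setting, SNCLR achieves strong top-1 and top-5 accuracy with limited labeled data on both ResNet-50 and ViT-S backbones.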
This summary was generated by AI and reviewed by a human editor.