QUICK REVIEW

[Paper Review] Understanding the Behaviour of Contrastive Loss

Feng Wang, Huaping Liu|arXiv (Cornell University)|Dec 15, 2020

Domain Adaptation and Few-Shot Learning | References: 37 | Cited by: 24

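One-Sentence Summary

This paper studies the behaviour of contrastive loss in self-supervised representation learning and shows that it is a hardness-aware loss function: the temperature τ controls how strongly the loss concentrates on hard negative samples. The study identifies a uniformity-tolerance dilemma: high uniformity improves feature separability, but penalizing semantically similar samples too heavily hurts downstream performance. The key contribution is showing that tuning the temperature to balance uniformity and tolerance yields the best performance, with τ=0.2–0.3 performing best on the CIFAR and ImageNet benchmarks.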

ABSTRACT

Unsupervised contrastive learning has achieved outstanding success, while the mechanism of contrastive loss has been less studied. In this paper, we concentrate on the understanding of the behaviours of unsupervised contrastive loss. We will show that the contrastive loss is a hardness-aware loss function, and the temperature τ controls the strength of penalties on hard negative samples. The previous study has shown that uniformity is a key property of contrastive learning. We build relations between the uniformity and the temperature τ. We will show that uniformity helps the contrastive learning to learn separable features, however excessive pursuit to the uniformity makes the contrastive loss not tolerant to semantically similar samples, which may break the underlying semantic structure and be harmful to the formation of features useful for downstream tasks. This is caused by the inherent defect of the instance discrimination objective. Specifically, instance discrimination objective tries to push all different instances apart, ignoring the underlying relations between samples. Pushing semantically consistent samples apart has no positive effect for acquiring a prior informative to general downstream tasks. A well-designed contrastive loss should have some extents of tolerance to the closeness of semantically similar samples. Therefore, we find that the contrastive loss meets a uniformity-tolerance dilemma, and a good choice of temperature can compromise these two properties properly to both learn separable features and tolerant to semantically similar samples, improving the feature qualities and the downstream performances.
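
The hardness-aware claim in the abstract can be illustrated with a small sketch. It assumes the usual softmax (InfoNCE-style) form of the contrastive loss, for which the gradient with respect to a negative similarity is proportional to exp(s / τ); the similarity values and the helper name `relative_negative_weights` are hypothetical and chosen only for illustration.

```python
import math

def relative_negative_weights(neg_sims, tau):
    """Share of the total negative-gradient magnitude received by each negative.

    Assumes a softmax-based contrastive loss, where the gradient with respect
    to a negative similarity s is proportional to exp(s / tau). The relative
    share is therefore a softmax over the negatives.
    """
    scaled = [s / tau for s in neg_sims]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical cosine similarities between an anchor and four negatives,
# ordered from hardest (most similar) to easiest.
neg_sims = [0.9, 0.5, 0.0, -0.5]

for tau in (0.07, 0.2, 1.0):
    weights = relative_negative_weights(neg_sims, tau)
    print(tau, [round(w, 3) for w in weights])
```

Under these assumptions, a small τ puts almost all of the penalty on the hardest negative, while a large τ spreads it more evenly across all negatives.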

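Motivation and Objectives

Understand how contrastive loss behaves in unsupervised representation learning.
Analyze the role of the temperature τ in controlling hardness-awareness and the properties of the embedding distribution.
Identify the trade-off between the uniformity of the feature distribution and tolerance to semantically similar samples.
Show that the instance discrimination objective inherently breaks semantic structure by pushing similar samples apart.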

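Proposed Approach

Analyze contrastive loss as a hardness-aware function in which the temperature τ sets the strength of the penalty on hard negative samples.
Use the temperature τ as a proxy variable to study its effect on embedding uniformity and on tolerance to semantically similar samples.
Measure uniformity and tolerance with Eq. 10 and Eq. 11, respectively, in experiments on CIFAR10, CIFAR100, SVHN, and ImageNet100.
Train models with the standard contrastive loss (Eq. 1) and the hard contrastive loss (Eq. 9), and compare performance across τ settings.
Evaluate downstream performance on multiple datasets, using linear classification accuracy as the proxy metric.
Compare the simple contrastive loss without temperature scaling (Eq. 3) against a version with explicit hard negative sampling, to isolate the importance of hardness-awareness.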
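
The two quantities measured in this approach can be sketched as follows. The exact Eq. 10 and Eq. 11 are not reproduced in this review, so the forms below are assumptions: uniformity is taken as the log of the mean Gaussian potential over pairs (a common definition, with a hypothetical scale t=2), and tolerance as the mean similarity over pairs that share a class label. The toy 2-D embeddings and labels are invented for illustration.

```python
import math
from itertools import combinations

def unit(angle):
    """Unit vector on the circle at the given angle (a toy 2-D embedding)."""
    return [math.cos(angle), math.sin(angle)]

def uniformity(embeddings, t=2.0):
    """Log of the mean Gaussian potential over distinct pairs (assumed form).

    Lower (more negative) values mean the points are spread more evenly.
    """
    potentials = []
    for a, b in combinations(embeddings, 2):
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        potentials.append(math.exp(-t * sq_dist))
    return math.log(sum(potentials) / len(potentials))

def tolerance(embeddings, labels):
    """Mean dot product over distinct pairs that share a label (assumed form)."""
    sims = [
        sum(x * y for x, y in zip(embeddings[i], embeddings[j]))
        for i, j in combinations(range(len(embeddings)), 2)
        if labels[i] == labels[j]
    ]
    return sum(sims) / len(sims)

# Two hypothetical embedding layouts for the same four labelled samples.
labels = [0, 0, 1, 1]
clustered = [unit(0.0), unit(0.1), unit(math.pi), unit(math.pi + 0.1)]
spread = [unit(0.0), unit(math.pi / 2), unit(math.pi), unit(3 * math.pi / 2)]

for name, emb in (("clustered", clustered), ("spread", spread)):
    print(name, round(uniformity(emb), 3), round(tolerance(emb, labels), 3))
```

In this toy setup the evenly spread layout scores as more uniform but keeps same-class samples far apart, while the clustered layout does the opposite, which mirrors the trade-off the review describes.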

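Experimental Results

Research Questions

RQ1: How does the temperature τ affect the hardness-awareness of contrastive loss when learning separable features?
RQ2: In contrastive learning, what trade-off exists between the uniformity of the embedding distribution and tolerance to semantically similar samples?
RQ3: Why does excessive pursuit of uniformity hurt downstream performance even though it improves feature separability?
RQ4: Can a simple contrastive loss without temperature scaling achieve competitive performance when combined with explicit hard negative sampling?
RQ5: Why does the instance discrimination objective fail to preserve the underlying semantic structure in contrastive learning?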

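Key Findings

On CIFAR10, CIFAR100, SVHN, and ImageNet100, models trained with τ=0.2 or 0.3 achieve the highest linear classification accuracy, indicating the best balance between uniformity and tolerance.
Small temperatures (e.g., τ=0.07) produce a highly uniform distribution but over-penalize semantically similar samples, which degrades feature quality.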
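Larger temperatures (e.g., τ=0.2, relative to τ=0.07) increase tolerance to similar samples but reduce uniformity, which lowers feature separability when the temperature is raised too far.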
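The standard contrastive loss (Eq. 1) reaches 83.27% linear accuracy on CIFAR10 at τ=0.2, outperforming the simple loss without hardness-awareness (74.83%).
With explicit hard negative sampling, the simple contrastive loss (Eq. 3) reaches a strong 95.47% on SVHN, indicating that hardness-awareness is a core factor in the success of contrastive loss.
The hard contrastive loss (Eq. 9) alleviates the uniformity-tolerance dilemma: explicit mining preserves uniformity, so it still achieves better performance at larger τ values.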
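
The contrast between the standard and hard contrastive losses in these findings can be sketched as below. This is a minimal illustration, not the paper's Eq. 1 or Eq. 9: it assumes the standard loss is the usual softmax form and that the hard variant simply keeps only the k most similar negatives. The similarity values, k, and function names are hypothetical.

```python
import math

def contrastive_loss(pos_sim, neg_sims, tau):
    """Negative log softmax probability of the positive among all candidates."""
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - pos_sim / tau

def hard_contrastive_loss(pos_sim, neg_sims, tau, k):
    """Same loss, but only the k most similar (hardest) negatives are kept."""
    hardest = sorted(neg_sims, reverse=True)[:k]
    return contrastive_loss(pos_sim, hardest, tau)

# Hypothetical similarities: one positive and four negatives.
pos_sim = 0.8
neg_sims = [0.9, 0.5, 0.0, -0.5]

for tau in (0.07, 1.0):
    full = contrastive_loss(pos_sim, neg_sims, tau)
    hard = hard_contrastive_loss(pos_sim, neg_sims, tau, k=2)
    # The gap is the part of the loss contributed by the easy negatives.
    print(tau, round(full - hard, 6))
```

Under these assumptions, dropping the easy negatives barely changes the loss at a small τ, because the softmax already ignores them, but changes it noticeably at a large τ. That is one way to read why explicit hard negative selection matters most when the temperature is large.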

This review was generated by AI and reviewed by a human editor.