QUICK REVIEW

[Paper Review] Are all negatives created equal in contrastive instance discrimination?

Tiffany Cai, Jonathan Frankle | arXiv (Cornell University) | Oct 13, 2020

Categorization, perception, and language | 31 references | 57 citations

TL;DR

The paper shows that, for MoCo v2 trained on ImageNet with contrastive instance discrimination (CID), the hardest 5% of negatives are both necessary and sufficient for near-full downstream accuracy, the easiest 95% are unnecessary and insufficient, and the very hardest 0.1% are unnecessary and sometimes detrimental.

ABSTRACT

Self-supervised learning has recently begun to rival supervised learning on computer vision tasks. Many of the recent approaches have been based on contrastive instance discrimination (CID), in which the network is trained to recognize two augmented versions of the same instance (a query and positive) while discriminating against a pool of other instances (negatives). The learned representation is then used on downstream tasks such as image classification. Using methodology from MoCo v2 (Chen et al., 2020), we divided negatives by their difficulty for a given query and studied which difficulty ranges were most important for learning useful representations. We found a minority of negatives -- the hardest 5% -- were both necessary and sufficient for the downstream task to reach nearly full accuracy. Conversely, the easiest 95% of negatives were unnecessary and insufficient. Moreover, the very hardest 0.1% of negatives were unnecessary and sometimes detrimental. Finally, we studied the properties of negatives that affect their hardness, and found that hard negatives were more semantically similar to the query, and that some negatives were more consistently easy or hard than we would expect by chance. Together, our results indicate that negatives vary in importance and that CID may benefit from more intelligent negative treatment.

Motivation & Objective

Understand the relative importance of individual negatives in contrastive instance discrimination (CID).
Quantify how negatives of varying difficulty contribute to downstream ImageNet linear accuracy.
Identify semantic properties that distinguish hard from easy negatives.
Explore whether certain negatives consistently affect learning across queries.
Suggest implications for more intelligent negative sampling in CID.

Proposed method

Use MoCo v2 with a ResNet-50 encoder and an MLP projection head.
Define the difficulty of a negative as the dot product between the normalized contrastive-space embeddings of the query and the negative (a larger dot product means a harder negative).
Assess necessity and sufficiency of negatives by removing subsets and measuring downstream accuracy on ImageNet linear classification.
Evaluate across two temperatures (0.07 and 0.20) and three random seeds.
Analyze semantic similarity of negatives via class labels and WordNet-based similarity metrics.
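
The difficulty definition and the subset-removal procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: the function names (`difficulty`, `keep_mask_by_difficulty`, `info_nce_loss`), the percentile convention (0 = easiest, 100 = hardest), and the use of a standard InfoNCE-style loss for a single query are all assumptions made for the example.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def difficulty(query, negatives):
    """Dot product between the normalized query (D,) and negatives (K, D).

    Larger values mean harder negatives.
    """
    return l2_normalize(negatives) @ l2_normalize(query)

def keep_mask_by_difficulty(scores, lo_pct, hi_pct):
    """Boolean mask of negatives whose difficulty rank falls in [lo_pct, hi_pct).

    Hypothetical convention: 0 is the easiest percentile, 100 the hardest.
    """
    k = scores.shape[0]
    ranks = np.argsort(np.argsort(scores))  # rank 0 = easiest negative
    lo = int(round(k * lo_pct / 100.0))
    hi = int(round(k * hi_pct / 100.0))
    return (ranks >= lo) & (ranks < hi)

def info_nce_loss(query, positive, negatives, temperature=0.2, keep=None):
    """InfoNCE-style loss for one query, optionally using only kept negatives."""
    q = l2_normalize(query)
    pos_logit = (q @ l2_normalize(positive)) / temperature
    neg_logits = (l2_normalize(negatives) @ q) / temperature
    if keep is not None:
        neg_logits = neg_logits[keep]
    logits = np.concatenate([[pos_logit], neg_logits])
    m = logits.max()  # subtract the max for numerical stability
    log_sum = m + np.log(np.exp(logits - m).sum())
    return float(log_sum - pos_logit)

# Toy usage with random embeddings (not real MoCo features).
rng = np.random.default_rng(0)
q = rng.normal(size=16)
pos = q + 0.1 * rng.normal(size=16)
negs = rng.normal(size=(1000, 16))

scores = difficulty(q, negs)
hardest_5 = keep_mask_by_difficulty(scores, 95, 100)
easiest_95 = keep_mask_by_difficulty(scores, 0, 95)
loss_hard_only = info_nce_loss(q, pos, negs, keep=hardest_5)
loss_easy_only = info_nce_loss(q, pos, negs, keep=easiest_95)
```

In an actual experiment the mask would be recomputed for every query at every training step, and the effect would be judged by downstream linear-classification accuracy rather than by the loss value itself.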

Experimental results

Research questions

RQ1: Which negatives (by difficulty) are necessary for high downstream accuracy in CID?
RQ2: Are the hardest negatives sufficient when used alone for pretraining?
RQ3: Do the very hardest negatives harm learning, and if so, why?
RQ4: What semantic properties differentiate easy from hard negatives?
RQ5: Can the findings inform curriculum or selective negative sampling for CID?

Key findings

The easiest 95% of negatives are unnecessary and insufficient for full accuracy; the hardest 5% of negatives are necessary and sufficient.
Training on only the hardest 5% of negatives yields top-1 accuracy within 0.7 percentage points of the baseline, while training on only the easiest 95% degrades performance.
The very hardest 0.1% of negatives are unnecessary and can be detrimental at lower temperatures; removing them can help, in part because of same-class negatives among them.
Hard negatives tend to be more semantically similar to the query than easy negatives; some easy negatives are anti-correlated yet semantically similar to the query.
There exist negatives that are consistently hard or easy across queries, suggesting potential gains from maintaining consistently hard negatives in the queue.
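
One way to probe the last finding is to count how often each negative lands among the hardest 5% across many queries and compare that rate with the 5% expected if hardness were unrelated to the negative's identity. The sketch below is an illustration on synthetic data, not the paper's analysis; the function name `hard_fraction_per_negative` and the toy setup (queries clustered around one direction, with one negative planted along it) are assumptions.

```python
import numpy as np

def hard_fraction_per_negative(queries, negatives, top_pct=5.0):
    """Fraction of queries for which each negative is among the hardest top_pct%.

    queries: (Q, D) array, negatives: (K, D) array. Returns a (K,) array.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    n = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    scores = q @ n.T  # (Q, K) difficulty of every negative for every query
    k = scores.shape[1]
    n_hard = max(1, int(round(k * top_pct / 100.0)))
    # Column indices of the n_hard largest scores in each row (order not needed).
    hardest = np.argpartition(scores, k - n_hard, axis=1)[:, k - n_hard:]
    counts = np.bincount(hardest.ravel(), minlength=k)
    return counts / scores.shape[0]

# Synthetic example: queries cluster around the first axis, and negative 0
# is planted along that axis, so it should be hard for most queries.
rng = np.random.default_rng(0)
offset = np.zeros(8)
offset[0] = 5.0
queries = rng.normal(size=(200, 8)) + offset
negatives = rng.normal(size=(400, 8))
negatives[0] = offset

frac = hard_fraction_per_negative(queries, negatives, top_pct=5.0)
chance_rate = 0.05  # average rate across negatives is 5% by construction
consistently_hard = np.flatnonzero(frac > 4 * chance_rate)  # arbitrary threshold
```

The threshold of four times the chance rate is arbitrary; a real analysis would need a proper null model and would have to account for the queue contents changing during training.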

This review was created by AI and reviewed by human editors.