QUICK REVIEW

[Paper Review] Rethinking Image Mixture for Unsupervised Visual Representation Learning

Zhiqiang Shen, Zechun Liu|arXiv (Cornell University)|Mar 11, 2020

Advanced Image and Video Retrieval Techniques|59 references|25 citations

TL;DR

This paper proposes Un-Mix, a simple yet effective unsupervised data augmentation technique that applies image mixtures to soften prediction distributions during self-supervised representation learning. By perturbing input images via mixup-style interpolation and assigning new pseudo-labels, Un-Mix improves robustness and generalization across multiple benchmarks, achieving consistent 1–3% accuracy gains over base methods like SimCLR, BYOL, and MoCo without changing hyperparameters or training procedures.

ABSTRACT

In supervised learning, smoothing label or prediction distribution in neural network training has been proven useful in preventing the model from being over-confident, and is crucial for learning more robust visual representations. This observation motivates us to explore ways to make predictions flattened in unsupervised learning. Considering that human-annotated labels are not adopted in unsupervised learning, we introduce a straightforward approach to perturb input image space in order to soften the output prediction space indirectly, meanwhile, assigning new label values in the unsupervised frameworks accordingly. Despite its conceptual simplicity, we show empirically that with the simple solution -- Unsupervised image mixtures (Un-Mix), we can learn more robust visual representations from the transformed input. Extensive experiments are conducted on CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet and standard ImageNet with popular unsupervised methods SimCLR, BYOL, MoCo V1&V2, etc. Our proposed image mixture and label assignment strategy can obtain consistent improvement by 1~3% following exactly the same hyperparameters and training procedures of the base methods.

Motivation & Objective

To address the lack of label smoothing in unsupervised visual representation learning, where models can become overconfident.
To explore indirect ways of softening prediction distributions without relying on human-annotated labels.
To develop a plug-and-play augmentation strategy that enhances robustness in self-supervised learning frameworks.
To evaluate the effectiveness of input-space perturbations via image mixing on standard benchmarks using popular unsupervised methods.
To demonstrate consistent performance gains across diverse datasets and architectures with minimal modification to existing training pipelines.

Proposed method

Proposes Un-Mix, a method that applies mixup-style interpolation between pairs of input images to create augmented training samples.
Assigns soft pseudo-labels to the mixed images: because no human-annotated labels are available, each mixed image is associated with its constituent images, weighted by their mixing proportions.
Applies the image mixture and label assignment directly in the input space, avoiding the need for model-level label smoothing or architectural changes.
Integrates seamlessly into existing unsupervised learning frameworks such as SimCLR, BYOL, MoCo V1, and MoCo V2.
Uses standard training procedures and hyperparameters, ensuring compatibility and ease of adoption.
Employs a symmetric mixup strategy to maintain consistency in contrastive and momentum-based training objectives.
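
The mixing and label-assignment steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (mix_batch, mixed_loss, toy_loss), the pairing of each image with its partner from the reversed batch, the Beta(1, 1) sampling of the mixing coefficient, and the flattened-list image format are all assumptions chosen to keep the example self-contained. The toy_loss function is only a stand-in for the base method's own objective (for example, a contrastive loss).

```python
import random

def mix_batch(batch, lam):
    """Blend each image with a partner image (mixup-style interpolation).

    Assumption: the partner is the image at the mirrored position in the batch.
    """
    partners = batch[::-1]
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(img, partner)]
        for img, partner in zip(batch, partners)
    ]
    return mixed, partners

def mixed_loss(loss_fn, mixed, batch, partners, lam):
    """Weight the base objective by each constituent's share of the mixture."""
    return lam * loss_fn(mixed, batch) + (1.0 - lam) * loss_fn(mixed, partners)

def toy_loss(queries, keys):
    """Placeholder objective: mean squared distance between paired samples."""
    total = sum(
        sum((q - k) ** 2 for q, k in zip(query, key))
        for query, key in zip(queries, keys)
    )
    return total / len(queries)

rng = random.Random(0)
# Three tiny "images", each flattened to four pixel values (illustrative only).
batch = [[rng.random() for _ in range(4)] for _ in range(3)]
# Mixing coefficient in [0, 1]; the Beta(1, 1) choice is an assumption.
lam = rng.betavariate(1.0, 1.0)
mixed, partners = mix_batch(batch, lam)
loss = mixed_loss(toy_loss, mixed, batch, partners, lam)
```

Because the mixing touches only the inputs and the weighting of an existing loss, a sketch like this leaves the encoder, optimizer, and hyperparameters of the base method unchanged, which is consistent with the plug-and-play claim in the review.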

Experimental results

Research questions

RQ1: Can input-space image mixing with pseudo-label assignment improve the robustness of unsupervised visual representations?
RQ2: Does softening prediction distributions through data augmentation lead to better generalization in self-supervised learning?
RQ3: Can a simple, plug-and-play method like Un-Mix achieve consistent gains across diverse datasets and unsupervised learning methods?
RQ4: Is the performance improvement from Un-Mix dependent on hyperparameter tuning or model architecture?
RQ5: How does Un-Mix compare to other data augmentation strategies in terms of accuracy and stability?

Key findings

Un-Mix achieves consistent performance improvements of 1–3% across multiple unsupervised learning benchmarks, including CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet, and ImageNet.
The gains are observed without modifying any hyperparameters or training procedures, demonstrating the method's compatibility and plug-and-play nature.
The improvement is stable across different self-supervised methods, including SimCLR, BYOL, MoCo V1, and MoCo V2.
The method effectively softens prediction distributions by perturbing the input space, reducing model overconfidence.
Empirical results show that Un-Mix enhances representation quality, leading to better downstream accuracy in linear evaluation protocols.
The approach is computationally efficient and does not require additional model parameters or complex training schedules.
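
To make the "softening" finding concrete, the short sketch below compares the entropy of a hard target (all weight on one instance) with that of a mixture target that splits its weight between two instances. The function names, the four-instance setup, and the value 0.7 for the mixing coefficient are illustrative assumptions, not details taken from the paper.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mixture_target(num_instances, i, j, lam):
    """Target distribution for a mix of instance i (weight lam) and instance j."""
    target = [0.0] * num_instances
    target[i] += lam
    target[j] += 1.0 - lam
    return target

# Unmixed image: all weight on a single instance, so the target is one-hot.
hard = mixture_target(4, 0, 0, 1.0)
# Mixed image: weight shared between two instances (0.7 is an assumed value).
soft = mixture_target(4, 0, 3, 0.7)

hard_entropy = entropy(hard)
soft_entropy = entropy(soft)
```

The mixture target has strictly higher entropy than the one-hot target whenever the mixing coefficient lies strictly between 0 and 1, which is the sense in which image mixing flattens the distribution the model is trained to match.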


This review was created by AI and reviewed by human editors.