QUICK REVIEW

[Paper Review] Barlow Twins: Self-Supervised Learning via Redundancy Reduction

Jure Žbontar, Jing Li|arXiv (Cornell University)|2021. 03. 04.

Domain Adaptation and Few-Shot Learning · References: 48 · Citations: 777

One-Line Summary

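Barlow Twins introduces a self-supervised objective that pushes the cross-correlation matrix between the twin embeddings toward the identity matrix. The diagonal terms enforce invariance to distortions, and the off-diagonal terms decorrelate the embedding components (redundancy reduction). The method learns effective representations without large batches or an asymmetric network design. It is on par with the state of the art for ImageNet linear classification, outperforms prior methods in the low-data semi-supervised regime, and benefits notably from high-dimensional embeddings.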

ABSTRACT

Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn embeddings which are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant solutions. Most current methods avoid such solutions by careful implementation details. We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. This causes the embedding vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins, owing to neuroscientist H. Barlow's redundancy-reduction principle applied to a pair of identical networks. Barlow Twins does not require large batches nor asymmetry between the network twins such as a predictor network, gradient stopping, or a moving average on the weight updates. Intriguingly it benefits from very high-dimensional output vectors. Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with current state of the art for ImageNet classification with a linear classifier head, and for transfer tasks of classification and object detection.
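The objective described in the abstract can be written out explicitly. The notation below follows the paper's formulation as we understand it:

- z^A and z^B are the projector outputs for the two distorted views, assumed mean-centered along the batch dimension.
- b indexes samples in the batch.
- i and j index embedding dimensions.
- λ is the positive trade-off weight.

```latex
\begin{aligned}
\mathcal{L}_{\mathcal{BT}} &= \sum_i \left(1 - \mathcal{C}_{ii}\right)^2 + \lambda \sum_i \sum_{j \neq i} \mathcal{C}_{ij}^2, \\
\mathcal{C}_{ij} &= \frac{\sum_b z^A_{b,i}\, z^B_{b,j}}{\sqrt{\sum_b \left(z^A_{b,i}\right)^2}\,\sqrt{\sum_b \left(z^B_{b,j}\right)^2}}.
\end{aligned}
```

The first sum is the invariance term, which pushes diagonal entries toward 1. The second is the redundancy-reduction term, which pushes off-diagonal entries toward 0. The loss is therefore zero exactly when C equals the identity matrix.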

Motivation and Goals

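Motivates self-supervised learning (SSL) of visual representations without annotations.
Proposes a principled objective that avoids collapsed solutions without requiring asymmetry between the networks.
Decorrelates the embedding components (redundancy reduction) while preserving invariance to distortions.
Explores robustness to batch size and the benefits of high-dimensional embeddings.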

Proposed Method

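Compute two distorted views of each image and pass both through identical networks.
Form the cross-correlation matrix between the paired outputs, computed along the batch dimension, and train it to be as close as possible to the identity matrix.
Split the loss into an invariance term (diagonal entries) and a redundancy-reduction term (off-diagonal entries), weighted by a trade-off parameter lambda.
Use a ResNet-50 encoder and a three-layer projector with 8192-dimensional output, normalizing the embeddings along the batch dimension. Optimize with LARS for large-scale ImageNet pretraining.
Show that the method still works with batch sizes as small as 256 and benefits from high-dimensional embeddings.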
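The loss can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- `barlow_twins_loss` is a hypothetical helper.
- The embeddings are assumed to come from the encoder and projector, neither of which is shown.
- Each dimension is standardized along the batch axis before the D×D cross-correlation matrix is formed.
- The default `lambd=5e-3` is an assumed value for the trade-off parameter.

```python
import numpy as np


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray,
                      lambd: float = 5e-3, eps: float = 1e-8) -> float:
    """Barlow Twins-style loss for two (N, D) embedding batches.

    Hypothetical helper for illustration; lambd=5e-3 is an assumed default.
    """
    if z_a.shape != z_b.shape or z_a.ndim != 2:
        raise ValueError("z_a and z_b must both have shape (N, D)")
    n = z_a.shape[0]

    # Standardize each dimension along the batch axis (mean 0, std ~1),
    # so entries of the matrix below lie in [-1, 1].
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)

    # (D, D) cross-correlation matrix computed over the batch.
    c = z_a.T @ z_b / n

    # Invariance term: push diagonal entries toward 1.
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)
    # Redundancy-reduction term: push off-diagonal entries toward 0.
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)
    return float(on_diag + lambd * off_diag)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((256, 32))
    # Two nearly identical "views" versus two unrelated batches.
    print(barlow_twins_loss(x, x + 0.1 * rng.standard_normal((256, 32))))
    print(barlow_twins_loss(x, rng.standard_normal((256, 32))))
```

In this sketch, standardizing along the batch axis is what makes the objective resist collapse. A constant (collapsed) dimension standardizes to zeros, so its diagonal entry becomes 0 and it adds 1 to the invariance term. Duplicated dimensions produce off-diagonal entries near 1, so the redundancy-reduction term penalizes them.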

Experimental Results

Research Questions

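RQ1: Can a simple, symmetric twin-network objective avoid collapse without requiring asymmetry or large batches?
RQ2: How does redundancy reduction affect the decorrelation of the embeddings and downstream transfer performance?
RQ3: How do embedding dimensionality and projector depth affect SSL representation quality?
RQ4: Compared with InfoNCE-based approaches, is the method robust to batch size and to the choice of augmentations?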

Key Results

ImageNet linear evaluation with a ResNet-50 encoder (accuracy, %):
Method	Top-1	Top-5
Supervised	76.5
MoCo	60.6
PIRL	63.6	-
SimCLR	69.3	89.0
MoCo v2	71.1	90.1
SimSiam	71.3	-
SwAV (w/o multi-crop)	71.8	-
BYOL	74.3	91.6
SwAV	75.3	-
Barlow Twins (ours)	73.2	91.0

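Barlow Twins reaches 73.2% top-1 accuracy under ImageNet linear evaluation with a ResNet-50 encoder.
In the semi-supervised ImageNet setting with 1% or 10% of the labels, the method matches or slightly outperforms competing SSL methods.
For transfer, linear classifiers on frozen representations perform competitively on Places-205, VOC07, and iNaturalist18.
On object detection and instance segmentation, Barlow Twins performs on par with or better than several state-of-the-art methods.
Ablations show that both the invariance term and the redundancy-reduction term are necessary. The method is also robust to small batch sizes and benefits from high-dimensional embeddings.
Performance keeps improving as the projector's output dimensionality grows, a trend not observed in other SSL methods.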


This review was generated by AI and reviewed by a human editor.