QUICK REVIEW

[Paper Review] Learning Representations by Maximizing Mutual Information Across Views

Philip Bachman, R Devon Hjelm | arXiv (Cornell University) | June 3, 2019

Machine Learning and Algorithms | References: 45 | Citations: 678

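One-Line Summary

AMDIM learns self-supervised image representations by maximizing mutual information across augmented and multiscale views, achieving 68.1% linear-evaluation accuracy on ImageNet and strong results on STL10 and Places205.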

ABSTRACT

We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context. For example, one could produce multiple views of a local spatio-temporal context by observing it from different locations (e.g., camera positions within a scene), and via different modalities (e.g., tactile, auditory, or visual). Or, an ImageNet image could provide a context from which one produces multiple views by repeatedly applying data augmentation. Maximizing mutual information between features extracted from these views requires capturing information about high-level factors whose influence spans multiple views -- e.g., presence of certain objects or occurrence of certain events. Following our proposed approach, we develop a model which learns image representations that significantly outperform prior methods on the tasks we consider. Most notably, using self-supervised learning, our model learns representations which achieve 68.1% accuracy on ImageNet using standard linear evaluation. This beats prior results by over 12% and concurrent results by 7%. When we extend our model to use mixture-based representations, segmentation behaviour emerges as a natural side-effect. Our code is available online: https://github.com/Philip-Bachman/amdim-public.

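Research Motivation and Goals

Motivate unsupervised representation learning as a way to reduce dependence on labeled data.
Develop a self-supervised objective based on mutual information across multiple views of a shared context.
Extend the earlier local DIM with augmented views, multiscale prediction, and a stronger encoder.
Explore mixture-based representations, which can give rise to segmentation-like behaviour.
Demonstrate state-of-the-art performance on standard vision benchmarks.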

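Proposed Method

Extend local Deep InfoMax (DIM) to Augmented Multiscale DIM (AMDIM).
Maximize mutual information between features extracted from independently augmented copies of each input.
Predict across multiple feature scales (multiscale infomax).
Use a stronger encoder architecture and a contrastive NCE bound with negative samples.
Apply data augmentation to generate different views of the same context.
Introduce mixture-based representations with an entropy regularization term.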
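
The contrastive objective listed above can be illustrated with a minimal sketch. It assumes a simplified single-scale setting in which each image yields one feature vector per augmented view, scores are plain dot products, and the other images in the batch serve as negatives. The function names (`dot`, `log_softmax`, `info_nce_loss`) are hypothetical, and AMDIM's multiscale pairing, its actual score function, and its regularization are not reproduced here.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def info_nce_loss(feats_a, feats_b):
    """InfoNCE-style loss between two views (illustrative, not the paper's code).

    feats_a[i] and feats_b[i] are features from two augmentations of image i
    (the positive pair); feats_b[j] for j != i act as negative samples.
    """
    n = len(feats_a)
    total = 0.0
    for i in range(n):
        scores = [dot(feats_a[i], feats_b[j]) for j in range(n)]
        total -= log_softmax(scores)[i]  # log-probability of the positive
    return total / n

# Toy features (assumed values): view B either matches view A or is shuffled.
view_a = [[5.0, 0.0], [0.0, 5.0]]
aligned = info_nce_loss(view_a, [[5.0, 0.0], [0.0, 5.0]])
shuffled = info_nce_loss(view_a, [[0.0, 5.0], [5.0, 0.0]])
uninformative = info_nce_loss([[0.0, 0.0]] * 2, [[0.0, 0.0]] * 2)
```

In this toy setup the loss is close to zero when the two views agree, large when positives are mismatched, and exactly log N for uninformative features in a batch of N. Under the standard InfoNCE argument, log N minus the loss gives a lower bound on the mutual information between the views.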

Experimental Results

Research Questions

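RQ1: Does maximizing mutual information between augmented views improve the learned representations over prior self-supervised methods?
RQ2: How do multiscale and mixture-based features affect performance and emergent behaviour?
RQ3: What effect do the data augmentation strategy and NCE regularization have on representation quality?
RQ4: Can AMDIM scale to large datasets such as ImageNet and transfer to other datasets such as Places205?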

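Key Results

AMDIM achieves 68.1% accuracy on ImageNet under linear evaluation, beating prior results by more than 12 percentage points.
AMDIM exceeds 94% accuracy on STL10 under linear evaluation, without fine-tuning the encoder.
On Places205, AMDIM achieves 55% accuracy, exceeding the previous best by 7 percentage points.
Multiscale and augmentation-based views substantially improve performance over the baseline local DIM.
Mixture-based representations exhibit segmentation-like behaviour and show potential gains on STL10.
The method is evaluated on CIFAR-10/100, STL10, ImageNet, and Places205, with competitive results.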
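
Linear evaluation, the protocol behind the accuracies above, trains only a linear classifier on top of a frozen encoder. The sketch below assumes the encoder outputs are already available as plain feature vectors and fits a softmax classifier with SGD. The helper names, toy features, and hyperparameters are illustrative assumptions and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_for(W, b, x):
    """Linear scores W x + b, one per class."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def train_linear_probe(features, labels, num_classes, lr=0.1, epochs=200):
    """Fit a softmax classifier on fixed features; the encoder is never updated."""
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(logits_for(W, b, x))
            for c in range(num_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                for d in range(dim):
                    W[c][d] -= lr * grad * x[d]
                b[c] -= lr * grad
    return W, b

def predict(W, b, x):
    """Return the index of the highest-scoring class."""
    scores = logits_for(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)

# Stand-in for frozen encoder outputs (assumed toy values).
frozen_features = [[2.0, 0.0], [3.0, 0.5], [0.0, 2.0], [0.5, 3.0]]
labels = [0, 0, 1, 1]
W, b = train_linear_probe(frozen_features, labels, num_classes=2)
accuracy = sum(predict(W, b, x) == y
               for x, y in zip(frozen_features, labels)) / len(labels)
```

Because only `W` and `b` are learned, the resulting accuracy reflects how linearly separable the frozen features are, which is what makes the protocol a measure of representation quality.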

This review was generated by AI and reviewed by a human editor.