QUICK REVIEW

[Paper Review] AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data

Liheng Zhang, Guo-Jun Qi | arXiv (Cornell University) | 2019-01-14

Domain Adaptation and Few-Shot Learning | References: 31 | Citations: 50

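One-Line Summary

This paper introduces Auto-Encoding Transformations (AET), an unsupervised representation learning paradigm that predicts image transformations from encoded features. On CIFAR-10, ImageNet, and Places, it reaches state-of-the-art unsupervised performance that comes close to that of fully supervised methods.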

ABSTRACT

The success of deep neural networks often relies on a large amount of labeled examples, which can be difficult to obtain in many real scenarios. To address this challenge, unsupervised methods are strongly preferred for training neural networks without using any labeled data. In this paper, we present a novel paradigm of unsupervised representation learning by Auto-Encoding Transformation (AET) in contrast to the conventional Auto-Encoding Data (AED) approach. Given a randomly sampled transformation, AET seeks to predict it merely from the encoded features as accurately as possible at the output end. The idea is the following: as long as the unsupervised features successfully encode the essential information about the visual structures of original and transformed images, the transformation can be well predicted. We will show that this AET paradigm allows us to instantiate a large variety of transformations, from parameterized, to non-parameterized and GAN-induced ones. Our experiments show that AET greatly improves over existing unsupervised approaches, setting new state-of-the-art performances being greatly closer to the upper bounds by their fully supervised counterparts on CIFAR-10, ImageNet and Places datasets.

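Motivation and Objectives

Motivates unsupervised representation learning for settings where labeled data is scarce.
Proposes AET, which learns features by predicting the transformation applied to the input rather than by reconstructing the data.
Shows that AET supports a wide variety of transformations and delivers strong empirical results.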

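Proposed Method

Formalizes AET: an encoder $E$ and a transformation decoder $D$ are trained to predict a sampled transformation $t$ from $E(x)$ and $E(t(x))$.
The loss $\ell(t, \hat{t})$ measures the discrepancy between the true transformation $t$ and its estimate $\hat{t} = D(E(x), E(t(x)))$.
Instantiates AET with parameterized transformations (e.g., affine, projective) as well as GAN-induced and non-parameterized transformations.
Uses two weight-sharing branches to encode the original and transformed images, then concatenates their features to decode the transformation.
Trains end to end with minibatch SGD, backpropagating the loss to update $E$ and $D$.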
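
The forward pass described above can be sketched in a few lines of standard-library Python. This is a toy illustration under loud assumptions, not the authors' implementation: the "image" is a list of four 2D points, the encoder and decoder are single untrained linear layers, the sampling ranges are arbitrary, and the loss is assumed to be half the squared error between the true and predicted 3x3 transformation matrices. All function names (`sample_affine`, `encode`, `decode`, `aet_forward`, and so on) are hypothetical.

```python
import math
import random

def sample_affine(rng):
    """Sample a random 2D affine transformation as a 3x3 homogeneous matrix."""
    angle = rng.uniform(-math.pi, math.pi)                    # assumed range
    scale = rng.uniform(0.7, 1.3)                             # assumed range
    tx, ty = rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2)   # assumed range
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def apply_transform(matrix, points):
    """Apply t to a toy 'image' given as a list of (x, y) points."""
    a, b, c_row = matrix
    out = []
    for x, y in points:
        w = c_row[0] * x + c_row[1] * y + c_row[2]
        out.append(((a[0] * x + a[1] * y + a[2]) / w,
                    (b[0] * x + b[1] * y + b[2]) / w))
    return out

def linear(weights, vector):
    """Matrix-vector product for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def random_weights(rng, rows, cols):
    """Small Gaussian weights; a placeholder for a learned layer."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def encode(enc_weights, points):
    """Toy encoder E: flatten the points, apply a linear map and a ReLU."""
    flat = [coord for point in points for coord in point]
    return [max(0.0, h) for h in linear(enc_weights, flat)]

def decode(dec_weights, feat_x, feat_tx):
    """Toy decoder D: concatenate both feature vectors, predict a 3x3 matrix."""
    out = linear(dec_weights, feat_x + feat_tx)
    return [out[0:3], out[3:6], out[6:9]]

def transformation_loss(true_matrix, pred_matrix):
    """Half squared error between the two matrices (an assumed choice of loss)."""
    return 0.5 * sum((t - p) ** 2
                     for t_row, p_row in zip(true_matrix, pred_matrix)
                     for t, p in zip(t_row, p_row))

def aet_forward(enc_weights, dec_weights, points, rng):
    """Sample t, encode x and t(x) with the same weights, then decode t_hat."""
    t = sample_affine(rng)
    feat_x = encode(enc_weights, points)                        # E(x)
    feat_tx = encode(enc_weights, apply_transform(t, points))   # E(t(x))
    t_hat = decode(dec_weights, feat_x, feat_tx)                # D(E(x), E(t(x)))
    return t, t_hat, transformation_loss(t, t_hat)

rng = random.Random(0)
image = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy stand-in for an image
E = random_weights(rng, rows=8, cols=2 * len(image))       # shared encoder weights
D = random_weights(rng, rows=9, cols=2 * 8)                # decoder over concatenated features
t, t_hat, loss = aet_forward(E, D, image, rng)
print(f"transformation loss before any training: {loss:.4f}")
```

The same `E` is passed to both `encode` calls, which is what weight sharing between the two branches amounts to here. In practice, an autograd framework would backpropagate this loss through both the decoder and the encoder over minibatches; the sketch stops at the forward pass.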

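Experimental Results

Research Questions

RQ1: Does decoding the transformation from the learned features of the original and transformed images yield better unsupervised representations than reconstructing the data?
RQ2: Which class of transformations (parameterized, GAN-induced, non-parameterized) best promotes learning informative features?
RQ3: How does AET compare with state-of-the-art unsupervised methods on CIFAR-10, ImageNet, and Places?
RQ4: Does the transformation-prediction loss correlate with supervised classification performance?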

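Key Results

AET-project (projective transformations) achieves a 7.82% error rate on CIFAR-10 with a convolutional classifier, close to the 7.2% error of the fully supervised model.
On CIFAR-10, AET also outperforms RotNet and other unsupervised baselines under FC classifier, convolutional classifier, and KNN evaluation.
On ImageNet, AET-project outperforms several unsupervised methods and narrows the gap to the supervised upper bound (e.g., the reduced gaps reported for the Conv4 and Conv5 settings).
The transformation-prediction loss aligns well with supervised accuracy for AET representations, which supports the effectiveness of the AET objective.
AET transfers well to Places when pretrained on ImageNet and evaluated with linear (logistic regression) classifiers.
The experiments suggest that AET can incorporate a variety of transformations; parameterized transformations offer a simple basis for fair comparison.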
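
For readers unfamiliar with KNN evaluation of frozen features, the sketch below shows the general idea: features from the pretrained encoder are kept fixed, and each test feature is labeled by a vote among its nearest training features. The cosine similarity, the unweighted majority vote, and `k=3` are assumptions made for illustration, not details confirmed by this review. The feature vectors are made-up placeholders, not experimental data, and the function names are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_predict(train_feats, train_labels, query, k=3):
    """Label a query feature by majority vote among its k most similar neighbors."""
    ranked = sorted(range(len(train_feats)),
                    key=lambda i: cosine_similarity(train_feats[i], query),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def error_rate(train_feats, train_labels, test_feats, test_labels, k=3):
    """Percentage of test features whose predicted label is wrong."""
    wrong = sum(knn_predict(train_feats, train_labels, feat, k) != label
                for feat, label in zip(test_feats, test_labels))
    return 100.0 * wrong / len(test_labels)

# Placeholder "frozen encoder features" for two classes (illustrative only).
train_feats = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0],
               [0.1, 1.0], [0.2, 0.9], [0.0, 0.8]]
train_labels = [0, 0, 0, 1, 1, 1]
test_feats = [[1.0, 0.0], [0.0, 1.0]]
test_labels = [0, 1]

print(f"toy KNN error rate: {error_rate(train_feats, train_labels, test_feats, test_labels):.1f}%")
```

Because the encoder is never updated during this evaluation, the error rate reflects the quality of the learned representation rather than the capacity of the classifier.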

This review was generated by AI and reviewed by a human editor.