QUICK REVIEW

[Paper Review] Unsupervised Representation Learning by Predicting Image Rotations

Spyros Gidaris, Praveer Singh | arXiv (Cornell University) | 2018-03-21

Advanced Image and Video Retrieval Techniques | References: 25 | Citations: 1,534

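One-Line Summary

The paper trains a ConvNet to predict which of four rotations (0, 90, 180, or 270 degrees) was applied to an image, which forces it to learn semantic image features, and reports state-of-the-art results among unsupervised methods in transfer and semi-supervised learning on CIFAR-10, ImageNet, PASCAL, and Places.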

ABSTRACT

Over the last years, deep convolutional neural networks (ConvNets) have transformed the field of computer vision thanks to their unparalleled capacity to learn high level semantic image features. However, in order to successfully learn those features, they usually require massive amounts of manually labeled data, which is both expensive and impractical to scale. Therefore, unsupervised semantic feature learning, i.e., learning without requiring manual annotation effort, is of crucial importance in order to successfully harvest the vast amount of visual data that are available today. In our work we propose to learn image features by training ConvNets to recognize the 2d rotation that is applied to the image that it gets as input. We demonstrate both qualitatively and quantitatively that this apparently simple task actually provides a very powerful supervisory signal for semantic feature learning. We exhaustively evaluate our method in various unsupervised feature learning benchmarks and we exhibit in all of them state-of-the-art performance. Specifically, our results on those benchmarks demonstrate dramatic improvements w.r.t. prior state-of-the-art approaches in unsupervised representation learning and thus significantly close the gap with supervised feature learning. For instance, in PASCAL VOC 2007 detection task our unsupervised pre-trained AlexNet model achieves the state-of-the-art (among unsupervised methods) mAP of 54.4% that is only 2.4 points lower from the supervised case. We get similarly striking results when we transfer our unsupervised learned features on various other tasks, such as ImageNet classification, PASCAL classification, PASCAL segmentation, and CIFAR-10 classification. The code and models of our paper will be published on: https://github.com/gidariss/FeatureLearningRotNet .

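Motivation and Objectives

Proposes learning semantic features in an unsupervised manner, without manual labels.
Introduces a self-supervised task: predicting which of four rotations (0, 90, 180, or 270 degrees) was applied to an image.
Demonstrates that rotation-based supervision yields features that transfer across datasets and tasks.
Evaluates on CIFAR-10, ImageNet, PASCAL VOC, and Places205 in supervised, semi-supervised, and transfer settings.
Shows that the learned features approach supervised-level performance on several tasks.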

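Proposed Method

Defines a set G of K discrete geometric transformations g(.|y); here K = 4 and the transformations are image rotations by 0, 90, 180, and 270 degrees.
Trains a ConvNet F(.) to predict the rotation label y from the rotated image X^y, which makes the pretext task a 4-way classification problem.
Optimizes the average loss over N training images, $\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \mathrm{loss}(X_i, \theta)$, where $\mathrm{loss}(X_i, \theta) = -\frac{1}{K} \sum_{y=1}^{K} \log F^{y}(g(X_i \mid y) \mid \theta)$ and $F^{y}$ is the predicted probability of rotation class y.
Implements the rotations with flip and transpose operations to avoid low-level artifacts.
Visualizes attention maps and first-layer filters to argue that rotation prediction requires semantic understanding.
Evaluates RotNet features by transferring them to CIFAR-10, ImageNet, PASCAL VOC, and Places205 tasks.
Compares RotNet with prior unsupervised methods and supervised baselines, including semi-supervised settings.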
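
The rotation step and the per-image loss described above can be sketched in NumPy. This is a minimal illustration rather than the authors' code: the function names `rotate`, `make_rotation_batch`, and `rotation_loss` are hypothetical, labels run from 0 to 3 instead of 1 to K, and the ConvNet is replaced by a vector of already-computed probabilities.

```python
import numpy as np

def rotate(image: np.ndarray, label: int) -> np.ndarray:
    """Rotate an (H, W, C) image by label * 90 degrees counter-clockwise
    using only flips and transposes, so no pixel is interpolated."""
    if label == 0:   # 0 degrees: unchanged
        return image
    if label == 1:   # 90 degrees: transpose, then flip vertically
        return np.flipud(np.transpose(image, (1, 0, 2)))
    if label == 2:   # 180 degrees: flip vertically, then horizontally
        return np.fliplr(np.flipud(image))
    if label == 3:   # 270 degrees: flip vertically, then transpose
        return np.transpose(np.flipud(image), (1, 0, 2))
    raise ValueError("label must be 0, 1, 2, or 3")

def make_rotation_batch(image: np.ndarray):
    """Return the four rotated copies of one image and their labels.
    A list is used because non-square images change shape when rotated."""
    labels = np.arange(4)
    return [rotate(image, int(y)) for y in labels], labels

def rotation_loss(probs: np.ndarray) -> float:
    """probs[y] is the probability assigned to the correct rotation y when
    the model sees the copy rotated by y; returns -(1/K) * sum_y log probs[y]."""
    return float(-np.mean(np.log(probs)))

# Usage with a stand-in "model" that guesses uniformly over the 4 classes.
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)
batch, labels = make_rotation_batch(image)
uniform = np.full(4, 0.25)
print(rotation_loss(uniform))  # a uniform guess gives log(4)
```

Because every rotated copy is an exact permutation of the original pixels, the network cannot solve the task by detecting resampling artifacts, which is the motivation given for the flip-and-transpose implementation.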

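Experimental Results

Research Questions

RQ1: Can the simple self-supervised task of predicting image rotations learn meaningful features?
RQ2: How well do rotation-based features transfer to image classification, detection, and segmentation tasks across datasets?
RQ3: How do model depth and the number of rotation classes affect feature quality?
RQ4: How do rotation-based features perform in semi-supervised settings compared with fully supervised baselines?
RQ5: Does learning the features require extensive preprocessing or special measures against low-level artifacts?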

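Key Results

RotNet achieves state-of-the-art results among unsupervised methods on CIFAR-10, ImageNet, PASCAL VOC, and Places205.
On CIFAR-10, features from a 4-block RotNet reach up to 89.06% accuracy with a non-linear classifier, approaching the supervised result (92.80%).
RotNet features transfer strongly to ImageNet top-1 classification with both non-linear and linear probes, clearly outperforming prior unsupervised methods.
In semi-supervised CIFAR-10 experiments, RotNet-based features outperform the supervised baseline when labeled examples are scarce (fewer than about 1,000 per class).
RotNet features learned on ImageNet transfer effectively to PASCAL VOC classification and detection and to Places classification, with a large margin over prior unsupervised methods.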
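
A linear probe, as used in the transfer results above, is a linear classifier trained on top of frozen features. The sketch below shows the idea with plain gradient descent on a softmax classifier. It is an illustration under stated assumptions, not the paper's evaluation protocol: `train_linear_probe` and `predict` are hypothetical names, and the "features" are synthetic Gaussian clusters standing in for frozen RotNet activations.

```python
import numpy as np

def train_linear_probe(features, labels, num_classes, lr=0.05, steps=200):
    """Fit a softmax linear classifier on fixed features.
    The feature extractor is never updated; only W and b are trained."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # d(cross-entropy)/d(logits)
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(features, W, b):
    """Return the most likely class for each feature vector."""
    return np.argmax(features @ W + b, axis=1)

# Synthetic stand-in for frozen features: two well-separated clusters.
rng = np.random.default_rng(0)
features = np.concatenate([
    rng.normal(-2.0, 1.0, (50, 5)),
    rng.normal(2.0, 1.0, (50, 5)),
])
labels = np.array([0] * 50 + [1] * 50)

W, b = train_linear_probe(features, labels, num_classes=2)
accuracy = float(np.mean(predict(features, W, b) == labels))
print(accuracy)
```

Since the probe has so little capacity of its own, its accuracy mostly reflects how linearly separable the frozen features already are, which is why it is a common yardstick for unsupervised representations.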

This review was generated by AI and reviewed by a human editor.