QUICK REVIEW

[Paper Review] Unsupervised Meta-Learning For Few-Shot Image and Video Classification.

Siavash Khodadadeh, Ladislau Bölöni | arXiv (Cornell University) | November 28, 2018

Domain Adaptation and Few-Shot Learning · Citations: 13

One-Line Summary

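This paper proposes UMTRA, an unsupervised meta-learning framework that enables few-shot image and video classification without labeled meta-training tasks by generating synthetic tasks from unlabeled data. On Omniglot five-way one-shot classification, it retains 85% of MAML's accuracy while reducing the number of required labels from 24,005 to 5.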

ABSTRACT

Few-shot or one-shot learning of classifiers for images or videos is an important next frontier in computer vision. The extreme paucity of training data means that the learning must start with a significant inductive bias towards the type of task to be learned. One way to acquire this is by meta-learning on tasks similar to the target task. However, if the meta-learning phase requires labeled data for a large number of tasks closely related to the target task, it not only increases the difficulty and cost, but also conceptually limits the approach to variations of well-understood domains. In this paper, we propose UMTRA, an algorithm that performs meta-learning on an unlabeled dataset in an unsupervised fashion, without putting any constraint on the classifier network architecture. The only requirements towards the dataset are: sufficient size, diversity and number of classes, and relevance of the domain to the one in the target task. Exploiting this information, UMTRA generates synthetic training tasks for the meta-learning phase. We evaluate UMTRA on few-shot and one-shot learning on both image and video domains. To the best of our knowledge, we are the first to evaluate meta-learning approaches on UCF-101. On the Omniglot and Mini-Imagenet few-shot learning benchmarks, UMTRA outperforms every tested approach based on unsupervised learning of representations, while alternating for the best performance with the recent CACTUs algorithm. Compared to supervised model-agnostic meta-learning approaches, UMTRA trades off some classification accuracy for a vast decrease in the number of labeled data needed. For instance, on the five-way one-shot classification on the Omniglot, we retain 85% of the accuracy of MAML, a recently proposed supervised meta-learning algorithm, while reducing the number of required labels from 24005 to 5.

Motivation and Goals

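To tackle few-shot image and video classification with a minimal amount of labeled data.
To remove the dependence on labeled meta-training tasks by meta-learning without supervision on a large, diverse, unlabeled dataset.
To develop a model-agnostic meta-learning approach that places no constraint on the classifier architecture.
To evaluate meta-learning on the UCF-101 video benchmark, which had not been done before.
To achieve performance competitive with supervised meta-learning methods while requiring far fewer labels.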

Proposed Method

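UMTRA generates synthetic training tasks from an unlabeled dataset that is large and diverse enough, contains many classes, and comes from a domain relevant to the target task.
Each synthetic N-way one-shot task is built by randomly sampling N items from the unlabeled dataset and treating each one as its own class, which is reasonable when the dataset has many classes. These items form the support set, and augmented copies of the same items form the query set.
Meta-learning is then performed on these synthetic tasks, so no labels are needed during the meta-learning phase.
The approach is model-agnostic and works with any classifier network architecture.
The synthetic tasks are fed to a standard model-agnostic meta-learner (MAML in the paper's experiments), which is trained with the usual classification loss on each task's query set.
The framework is evaluated on Omniglot and Mini-Imagenet for image classification and on UCF-101 for video classification.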
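The task-construction step above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: `sample_umtra_task`, `prob_all_distinct`, and the `augment` callback are hypothetical names, and the augmentation itself is left abstract. The second function gives a rough sense of why random sampling works: assuming balanced classes and treating draws as independent, it computes the probability that N draws all come from different true classes, which is high when the class count is large.

```python
import random
from math import prod


def sample_umtra_task(unlabeled, n_way, augment, rng=random):
    """Build one hypothetical N-way, 1-shot synthetic task from unlabeled data.

    Assumption: the n_way randomly drawn items belong to different (unknown)
    classes, so item i simply receives pseudo-label i. No real labels are used.
    """
    picks = rng.sample(unlabeled, n_way)  # n_way distinct items
    support = [(x, label) for label, x in enumerate(picks)]
    # Query set: an augmented copy of each support item keeps its pseudo-label.
    query = [(augment(x), label) for label, x in enumerate(picks)]
    return support, query


def prob_all_distinct(num_classes, n_way):
    """Approximate probability that n_way draws hit n_way different classes.

    Assumes balanced classes and independent draws (birthday-problem style).
    """
    return prod((num_classes - i) / num_classes for i in range(n_way))
```

For example, `sample_umtra_task(list(range(100)), 5, augment=lambda x: x + 0.5, rng=random.Random(0))` returns five support pairs with pseudo-labels 0 to 4 and five matching query pairs. With, say, 1,200 balanced classes, `prob_all_distinct(1200, 5)` is above 0.99, so label collisions inside a 5-way task are rare.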

Experimental Results

Research Questions

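RQ1: Can unsupervised meta-learning on unlabeled data achieve competitive few-shot classification performance?
RQ2: How does UMTRA compare with supervised meta-learning methods such as MAML in accuracy and label efficiency?
RQ3: Does UMTRA generalize to video classification tasks such as UCF-101?
RQ4: How do dataset diversity and domain relevance affect UMTRA's performance?
RQ5: How does UMTRA's performance compare with approaches based on unsupervised representation learning?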

Key Findings

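On the Omniglot five-way one-shot benchmark, UMTRA retains 85% of the accuracy of MAML, a supervised meta-learning method.
On the same task, it reduces the number of required labeled examples from 24,005 to just 5, a reduction of about 99.98%.
On Omniglot and Mini-Imagenet, UMTRA outperforms every tested approach based on unsupervised representation learning.
Among unsupervised approaches, UMTRA alternates with the recent CACTUs algorithm for the best performance.
To the authors' knowledge, this is the first evaluation of meta-learning on the UCF-101 video benchmark, demonstrating that the approach applies to video classification.
UMTRA performs well in both the image and video domains, provided the unlabeled data comes from a domain relevant to the target task.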
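The label-reduction figure can be checked with a few lines of arithmetic. The helper name `label_reduction` is illustrative only. The inputs are the two label counts reported for Omniglot five-way one-shot classification.

```python
def label_reduction(before, after):
    """Fraction of labeled examples eliminated when going from `before` to `after`."""
    return 1 - after / before


reduction = label_reduction(24005, 5)
print(f"{reduction:.2%}")  # prints 99.98%
print(24005 // 5)          # prints 4801, i.e. a 4,801-fold reduction
```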


This review was generated by AI and reviewed by a human editor.