QUICK REVIEW

[Paper Review] Few-shot Action Recognition with Prototype-centered Attentive Learning

Xiatian Zhu, Antoine Toisoul | arXiv (Cornell University) | 2021-01-20

Human Pose and Action Recognition · References: 38 · Citations: 36

One-line summary

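PAL introduces a prototype-centered contrastive loss and a hybrid attentive learning mechanism to improve data efficiency and to handle support-set outliers and inter-class overlap in few-shot action recognition, achieving state-of-the-art results on four benchmarks.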

ABSTRACT

Few-shot action recognition aims to recognize action classes with few training samples. Most existing methods adopt a meta-learning approach with episodic training. In each episode, the few samples in a meta-training task are split into support and query sets. The former is used to build a classifier, which is then evaluated on the latter using a query-centered loss for model updating. There are however two major limitations: lack of data efficiency due to the query-centered only loss design and inability to deal with the support set outlying samples and inter-class distribution overlapping problems. In this paper, we overcome both limitations by proposing a new Prototype-centered Attentive Learning (PAL) model composed of two novel components. First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective, in order to make full use of the limited training samples in each episode. Second, PAL further integrates a hybrid attentive learning mechanism that can minimize the negative impacts of outliers and promote class separation. Extensive experiments on four standard few-shot action benchmarks show that our method clearly outperforms previous state-of-the-art methods, with the improvement particularly significant (10+%) on the most challenging fine-grained action recognition benchmark.
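To make the query-centered vs. prototype-centered distinction concrete, here is a minimal NumPy sketch of both losses on a single episode. It assumes mean-of-support prototypes, cosine similarity scaled by a hypothetical temperature of 0.1, and a prototype-centered term in which each prototype is contrasted against all queries, with same-class queries as positives. The function names and these choices are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def cosine_sim(a, b):
    # Pairwise cosine similarity between rows of a (m, d) and rows of b (n, d).
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def episode_losses(support, support_labels, query, query_labels, n_way, temperature=0.1):
    # Class prototypes: mean of each class's support embeddings (ProtoNet-style).
    protos = np.stack([support[support_labels == k].mean(axis=0) for k in range(n_way)])
    logits = cosine_sim(query, protos) / temperature  # shape (n_query, n_way)

    # Query-centered loss: each query is classified over all prototypes.
    q_logp = log_softmax(logits, axis=1)
    query_loss = -q_logp[np.arange(len(query)), query_labels].mean()

    # Prototype-centered loss (assumed form): each prototype is classified over all
    # queries; its own-class queries are positives, every other query is a negative.
    p_logp = log_softmax(logits.T, axis=1)  # shape (n_way, n_query)
    pos_mask = query_labels[None, :] == np.arange(n_way)[:, None]
    proto_loss = -(p_logp * pos_mask).sum() / pos_mask.sum()
    return query_loss, proto_loss

# Usage: a random 5-way 1-shot episode with 3 queries per class.
rng = np.random.default_rng(0)
n_way, k_shot, n_query_per_class, dim = 5, 1, 3, 16
support_labels = np.repeat(np.arange(n_way), k_shot)
query_labels = np.repeat(np.arange(n_way), n_query_per_class)
support = rng.normal(size=(n_way * k_shot, dim))
query = rng.normal(size=(len(query_labels), dim))
q_loss, p_loss = episode_losses(support, support_labels, query, query_labels, n_way)

# Sanity check: if the queries coincide with the (1-shot) prototypes, both losses drop.
q_perfect, p_perfect = episode_losses(support, support_labels, support.copy(), support_labels, n_way)
print(q_loss, p_loss, q_perfect, p_perfect)
```

The point of the second term is data efficiency: the same similarity matrix yields an extra training signal per episode, read along the prototype axis instead of the query axis.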

Motivation and Goals

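Address the data inefficiency and the sensitivity to outliers of existing few-shot action recognition methods.
Combine prototype-centered contrastive learning with query/support set attention to make full use of the limited data in each episode.
Mitigate inter-class overlap and intra-class outliers through a hybrid attentive learning framework.
Demonstrate state-of-the-art performance on four standard few-shot action benchmarks, including a fine-grained dataset, where the gains are largest.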
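A toy example shows why support-set outliers matter. With a plain mean prototype (ProtoNet-style), a single atypical support clip can drag its class prototype far enough to flip a query's prediction. The sketch below uses hypothetical 1-D embeddings and squared Euclidean distance for readability (PAL itself scores queries with cosine similarity). Removing the outlier by hand only stands in for the down-weighting that PAL's attention is meant to learn; it is not PAL's mechanism.

```python
import numpy as np

# Hypothetical 1-D "embeddings" for a 2-way 5-shot episode.
support_a = np.array([0.0, 0.2, -0.2, 0.1, -8.0])  # last sample is an outlier
support_b = np.array([1.8, 2.2, 2.0, 1.9, 2.1])
query_a = 0.9  # a query that truly belongs to class A (index 0)

def nearest_prototype(query, protos):
    # Index of the prototype with the smallest squared Euclidean distance.
    return int(np.argmin((np.asarray(protos) - query) ** 2))

proto_b = support_b.mean()            # 2.0
proto_a_mean = support_a.mean()       # about -1.58: dragged away by the outlier
proto_a_clean = support_a[:4].mean()  # 0.025: outlier excluded

pred_mean = nearest_prototype(query_a, [proto_a_mean, proto_b])    # class B (wrong)
pred_clean = nearest_prototype(query_a, [proto_a_clean, proto_b])  # class A (correct)
print(proto_a_mean, proto_a_clean, pred_mean, pred_clean)
```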

Proposed Method

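Introduces Prototype-centered Attentive Learning (PAL), which augments a ProtoNet-based framework.
Introduces a Hybrid Attentive Learning (HAL) module that performs self-attention within the support set and cross-attention between queries and the support set.
Defines a prototype-centered contrastive loss that complements the conventional query-centered objective.
Computes class prototypes from the contextually enriched support features and classifies each query by its cosine similarity to the prototypes.
Pre-trains a TSN-based feature embedding network on the full training set before episodic training.
Trains in two stages: TSN pre-training first, then meta-learning with PAL (meta loss + prototype-centered contrastive loss).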
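The inference path described above (support self-attention, query-support cross-attention, mean prototypes, cosine classification) can be sketched as follows. This is a hypothetical simplification, not the paper's HAL module: it uses single-head scaled dot-product attention with no learned projections, residual connections, and synthetic clustered features standing in for TSN embeddings. The name `hal_classify` is illustrative. During meta-training, the resulting similarity matrix would feed both the query-centered and the prototype-centered losses.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    # Single-head scaled dot-product attention without learned projections (simplification).
    scores = queries @ keys.T / np.sqrt(queries.shape[1])
    return softmax(scores, axis=1) @ values

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def hal_classify(support, support_labels, query, n_way):
    # 1) Support self-attention: each support feature is refined by the whole support set.
    support_ctx = support + attend(support, support, support)
    # 2) Query-to-support cross-attention: each query is refined using the refined support set.
    query_ctx = query + attend(query, support_ctx, support_ctx)
    # 3) Class prototypes: mean of the contextualized support features per class.
    protos = np.stack([support_ctx[support_labels == k].mean(axis=0) for k in range(n_way)])
    # 4) Cosine similarity between each query and each prototype; predict the argmax.
    sims = l2_normalize(query_ctx) @ l2_normalize(protos).T
    return sims, sims.argmax(axis=1)

# Usage: a synthetic 5-way 5-shot episode with one query per class.
rng = np.random.default_rng(0)
n_way, k_shot, dim = 5, 5, 32
centers = rng.normal(size=(n_way, dim)) * 3.0
support_labels = np.repeat(np.arange(n_way), k_shot)
support = centers[support_labels] + rng.normal(size=(n_way * k_shot, dim))
query_labels = np.arange(n_way)
query = centers[query_labels] + rng.normal(size=(n_way, dim))
sims, preds = hal_classify(support, support_labels, query, n_way)
accuracy = (preds == query_labels).mean()
print(preds, accuracy)
```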

Experimental Results

Research Questions

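RQ1: Can prototype-centered contrastive learning make better use of the data in each episode than the conventional query-centered loss?
RQ2: Does hybrid attention over support and query samples reduce the impact of outliers and inter-class overlap in few-shot tasks?
RQ3: How does PAL compare with state-of-the-art methods on coarse-grained and fine-grained action benchmarks?
RQ4: How much does pre-training the embedding network contribute to overall few-shot action recognition performance?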

Key Results

Method	Kinetics-100 1-shot	Kinetics-100 5-shot	Sth-Sth-100 1-shot	Sth-Sth-100 5-shot	HMDB51 1-shot	HMDB51 5-shot	UCF101 1-shot	UCF101 5-shot
Matching Net	53.3	74.6	-	-	-	-	-	-
MAML	54.2	75.3	-	-	-	-	-	-
ProtoNet++	64.5	77.9	33.6	43.0	-	-	-	-
TARN	64.8	78.5	-	-	-	-	-	-
TRN++	68.4	82.0	38.6	48.9	-	-	-	-
CMN	60.5	78.9	-	-	-	-	-	-
CMN++	65.4	78.8	34.4	43.8	-	-	-	-
OTAM	73.0	85.8	42.8	52.3	-	-	-	-
ARN	63.7	82.4	-	-	45.5	60.6	66.3	83.1
FEAT	74.0	86.5	45.3	61.2	60.4	75.2	83.9	94.5
PAL (Ours)	74.2	87.1	46.4	62.6	60.9	75.8	85.3	95.2
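The size of PAL's gains can be read directly off the table. This short snippet copies the accuracies (%) from the table and computes PAL's margins in percentage points over FEAT (the strongest prior entry in every column) and over OTAM (which reports only Kinetics-100 and Sth-Sth-100). The column abbreviations are introduced here for brevity.

```python
# Accuracy (%) copied from the table. Column order: Kinetics-100 1/5-shot,
# Sth-Sth-100 1/5-shot, HMDB51 1/5-shot, UCF101 1/5-shot. None = not reported.
COLUMNS = ["K100-1", "K100-5", "SSv2-1", "SSv2-5", "HMDB-1", "HMDB-5", "UCF-1", "UCF-5"]
PAL = [74.2, 87.1, 46.4, 62.6, 60.9, 75.8, 85.3, 95.2]
FEAT = [74.0, 86.5, 45.3, 61.2, 60.4, 75.2, 83.9, 94.5]
OTAM = [73.0, 85.8, 42.8, 52.3, None, None, None, None]

def margins(ours, baseline):
    # Absolute gain in percentage points, rounded to one decimal; unreported cells are skipped.
    return {c: round(o - b, 1) for c, o, b in zip(COLUMNS, ours, baseline) if b is not None}

gain_vs_feat = margins(PAL, FEAT)
gain_vs_otam = margins(PAL, OTAM)
# Relative gain on Sth-Sth-100 5-shot over OTAM, in percent.
rel_ssv2_5 = round(100 * (62.6 - 52.3) / 52.3, 1)
print(gain_vs_feat)
print(gain_vs_otam)
print(rel_ssv2_5)
```

On these numbers, the roughly 10-point Sth-Sth-100 5-shot gain is measured against OTAM (10.3 points, about 19.7% relative); over FEAT, PAL's margins range from 0.2 to 1.4 points.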

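PAL achieves state-of-the-art results on all four benchmarks. On the fine-grained Sth-Sth-100 5-shot setting it improves on OTAM by about 10 percentage points (52.3 → 62.6), while its margin over FEAT is smaller (61.2 → 62.6).
Hybrid attentive learning (HAL) and prototype-centered contrastive learning (PCL) provide complementary gains over the pre-trained baseline, improving both 1-shot and 5-shot accuracy.
Pre-training the feature embedding network is crucial: it yields strong feature representations and reduces reliance on episodic adaptation.
By combining query-centered and prototype-centered signals, PAL reduces intra-class variation and inter-class overlap.
PAL outperforms OTAM and FEAT across the reported settings, including the challenging fine-grained dataset, which points to its robustness to outliers and class overlap.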


This review was generated by AI and reviewed by a human editor.