QUICK REVIEW

[Paper Review] Few-shot Action Recognition with Prototype-centered Attentive Learning

Xiatian Zhu, Antoine Toisoul | arXiv (Cornell University) | Jan 20, 2021
Human Pose and Action Recognition | References: 38 | Citations: 36
ONE-LINE SUMMARY

PAL introduces a prototype-centered contrastive loss and a Hybrid Attentive Learning mechanism that improve data efficiency and robustness to outliers and inter-class overlap in few-shot action recognition, achieving state-of-the-art results on four benchmarks.

ABSTRACT

Few-shot action recognition aims to recognize action classes with few training samples. Most existing methods adopt a meta-learning approach with episodic training. In each episode, the few samples in a meta-training task are split into support and query sets. The former is used to build a classifier, which is then evaluated on the latter using a query-centered loss for model updating. There are however two major limitations: lack of data efficiency due to the query-centered only loss design and inability to deal with the support set outlying samples and inter-class distribution overlapping problems. In this paper, we overcome both limitations by proposing a new Prototype-centered Attentive Learning (PAL) model composed of two novel components. First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective, in order to make full use of the limited training samples in each episode. Second, PAL further integrates a hybrid attentive learning mechanism that can minimize the negative impacts of outliers and promote class separation. Extensive experiments on four standard few-shot action benchmarks show that our method clearly outperforms previous state-of-the-art methods, with the improvement particularly significant (10+%) on the most challenging fine-grained action recognition benchmark.
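The episodic split described in the abstract can be illustrated with a minimal sketch. The function name `sample_episode`, the toy class and video identifiers, and the N-way/K-shot values are assumptions made for illustration; they are not taken from the paper.

```python
import random

def sample_episode(videos_by_class, n_way=5, k_shot=1, n_query=2, seed=None):
    """Sample one N-way K-shot episode as (support, query) lists of (video_id, label)."""
    rng = random.Random(seed)
    # Pick the classes of this episode, then disjoint support/query videos per class.
    classes = rng.sample(sorted(videos_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picked = rng.sample(videos_by_class[cls], k_shot + n_query)
        support += [(v, label) for v in picked[:k_shot]]
        query += [(v, label) for v in picked[k_shot:]]
    return support, query

# Hypothetical toy dataset: 8 classes with 10 video ids each.
data = {f"class_{c}": [f"class_{c}/video_{i}" for i in range(10)] for c in range(8)}
support, query = sample_episode(data, n_way=5, k_shot=1, n_query=2, seed=0)
print(len(support), len(query))
```

The support set builds the classifier and the query set scores it, so the two sets must not share videos within an episode.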

MOTIVATION AND OBJECTIVES

  • Address data inefficiency and sensitivity to outliers in few-shot action recognition.
  • Leverage limited episode data by combining prototype-centered contrastive learning with query-/support-set attention.
  • Mitigate inter-class overlap and intra-class outliers through a hybrid attentive learning framework.
  • Demonstrate state-of-the-art performance on four standard few-shot action benchmarks, especially fine-grained datasets.

PROPOSED METHOD

  • Adopt a ProtoNet-based framework augmented with Prototype-centered Attentive Learning (PAL).
  • Introduce a hybrid attentive learning (HAL) module performing support-set self-attention and query-to-support cross-attention.
  • Define a prototype-centered contrastive loss to complement the conventional query-centered objective.
  • Compute per-class prototypes from contextually enriched support features and classify queries via cosine similarity to prototypes.
  • Pretrain the TSN-based feature embedding network on the full training set before episodic meta-training.
  • Train end-to-end in two stages: TSN pretraining, then meta-training with PAL (meta loss + prototype-centered contrastive loss).
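The steps listed above can be sketched on toy feature vectors. This is a simplified illustration under stated assumptions, not the authors' implementation: the attention here has no learned projections or multiple heads, the contrastive loss is one plausible form (each prototype contrasts same-class queries against all queries), and the temperature `tau` and all function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-8)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(targets, sources):
    """Residual scaled dot-product attention without learned projections (a simplification)."""
    scale = math.sqrt(len(sources[0]))
    out = []
    for t in targets:
        w = softmax([dot(t, s) / scale for s in sources])
        ctx = [sum(wi * s[d] for wi, s in zip(w, sources)) for d in range(len(t))]
        out.append([ti + ci for ti, ci in zip(t, ctx)])
    return out

def prototypes(support, labels, n_way):
    """Per-class mean of the (contextually enriched) support features."""
    protos = []
    for c in range(n_way):
        members = [f for f, y in zip(support, labels) if y == c]
        protos.append([sum(col) / len(members) for col in zip(*members)])
    return protos

def query_centered_loss(queries, q_labels, protos, tau=0.1):
    """Cross-entropy of each query over its cosine similarities to all prototypes."""
    total = 0.0
    for q, y in zip(queries, q_labels):
        total -= math.log(softmax([cosine(q, p) / tau for p in protos])[y])
    return total / len(queries)

def prototype_centered_loss(queries, q_labels, protos, tau=0.1):
    """Assumed contrastive form: each prototype scores all queries; same-class ones are positives."""
    total, count = 0.0, 0
    for c, p in enumerate(protos):
        probs = softmax([cosine(p, q) / tau for q in queries])
        for prob, y in zip(probs, q_labels):
            if y == c:
                total -= math.log(prob)
                count += 1
    return total / max(count, 1)

# Toy 2-way 2-shot episode with 2-D features (hypothetical values).
support = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
s_labels = [0, 0, 1, 1]
queries = [[0.8, 0.2], [0.2, 0.8]]
q_labels = [0, 1]

support_ctx = attend(support, support)      # support-set self-attention
queries_ctx = attend(queries, support_ctx)  # query-to-support cross-attention
protos = prototypes(support_ctx, s_labels, n_way=2)
preds = [max(range(2), key=lambda c: cosine(q, protos[c])) for q in queries_ctx]
loss = (query_centered_loss(queries_ctx, q_labels, protos)
        + prototype_centered_loss(queries_ctx, q_labels, protos))
print(preds, loss)
```

The two losses use the same similarity matrix from opposite directions: the query-centered term normalizes over prototypes for each query, while the prototype-centered term normalizes over queries for each prototype, which is how the second term draws extra supervision from the same episode.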

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

  • RQ1: Can prototype-centered contrastive learning improve data utilization in each episode compared to traditional query-centered losses?
  • RQ2: Does hybrid attention on support and query samples reduce the impact of outliers and inter-class overlap in few-shot tasks?
  • RQ3: How does PAL perform relative to state-of-the-art methods across coarse-grained and fine-grained action benchmarks?
  • RQ4: What is the contribution of pretraining the embedding network to overall performance in few-shot action recognition?

KEY FINDINGS

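Accuracy (%); "-" means not reported.

Method       | Kinetics-100 1-shot | Kinetics-100 5-shot | Sth-Sth-100 1-shot | Sth-Sth-100 5-shot | HMDB51 1-shot | HMDB51 5-shot | UCF101 1-shot | UCF101 5-shot
Matching Net | 53.3                | 74.6                | -                  | -                  | -             | -             | -             | -
MAML         | 54.2                | 75.3                | -                  | -                  | -             | -             | -             | -
ProtoNet++   | 64.5                | 77.9                | 33.6               | 43.0               | -             | -             | -             | -
TARN         | 64.8                | 78.5                | -                  | -                  | -             | -             | -             | -
TRN++        | 68.4                | 82.0                | 38.6               | 48.9               | -             | -             | -             | -
CMN          | 60.5                | 78.9                | -                  | -                  | -             | -             | -             | -
CMN++        | 65.4                | 78.8                | 34.4               | 43.8               | -             | -             | -             | -
OTAM         | 73.0                | 85.8                | 42.8               | 52.3               | -             | -             | -             | -
ARN          | 63.7                | 82.4                | -                  | -                  | 45.5          | 60.6          | 66.3          | 83.1
FEAT         | 74.0                | 86.5                | 45.3               | 61.2               | 60.4          | 75.2          | 83.9          | 94.5
PAL (Ours)   | 74.2                | 87.1                | 46.4               | 62.6               | 60.9          | 75.8          | 85.3          | 95.2
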
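  • PAL achieves state-of-the-art results on four benchmarks, with the largest improvement on the fine-grained Sth-Sth-100 dataset at 5-shot (about 10 points over OTAM in the table above: 62.6 vs. 52.3).
  • Hybrid Attentive Learning (HAL) and Prototype-centered Contrastive Learning (PCL) provide complementary gains over the pretrained baseline, improving both 1-shot and 5-shot accuracy.
  • Pretraining the embedding network is critical: it yields strong feature representations and reduces reliance on episodic adaptation.
  • By combining query-centered and prototype-centered signals, PAL reduces intra-class variation and inter-class overlap.
  • PAL consistently outperforms OTAM and FEAT on the more challenging datasets, underscoring its robustness to outliers and class overlap.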
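The margins behind these findings can be checked with simple arithmetic on the reported accuracies. The sketch below copies the OTAM, FEAT, and PAL rows for the two datasets where all three are reported; the helper name `margin` is an assumption for illustration.

```python
# Accuracy (%) as (1-shot, 5-shot), copied from the results table.
results = {
    "OTAM": {"Kinetics-100": (73.0, 85.8), "Sth-Sth-100": (42.8, 52.3)},
    "FEAT": {"Kinetics-100": (74.0, 86.5), "Sth-Sth-100": (45.3, 61.2)},
    "PAL":  {"Kinetics-100": (74.2, 87.1), "Sth-Sth-100": (46.4, 62.6)},
}

def margin(method, baseline, dataset):
    """Absolute accuracy gain of `method` over `baseline` as (1-shot, 5-shot) points."""
    a, b = results[method][dataset], results[baseline][dataset]
    return tuple(round(x - y, 1) for x, y in zip(a, b))

for baseline in ("OTAM", "FEAT"):
    for dataset in ("Kinetics-100", "Sth-Sth-100"):
        print(f"PAL vs {baseline} on {dataset}: {margin('PAL', baseline, dataset)}")
```

On these numbers, the roughly 10-point gap on Sth-Sth-100 at 5-shot is relative to OTAM, while the gain over FEAT on the same dataset is 1.1 to 1.4 points.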

This review was generated by AI and checked by a human editor.