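[Paper Review] Few-shot Action Recognition with Prototype-centered Attentive Learning
PAL introduces a prototype-centered contrastive loss and a Hybrid Attentive Learning mechanism. Together they improve data efficiency and robustness to outliers and inter-class overlap in few-shot action recognition, and the model achieves state-of-the-art results on four benchmarks.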
Few-shot action recognition aims to recognize action classes with few training samples. Most existing methods adopt a meta-learning approach with episodic training. In each episode, the few samples in a meta-training task are split into support and query sets. The former is used to build a classifier, which is then evaluated on the latter using a query-centered loss for model updating. There are however two major limitations: lack of data efficiency due to the query-centered only loss design and inability to deal with the support set outlying samples and inter-class distribution overlapping problems. In this paper, we overcome both limitations by proposing a new Prototype-centered Attentive Learning (PAL) model composed of two novel components. First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective, in order to make full use of the limited training samples in each episode. Second, PAL further integrates a hybrid attentive learning mechanism that can minimize the negative impacts of outliers and promote class separation. Extensive experiments on four standard few-shot action benchmarks show that our method clearly outperforms previous state-of-the-art methods, with the improvement particularly significant (10+%) on the most challenging fine-grained action recognition benchmark.
Motivation and Objectives
- Address data inefficiency and sensitivity to outliers in few-shot action recognition.
- Leverage limited episode data by combining prototype-centered contrastive learning with query-/support-set attention.
- Mitigate inter-class overlap and intra-class outliers through a hybrid attentive learning framework.
- Demonstrate state-of-the-art performance on four standard few-shot action benchmarks, especially fine-grained datasets.
Proposed Method
- Adopt a ProtoNet-based framework augmented with Prototype-centered Attentive Learning (PAL).
- Introduce a hybrid attentive learning (HAL) module performing support-set self-attention and query-to-support cross-attention.
- Define a prototype-centered contrastive loss to complement the conventional query-centered objective.
- Compute per-class prototypes from contextually enriched support features and classify queries via cosine similarity to prototypes.
- Pretrain the TSN-based feature embedding network on the full training set before episodic meta-training.
- Train in two stages: TSN pretraining first, then episodic meta-training with PAL, which optimizes the query-centered meta loss together with the prototype-centered contrastive loss.
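The forward pass described in the list above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes parameter-free scaled dot-product attention with a residual connection (the actual HAL module presumably uses learned projections), lets queries attend over the un-attended support set, and uses made-up 2-D embeddings in place of TSN features. The helper names `attend`, `class_prototypes`, and `classify` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(x, context):
    """Scaled dot-product attention of one vector over `context`, plus a residual.

    Assumption: no learned query/key/value projections are applied.
    """
    scale = math.sqrt(len(x))
    weights = softmax([dot(x, c) / scale for c in context])
    mixed = [sum(w * c[d] for w, c in zip(weights, context)) for d in range(len(x))]
    return [a + m for a, m in zip(x, mixed)]

def class_prototypes(support, labels, n_classes):
    """Average the support embeddings that belong to each class."""
    protos = []
    for c in range(n_classes):
        members = [x for x, y in zip(support, labels) if y == c]
        protos.append([sum(col) / len(members) for col in zip(*members)])
    return protos

def classify(queries, support, labels, n_classes):
    # Support-set self-attention: each support sample attends over the whole support set.
    support_att = [attend(s, support) for s in support]
    # Query-to-support cross-attention: each query attends over the support set.
    queries_att = [attend(q, support) for q in queries]
    # Prototypes are built from the contextually enriched support features.
    protos = class_prototypes(support_att, labels, n_classes)
    # Each query is assigned to the prototype with the highest cosine similarity.
    return [max(range(n_classes), key=lambda c: cosine(q, protos[c])) for q in queries_att]

# Toy 2-way 2-shot episode with made-up 2-D embeddings.
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
queries = [[1.0, 0.2], [0.2, 1.0]]
predictions = classify(queries, support, labels, n_classes=2)
print(predictions)  # prints [0, 1]
```

In this sketch the attention step only mixes each sample with its neighbors; how strongly an outlier is suppressed would depend on the learned projections, which are omitted here.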
Experimental Results
Research Questions
- RQ1: Can prototype-centered contrastive learning improve data utilization in each episode compared to traditional query-centered losses?
- RQ2: Does hybrid attention on support and query samples reduce the impact of outliers and inter-class overlap in few-shot tasks?
- RQ3: How does PAL perform relative to state-of-the-art methods across coarse-grained and fine-grained action benchmarks?
- RQ4: What is the contribution of pretraining the embedding network to overall performance in few-shot action recognition?
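To make the contrast behind RQ1 concrete, the sketch below places a query-centered loss next to one plausible form of a prototype-centered contrastive loss. In the first, each query is the anchor and is compared against all prototypes. In the second, each prototype is the anchor, same-class queries act as positives, and the remaining queries act as negatives. The exact formulation in the paper may differ: the temperature `tau`, the weight `lam`, the averaging over positives, and the toy embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def log_softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    log_total = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_total for s in scores]

def query_centered_loss(queries, q_labels, protos, tau=0.1):
    """Each query is an anchor: cross-entropy over its similarities to all prototypes."""
    total = 0.0
    for q, y in zip(queries, q_labels):
        log_probs = log_softmax([cosine(q, p) / tau for p in protos])
        total -= log_probs[y]
    return total / len(queries)

def prototype_centered_loss(queries, q_labels, protos, tau=0.1):
    """Each prototype is an anchor: same-class queries are positives, the rest negatives.

    Assumes every class has at least one query in the episode.
    """
    total = 0.0
    for c, p in enumerate(protos):
        log_probs = log_softmax([cosine(p, q) / tau for q in queries])
        positives = [lp for lp, y in zip(log_probs, q_labels) if y == c]
        total -= sum(positives) / len(positives)
    return total / len(protos)

# Toy 2-way episode with made-up 2-D embeddings.
protos = [[0.95, 0.05], [0.05, 0.95]]
queries = [[1.0, 0.2], [0.2, 1.0], [0.8, 0.3], [0.3, 0.8]]
q_labels = [0, 1, 0, 1]

lam = 1.0  # hypothetical weight on the prototype-centered term
loss = query_centered_loss(queries, q_labels, protos) \
    + lam * prototype_centered_loss(queries, q_labels, protos)
print(loss)
```

Both terms reuse the same similarities between queries and prototypes, but they normalize over different axes, so the second term extracts an additional training signal from the same episode without needing extra samples.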
Key Findings
| Method | Kinetics-100 1-shot | Kinetics-100 5-shot | Sth-Sth-100 1-shot | Sth-Sth-100 5-shot | HMDB51 1-shot | HMDB51 5-shot | UCF101 1-shot | UCF101 5-shot |
|---|---|---|---|---|---|---|---|---|
| Matching Net | 53.3 | 74.6 | - | - | - | - | - | - |
| MAML | 54.2 | 75.3 | - | - | - | - | - | - |
| ProtoNet++ | 64.5 | 77.9 | 33.6 | 43.0 | - | - | - | - |
| TARN | 64.8 | 78.5 | - | - | - | - | - | - |
| TRN++ | 68.4 | 82.0 | 38.6 | 48.9 | - | - | - | - |
| CMN | 60.5 | 78.9 | - | - | - | - | - | - |
| CMN++ | 65.4 | 78.8 | 34.4 | 43.8 | - | - | - | - |
| OTAM | 73.0 | 85.8 | 42.8 | 52.3 | - | - | - | - |
| ARN | 63.7 | 82.4 | - | - | 45.5 | 60.6 | 66.3 | 83.1 |
| FEAT | 74.0 | 86.5 | 45.3 | 61.2 | 60.4 | 75.2 | 83.9 | 94.5 |
| PAL (Ours) | 74.2 | 87.1 | 46.4 | 62.6 | 60.9 | 75.8 | 85.3 | 95.2 |
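- PAL achieves state-of-the-art results on all four benchmarks. The gain is most pronounced on the fine-grained Sth-Sth-100 dataset in the 5-shot setting, where PAL reaches 62.6% against 52.3% for OTAM (about 10 points) and 61.2% for FEAT.
- Hybrid Attentive Learning (HAL) and Prototype-centered Contrastive Learning (PCL) provide complementary gains over the pretrained baseline, improving both 1-shot and 5-shot accuracy.
- Pretraining the embedding network is critical: it yields strong feature representations and reduces reliance on episodic adaptation.
- By combining query-centered and prototype-centered signals, PAL reduces intra-class variation and inter-class overlap.
- PAL consistently outperforms OTAM and FEAT on the more difficult datasets, underscoring its robustness to outliers and class overlap.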
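The margins behind these findings can be recomputed from the results table. The short script below copies the OTAM, FEAT, and PAL rows from the table and reports the absolute gain in accuracy points; the `margins` helper and the dictionary layout are only illustrative.

```python
# Accuracy (%) copied from the results table: (1-shot, 5-shot) per benchmark.
# Benchmarks a method did not report ("-" in the table) are simply left out.
results = {
    "OTAM": {"Kinetics-100": (73.0, 85.8), "Sth-Sth-100": (42.8, 52.3)},
    "FEAT": {"Kinetics-100": (74.0, 86.5), "Sth-Sth-100": (45.3, 61.2),
             "HMDB51": (60.4, 75.2), "UCF101": (83.9, 94.5)},
    "PAL": {"Kinetics-100": (74.2, 87.1), "Sth-Sth-100": (46.4, 62.6),
            "HMDB51": (60.9, 75.8), "UCF101": (85.3, 95.2)},
}

def margins(method, baseline):
    """Absolute gain (accuracy points) of `method` over `baseline` on shared benchmarks."""
    gains = {}
    for bench, (m1, m5) in results[method].items():
        if bench in results[baseline]:
            b1, b5 = results[baseline][bench]
            gains[bench] = (round(m1 - b1, 1), round(m5 - b5, 1))
    return gains

for baseline in ("OTAM", "FEAT"):
    for bench, (g1, g5) in margins("PAL", baseline).items():
        print(f"PAL vs {baseline} on {bench}: +{g1} (1-shot), +{g5} (5-shot)")
```

With the table values as given, the 5-shot gain on Sth-Sth-100 comes to 10.3 points over OTAM and 1.4 points over FEAT, while the gains on Kinetics-100 stay below 1.5 points for both baselines.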
This review was generated by AI and checked by a human editor.