[Paper Explained] Few-shot Action Recognition with Prototype-centered Attentive Learning
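PAL introduces a prototype-centered contrastive learning loss and a hybrid attentive learning mechanism to improve data efficiency and to handle outliers and inter-class overlap in few-shot action recognition, achieving state-of-the-art results on four benchmark datasets.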
Few-shot action recognition aims to recognize action classes with few training samples. Most existing methods adopt a meta-learning approach with episodic training. In each episode, the few samples in a meta-training task are split into support and query sets. The former is used to build a classifier, which is then evaluated on the latter using a query-centered loss for model updating. There are, however, two major limitations: a lack of data efficiency due to the query-centered-only loss design, and an inability to deal with outlying samples in the support set and overlapping inter-class distributions. In this paper, we overcome both limitations by proposing a new Prototype-centered Attentive Learning (PAL) model composed of two novel components. First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective, in order to make full use of the limited training samples in each episode. Second, PAL further integrates a hybrid attentive learning mechanism that can minimize the negative impacts of outliers and promote class separation. Extensive experiments on four standard few-shot action benchmarks show that our method clearly outperforms previous state-of-the-art methods, with the improvement particularly significant (10+%) on the most challenging fine-grained action recognition benchmark.
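The episodic split described above can be sketched in a few lines. This is a minimal illustration, not the authors' data pipeline: the function name `sample_episode`, its parameters (`n_way`, `k_shot`, `n_query`), and the toy dataset of video ids are all assumptions introduced here.

```python
import random

def sample_episode(items_by_class, n_way=5, k_shot=1, n_query=1, rng=None):
    """Draw one N-way K-shot episode and split it into support and query sets.

    items_by_class: dict mapping a class label to a list of sample ids.
    Returns two lists of (sample_id, label) pairs with no shared samples.
    """
    rng = rng or random.Random()
    # Pick the classes that take part in this episode.
    classes = rng.sample(sorted(items_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Draw K + Q distinct samples, then split them.
        picked = rng.sample(items_by_class[label], k_shot + n_query)
        support += [(item, label) for item in picked[:k_shot]]
        query += [(item, label) for item in picked[k_shot:]]
    return support, query

# Hypothetical toy dataset: 6 classes with 4 video ids each.
toy_dataset = {
    f"class_{i}": [f"video_{i}_{j}" for j in range(4)] for i in range(6)
}
support_set, query_set = sample_episode(
    toy_dataset, n_way=5, k_shot=1, n_query=2, rng=random.Random(0)
)
print(len(support_set), "support samples,", len(query_set), "query samples")
```

In this sketch the support set would be used to build the classifier and the query set to compute the loss, matching the roles described in the abstract.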
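Research Motivation and Objectives
- Address data inefficiency and sensitivity to outliers in few-shot action recognition.
- Make full use of the limited data in each episode by combining prototype-centered contrastive learning with attention over the query and support sets.
- Mitigate inter-class overlap and intra-class outliers through a hybrid attentive learning framework.
- Demonstrate state-of-the-art performance on four standard few-shot action benchmarks, particularly the fine-grained dataset.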
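Proposed Method
- Builds on a ProtoNet-based framework, augmented with Prototype-centered Attentive Learning (PAL).
- Introduces a hybrid attentive learning (HAL) module that performs self-attention over the support set and query-to-support cross-attention.
- Defines a prototype-centered contrastive loss to complement the conventional query-centered objective.
- Computes each class prototype from the context-enhanced support features and classifies queries by cosine similarity to the prototypes.
- Pre-trains a TSN-based feature embedding network on the full training set, then performs episodic meta-training.
- Trains in two stages: TSN pre-training, followed by end-to-end meta-training with PAL (meta loss + prototype-centered contrastive loss).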
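The prototype and loss steps listed above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the paper's implementation: features are taken as already context-enhanced (the HAL attention step is omitted), the temperature `tau` and all function names are hypothetical, and the prototype-centered loss shown is one plausible contrastive form in which each prototype treats same-class queries as positives and the remaining queries as negatives.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def class_prototypes(support_feats, support_labels):
    """ProtoNet-style prototype: the mean support feature of each class."""
    grouped = {}
    for feat, label in zip(support_feats, support_labels):
        grouped.setdefault(label, []).append(feat)
    return {
        label: [sum(col) / len(feats) for col in zip(*feats)]
        for label, feats in grouped.items()
    }

def log_softmax_mass(logits, positive_idx):
    """Log of the softmax probability mass placed on the positive indices."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    return math.log(sum(exps[i] for i in positive_idx) / sum(exps))

def query_centered_loss(protos, query_feats, query_labels, tau=0.1):
    """Each query is the anchor; its own class prototype is the positive."""
    classes = sorted(protos)
    total = 0.0
    for q, y in zip(query_feats, query_labels):
        logits = [cosine(q, protos[c]) / tau for c in classes]
        total -= log_softmax_mass(logits, [classes.index(y)])
    return total / len(query_feats)

def prototype_centered_loss(protos, query_feats, query_labels, tau=0.1):
    """Each prototype is the anchor; same-class queries are the positives.

    Assumed form for illustration only; the paper's exact loss may differ.
    """
    total, used = 0.0, 0
    for c, p in protos.items():
        positives = [i for i, y in enumerate(query_labels) if y == c]
        if not positives:  # skip classes with no query in this episode
            continue
        logits = [cosine(p, q) / tau for q in query_feats]
        total -= log_softmax_mass(logits, positives)
        used += 1
    return total / max(used, 1)

def classify(protos, query):
    """Predict the class whose prototype is most cosine-similar to the query."""
    return max(protos, key=lambda c: cosine(query, protos[c]))

# Toy 2-way 2-shot episode with made-up 2-D features.
support_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
support_labels = ["a", "a", "b", "b"]
query_feats = [[0.8, 0.2], [0.2, 0.8]]
query_labels = ["a", "b"]

protos = class_prototypes(support_feats, support_labels)
q_loss = query_centered_loss(protos, query_feats, query_labels)
p_loss = prototype_centered_loss(protos, query_feats, query_labels)
print("predictions:", [classify(protos, q) for q in query_feats])
print("combined loss:", q_loss + p_loss)
```

Summing the two losses mirrors the "meta loss + prototype-centered contrastive loss" objective described above; any weighting between the two terms is left unspecified in this sketch.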
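Experimental Results
Research Questions
- RQ1: Does prototype-centered contrastive learning improve data utilization within each episode compared with the conventional query-centered loss?
- RQ2: Does hybrid attention over support and query samples reduce the impact of outliers and inter-class overlap in few-shot tasks?
- RQ3: How does PAL perform against existing state-of-the-art methods on coarse-grained and fine-grained action benchmarks?
- RQ4: How much does pre-training the embedding network contribute to overall few-shot action recognition performance?
Key Findings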
| Method | Kinetics-100 1-shot | Kinetics-100 5-shot | Sth-Sth-100 1-shot | Sth-Sth-100 5-shot | HMDB51 1-shot | HMDB51 5-shot | UCF101 1-shot | UCF101 5-shot |
|---|---|---|---|---|---|---|---|---|
| Matching Net | 53.3 | 74.6 | - | - | - | - | - | - |
| MAML | 54.2 | 75.3 | - | - | - | - | - | - |
| ProtoNet++ | 64.5 | 77.9 | 33.6 | 43.0 | - | - | - | - |
| TARN | 64.8 | 78.5 | - | - | - | - | - | - |
| TRN++ | 68.4 | 82.0 | 38.6 | 48.9 | - | - | - | - |
| CMN | 60.5 | 78.9 | - | - | - | - | - | - |
| CMN++ | 65.4 | 78.8 | 34.4 | 43.8 | - | - | - | - |
| OTAM | 73.0 | 85.8 | 42.8 | 52.3 | - | - | - | - |
| ARN | 63.7 | 82.4 | - | - | 45.5 | 60.6 | 66.3 | 83.1 |
| FEAT | 74.0 | 86.5 | 45.3 | 61.2 | 60.4 | 75.2 | 83.9 | 94.5 |
| PAL (Ours) | 74.2 | 87.1 | 46.4 | 62.6 | 60.9 | 75.8 | 85.3 | 95.2 |
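- PAL achieves state-of-the-art results on all four benchmarks, with the largest gain on the fine-grained Sth-Sth-100 dataset (about 10 points over OTAM in the 5-shot setting, 62.6 vs. 52.3).
- Hybrid attentive learning (HAL) and prototype-centered contrastive learning (PCL) provide complementary gains over the pre-trained baseline, improving both 1-shot and 5-shot accuracy.
- Pre-training the embedding network is critical: it yields strong feature representations and reduces reliance on episodic adaptation.
- By combining query-centered and prototype-centered signals, PAL reduces intra-class variance and inter-class overlap.
- On the challenging datasets, PAL consistently outperforms OTAM and FEAT, highlighting its robustness to outliers and class overlap.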
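The margins quoted above can be checked against the results table with simple arithmetic. The snippet below only re-enters numbers from the table (accuracy, in percentage points); the dictionary layout and the helper name `gain` are illustrative choices.

```python
# (1-shot, 5-shot) accuracy copied from the results table.
results = {
    "OTAM": {"Kinetics-100": (73.0, 85.8), "Sth-Sth-100": (42.8, 52.3)},
    "FEAT": {
        "Kinetics-100": (74.0, 86.5), "Sth-Sth-100": (45.3, 61.2),
        "HMDB51": (60.4, 75.2), "UCF101": (83.9, 94.5),
    },
    "PAL": {
        "Kinetics-100": (74.2, 87.1), "Sth-Sth-100": (46.4, 62.6),
        "HMDB51": (60.9, 75.8), "UCF101": (85.3, 95.2),
    },
}

def gain(method, baseline, dataset):
    """Return (1-shot, 5-shot) gains in percentage points, rounded to 0.1."""
    a, b = results[method][dataset], results[baseline][dataset]
    return tuple(round(x - y, 1) for x, y in zip(a, b))

# Compare PAL with each baseline on every dataset the baseline reports.
for baseline in ("OTAM", "FEAT"):
    for dataset in results[baseline]:
        print(f"{dataset}: PAL vs {baseline} =", gain("PAL", baseline, dataset))
```

On Sth-Sth-100 in the 5-shot setting this gives a gain of 10.3 points over OTAM and 1.4 points over FEAT.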
This explainer was generated by AI and reviewed by a human editor.