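[Paper Summary] MMAD: Multi-label Micro-Action Detection in Videos
This paper proposes Multi-label Micro-Action Detection (MMAD), a task of recognizing and temporally localizing multiple co-occurring micro-actions in short videos. It also introduces the MMA-52 dataset for training and evaluation, together with baseline results.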
Human body actions are an important form of non-verbal communication in social interactions. This paper specifically focuses on a subset of body actions known as micro-actions, which are subtle, low-intensity body movements with promising applications in human emotion analysis. In real-world scenarios, human micro-actions often temporally co-occur, with multiple micro-actions overlapping in time, such as concurrent head and hand movements. However, current research primarily focuses on recognizing individual micro-actions while overlooking their co-occurring nature. To address this gap, we propose a new task named Multi-label Micro-Action Detection (MMAD), which involves identifying all micro-actions in a given short video, determining their start and end times, and categorizing them. Accomplishing this requires a model capable of accurately capturing both long-term and short-term action relationships to detect multiple overlapping micro-actions. To facilitate the MMAD task, we introduce a new dataset named Multi-label Micro-Action-52 (MMA-52) and propose a baseline method equipped with a dual-path spatial-temporal adapter to address the challenges of subtle visual change in MMAD. We hope that MMA-52 can stimulate research on micro-action analysis in videos and prompt the development of spatio-temporal modeling in human-centric video understanding. The proposed MMA-52 dataset is available at: https://github.com/VUT-HFUT/Micro-Action.
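Motivation and Goals
- Motivate the need to detect co-occurring micro-actions in real-world videos.
- Define the MMAD task: identify every micro-action in a video and give each one's start time, end time, and category.
- Build and release the MMA-52 dataset to support MMAD research.
- Evaluate baseline models on MMA-52 to establish a benchmark and highlight room for improvement.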
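The following Python sketch makes the task output concrete. It represents each micro-action as a labelled time interval and lists the pairs that overlap in time. This temporal co-occurrence is exactly what a single-label recognizer cannot express. The `MicroAction` class, the `co_occurring_pairs` helper, and the example labels are hypothetical and not part of the MMA-52 release.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MicroAction:
    """One micro-action instance: a category label over a time interval in seconds."""
    start: float
    end: float
    label: str

    def __post_init__(self) -> None:
        # A valid instance must have positive duration.
        if self.end <= self.start:
            raise ValueError("end must be greater than start")


def co_occurring_pairs(instances: list[MicroAction]) -> list[tuple[MicroAction, MicroAction]]:
    """Return every pair of instances whose time intervals overlap."""
    pairs = []
    for a, b in combinations(instances, 2):
        # Two intervals overlap iff each one starts before the other ends.
        if a.start < b.end and b.start < a.end:
            pairs.append((a, b))
    return pairs


# Hypothetical annotation set for one short video; labels are illustrative,
# not taken from the MMA-52 category list.
video = [
    MicroAction(0.0, 1.5, "nodding"),
    MicroAction(0.8, 2.4, "touching face"),
    MicroAction(3.0, 3.6, "shrugging"),
]

for a, b in co_occurring_pairs(video):
    print(f"{a.label} [{a.start}, {a.end}] overlaps {b.label} [{b.start}, {b.end}]")
```

Only "nodding" and "touching face" overlap here, so an MMAD model would need to output both instances for the same stretch of time.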
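Proposed Method
- Formulate MMAD as a set-prediction problem over micro-action proposals, each with a start time, an end time, and a category.
- Introduce MMA-52: 52 micro-action categories, 6,528 videos, and 19,782 instances, with a cross-subject split.
- Compare two baseline methods, MS-TCT and PointTAD, on the MMAD task using MMA-52.
- Use Detection-mAP across tIoU thresholds from 0.1 to 0.9 as the evaluation metric.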
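Detection-mAP over tIoU thresholds can be sketched in Python as shown below. The sketch assumes greedy, confidence-ordered matching and all-point interpolated AP, as in ActivityNet-style temporal detection evaluation. It illustrates the general metric and is not the official MMA-52 evaluation script, which may differ in matching or interpolation details. The function names and the tuple layouts for ground truth and predictions are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def tiou(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Temporal IoU of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, label: str, thr: float) -> float:
    """AP for one class at one tIoU threshold (all-point interpolation).

    gts:   list of (video_id, start, end, label)
    preds: list of (video_id, start, end, label, score)
    """
    cls_gts = defaultdict(list)  # video_id -> [(start, end), ...]
    for vid, s, e, lab in gts:
        if lab == label:
            cls_gts[vid].append((s, e))
    n_gt = sum(len(segs) for segs in cls_gts.values())
    if n_gt == 0:
        return 0.0
    matched = {vid: [False] * len(segs) for vid, segs in cls_gts.items()}

    # Visit predictions from most to least confident.
    cls_preds = sorted((p for p in preds if p[3] == label), key=lambda p: -p[4])
    precisions, recalls = [], []
    tp = fp = 0
    for vid, s, e, _, _ in cls_preds:
        # Greedily match to the unmatched ground truth with the highest tIoU.
        best_iou, best_j = 0.0, -1
        for j, seg in enumerate(cls_gts.get(vid, [])):
            if not matched[vid][j]:
                iou = tiou((s, e), seg)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= thr:
            matched[vid][best_j] = True
            tp += 1
        else:
            fp += 1  # duplicate, mislocalized, or spurious detection
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)

    # Make precision monotonically non-increasing, then integrate over recall.
    mprec = [0.0] + precisions + [0.0]
    mrec = [0.0] + recalls + [1.0]
    for i in range(len(mprec) - 2, -1, -1):
        mprec[i] = max(mprec[i], mprec[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mprec[i] for i in range(1, len(mrec)))


def detection_map(preds, gts, thresholds=None):
    """Per-threshold mAP (in %) over ground-truth classes, and its mean over thresholds."""
    if thresholds is None:
        thresholds = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1 .. 0.9
    labels = sorted({g[3] for g in gts})
    per_thr = {}
    for thr in thresholds:
        aps = [average_precision(preds, gts, lab, thr) for lab in labels]
        per_thr[thr] = 100.0 * sum(aps) / len(aps)
    return per_thr, sum(per_thr.values()) / len(per_thr)


# Toy example: two overlapping ground-truth micro-actions in one video.
gts = [("v1", 0.0, 2.0, "a"), ("v1", 1.0, 3.0, "b")]
preds = [("v1", 0.0, 2.0, "a", 0.9), ("v1", 1.5, 3.0, "b", 0.8)]
per_thr, avg = detection_map(preds, gts)
print(per_thr[0.5], per_thr[0.8], round(avg, 2))
```

On the toy example this prints `100.0 50.0 88.89`. The second prediction overlaps its ground truth with tIoU 0.75, so it counts as a true positive up to tIoU 0.7 and as a false positive at 0.8 and 0.9. This illustrates why scores fall sharply at strict thresholds when boundaries of subtle actions are hard to localize.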
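Experimental Results
Research Questions
- RQ1: How can multiple temporally co-occurring micro-actions be accurately detected and localized in short videos?
- RQ2: What kind of dataset and benchmark is needed to study multi-label micro-actions?
- RQ3: How do state-of-the-art multi-label action detectors perform on MMA-52, and where is there room for improvement?
Key Findings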
| Method | tIoU=0.1 | tIoU=0.2 | tIoU=0.3 | tIoU=0.4 | tIoU=0.5 | tIoU=0.6 | tIoU=0.7 | tIoU=0.8 | tIoU=0.9 | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| MS-TCT | 6.58 | 5.72 | 4.83 | 4.21 | 3.91 | 2.66 | 2.16 | 0.99 | 0.43 | 3.51 |
| PointTAD | 10.69 | 9.46 | 7.32 | 5.21 | 3.79 | 2.52 | 1.02 | 1.02 | 0.52 | 4.51 |
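- PointTAD achieves a higher average Detection-mAP on MMA-52 than MS-TCT (4.51 vs. 3.51). Its largest margins are at low tIoU thresholds (e.g., 10.69 vs. 6.58 at tIoU 0.1), while MS-TCT scores higher at tIoU 0.5 to 0.7.
- Both baselines remain weak, with average Detection-mAP below 5%, indicating that MMAD leaves substantial room for improvement.
- MMA-52 provides 52 micro-action categories, 6,528 videos, and 19,782 instances, enabling fine-grained analysis of micro-actions.
- The dataset uses a cross-subject split to encourage generalization to unseen subjects.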
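A cross-subject split holds out entire subjects rather than individual videos, so a model is evaluated on people it never saw during training. The sketch below assumes a simple random subject-level hold-out. The actual MMA-52 subject assignment and split ratios are not given in this summary. The `cross_subject_split` function, the 20% test fraction, and the video and subject IDs are hypothetical.

```python
from __future__ import annotations

import random


def cross_subject_split(
    video_subjects: dict[str, str], test_fraction: float = 0.2, seed: int = 0
) -> tuple[list[str], list[str]]:
    """Split video IDs so that every subject's videos land entirely in train or test."""
    subjects = sorted(set(video_subjects.values()))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Hold out whole subjects, never individual videos.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = sorted(v for v, s in video_subjects.items() if s not in test_subjects)
    test = sorted(v for v, s in video_subjects.items() if s in test_subjects)
    return train, test


# Hypothetical mapping from video ID to subject ID (two videos per subject).
videos = {f"vid{i:02d}": f"subj{i // 2}" for i in range(10)}
train_ids, test_ids = cross_subject_split(videos)
train_subjects = {videos[v] for v in train_ids}
test_subjects = {videos[v] for v in test_ids}
print(len(train_ids), len(test_ids), train_subjects & test_subjects)
```

With five subjects and a 20% test fraction, one subject (two videos) is held out, and the printed subject intersection is empty. Splitting by video instead would let the same person's appearance and habits leak from training into testing.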
This summary was generated by AI and reviewed by human editors.