QUICK REVIEW

[Paper Review] MMAD: Multi-label Micro-Action Detection in Videos

Kun Li, Pengyu Liu|arXiv (Cornell University)|Jul 7, 2024

Human Pose and Action Recognition | Citations: 10

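One-Sentence Summary

This paper introduces Multi-label Micro-Action Detection (MMAD), a task that identifies and temporally localizes multiple micro-actions co-occurring in a short video; it presents MMA-52, a dataset for training and evaluating MMAD, and reports baseline results.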

ABSTRACT

Human body actions are an important form of non-verbal communication in social interactions. This paper specifically focuses on a subset of body actions known as micro-actions, which are subtle, low-intensity body movements with promising applications in human emotion analysis. In real-world scenarios, human micro-actions often temporally co-occur, with multiple micro-actions overlapping in time, such as concurrent head and hand movements. However, current research primarily focuses on recognizing individual micro-actions while overlooking their co-occurring nature. To address this gap, we propose a new task named Multi-label Micro-Action Detection (MMAD), which involves identifying all micro-actions in a given short video, determining their start and end times, and categorizing them. Accomplishing this requires a model capable of accurately capturing both long-term and short-term action relationships to detect multiple overlapping micro-actions. To facilitate the MMAD task, we introduce a new dataset named Multi-label Micro-Action-52 (MMA-52) and propose a baseline method equipped with a dual-path spatial-temporal adapter to address the challenges of subtle visual change in MMAD. We hope that MMA-52 can stimulate research on micro-action analysis in videos and prompt the development of spatio-temporal modeling in human-centric video understanding. The proposed MMA-52 dataset is available at: https://github.com/VUT-HFUT/Micro-Action.

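Motivation and Objectives

Motivate the need to detect co-occurring micro-actions in real-world videos.
Define the MMAD task: identify every micro-action and assign its start/end times and category.
Build and release the MMA-52 dataset to support MMAD research.
Evaluate baseline models on MMA-52 to establish a benchmark and show where there is room for improvement.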

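Proposed Method

Formulate MMAD as a set prediction problem in which each micro-action proposal carries a start time, an end time, and a category.
Introduce MMA-52: 52 micro-action categories, 6,528 videos, 19,782 instances, and a cross-subject split.
Compare two baselines, MS-TCT and PointTAD, as reference points for MMAD.
Use detection-mAP across tIoU thresholds (0.1 to 0.9) as the evaluation metric.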
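To make the metric concrete, the following is a minimal sketch of temporal IoU (tIoU) matching for overlapping, multi-label instances. It is not the paper's evaluation code: the `Instance` class, the greedy matching rule, and the "head"/"hand" labels are assumptions for illustration, and a full detection-mAP would additionally build a per-class precision-recall curve from confidence-ranked predictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One micro-action instance: a labelled temporal segment (seconds)."""
    start: float
    end: float
    label: str  # hypothetical label names, not the MMA-52 taxonomy


def temporal_iou(a: Instance, b: Instance) -> float:
    """Intersection over union of two temporal segments."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(preds, truths, threshold):
    """Greedy one-to-one matching within the same label at one tIoU threshold.

    `preds` is assumed to be sorted by descending confidence.
    """
    matched = set()
    tp = 0
    for p in preds:
        best_idx, best_iou = None, threshold
        for i, t in enumerate(truths):
            if i in matched or t.label != p.label:
                continue
            iou = temporal_iou(p, t)
            if iou >= best_iou:
                best_idx, best_iou = i, iou
        if best_idx is not None:
            matched.add(best_idx)
            tp += 1
    return tp


# The nine thresholds used in the review: 0.1, 0.2, ..., 0.9.
THRESHOLDS = [round(0.1 * k, 1) for k in range(1, 10)]

# Two ground-truth micro-actions that overlap in time (toy data).
truths = [Instance(0.0, 2.0, "head"), Instance(1.0, 3.0, "hand")]
preds = [Instance(0.0, 1.0, "head"), Instance(1.0, 3.0, "hand")]

tp_per_threshold = {t: count_true_positives(preds, truths, t) for t in THRESHOLDS}
```

In this toy case the "head" prediction covers only half of its ground-truth segment (tIoU 0.5), so it counts as a match at thresholds up to 0.5 and as a miss above that, which is why averaging over thresholds rewards precise boundaries.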

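Experimental Results

Research Questions

RQ1: How can multiple micro-actions that co-occur in time within a short video be accurately detected and localized?
RQ2: What qualities must a dataset and benchmark have to support the study of multi-label micro-actions?
RQ3: How do state-of-the-art multi-label action detectors perform on MMA-52, and where is there room for improvement?

Key Findings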

Method	tIoU 0.1	tIoU 0.2	tIoU 0.3	tIoU 0.4	tIoU 0.5	tIoU 0.6	tIoU 0.7	tIoU 0.8	tIoU 0.9	Avg
MS-TCT	6.58	5.72	4.83	4.21	3.91	2.66	2.16	0.99	0.43	3.51
PointTAD	10.69	9.46	7.32	5.21	3.79	2.52	1.02	1.02	0.52	4.51

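PointTAD achieves a higher average detection-mAP on MMA-52 (4.51) than MS-TCT (3.51).
Both baselines show limited performance, indicating substantial room for progress on MMAD.
MMA-52 provides 52 micro-action categories, 6,528 videos, and 19,782 instances, enabling fine-grained analysis of micro-actions.
The dataset adopts a cross-subject split to encourage generalization.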
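A cross-subject split means that no person appears in both the training and the test set, so a model is evaluated on people it has never seen. The sketch below shows one way such a split could be built; the function name, the video-to-subject mapping, and the 20% test fraction are assumptions for illustration, not the actual MMA-52 protocol.

```python
import random


def cross_subject_split(video_subjects, test_fraction=0.2, seed=0):
    """Split video IDs so that each subject lands in exactly one partition.

    `video_subjects` maps a video ID to the ID of the person it shows
    (a hypothetical schema). `test_fraction` is an assumed ratio.
    """
    subjects = sorted(set(video_subjects.values()))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Hold out at least one subject for testing.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [v for v, s in video_subjects.items() if s not in test_subjects]
    test = [v for v, s in video_subjects.items() if s in test_subjects]
    return train, test


# Toy example: five videos recorded from four subjects.
videos = {"v1": "s1", "v2": "s1", "v3": "s2", "v4": "s3", "v5": "s4"}
train_ids, test_ids = cross_subject_split(videos)
```

Splitting by subject rather than by video avoids leaking a person's appearance and habitual movements into the test set, which would otherwise inflate scores.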


This review was generated by AI and checked by a human editor.