QUICK REVIEW

[Paper Review] MMAD: Multi-label Micro-Action Detection in Videos

Kun Li, Pengyu Liu | arXiv (Cornell University) | July 7, 2024

Human Pose and Action Recognition | Citations: 10

One-Line Summary

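This paper introduces Multi-label Micro-Action Detection (MMAD), the task of identifying and temporally localizing multiple co-occurring micro-actions in short videos. It also presents the MMA-52 dataset for training and evaluation and reports baseline results.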

ABSTRACT

Human body actions are an important form of non-verbal communication in social interactions. This paper specifically focuses on a subset of body actions known as micro-actions, which are subtle, low-intensity body movements with promising applications in human emotion analysis. In real-world scenarios, human micro-actions often temporally co-occur, with multiple micro-actions overlapping in time, such as concurrent head and hand movements. However, current research primarily focuses on recognizing individual micro-actions while overlooking their co-occurring nature. To address this gap, we propose a new task named Multi-label Micro-Action Detection (MMAD), which involves identifying all micro-actions in a given short video, determining their start and end times, and categorizing them. Accomplishing this requires a model capable of accurately capturing both long-term and short-term action relationships to detect multiple overlapping micro-actions. To facilitate the MMAD task, we introduce a new dataset named Multi-label Micro-Action-52 (MMA-52) and propose a baseline method equipped with a dual-path spatial-temporal adapter to address the challenges of subtle visual change in MMAD. We hope that MMA-52 can stimulate research on micro-action analysis in videos and prompt the development of spatio-temporal modeling in human-centric video understanding. The proposed MMA-52 dataset is available at: https://github.com/VUT-HFUT/Micro-Action.

Motivation and Goals

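Motivates the need to detect micro-actions that overlap in time in real-world videos.
Defines the MMAD task: identify every micro-action in a video together with its start time, end time, and category.
Builds and releases the MMA-52 dataset to support MMAD research.
Evaluates baseline models on MMA-52 to establish a benchmark and highlight room for improvement.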
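To make the co-occurrence in the task definition concrete, the sketch below converts segment annotations (start, end, category) into per-frame multi-hot label vectors, a representation often used by frame-level multi-label detectors. It is an illustration only: the `MicroAction` class and `to_multi_hot` function are hypothetical names, and the sampling rule (frame t at time t / fps belongs to a segment if start ≤ t / fps < end) is an assumption, not the paper's annotation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroAction:
    """One annotated instance (hypothetical format)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: int    # category index in [0, num_classes)


def to_multi_hot(actions, num_classes, duration, fps):
    """Convert segment annotations into per-frame multi-hot label vectors.

    Frame t is sampled at time t / fps and is positive for class c if that
    time falls in [start, end) of any instance labelled c, so overlapping
    micro-actions switch on several classes in the same frame.
    """
    num_frames = int(round(duration * fps))
    labels = [[0] * num_classes for _ in range(num_frames)]
    for t in range(num_frames):
        time = t / fps
        for action in actions:
            if action.start <= time < action.end:
                labels[t][action.label] = 1
    return labels


# Usage: a class-0 action from 0 to 2 s overlapping a class-5 action from 1 to 3 s.
frames = to_multi_hot(
    [MicroAction(0.0, 2.0, 0), MicroAction(1.0, 3.0, 5)],
    num_classes=52, duration=3.0, fps=2,
)
print([sum(f) for f in frames])  # [1, 1, 2, 2, 1, 1]
```

The frames sampled at 1.0 s and 1.5 s carry two active labels at once, which is exactly the overlap that single-label micro-action recognition cannot express.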

Proposed Method

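Formulates MMAD as a set prediction problem over micro-action proposals, each with a start time, end time, and category.
Introduces MMA-52: 52 micro-action categories, 6,528 videos, 19,782 instances, and a cross-subject split.
Compares two baselines for MMAD on MMA-52: MS-TCT and PointTAD.
Uses Detection-mAP at multiple temporal IoU (tIoU) thresholds, from 0.1 to 0.9, as the evaluation metric.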
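The Detection-mAP protocol can be sketched as follows: a prediction counts as a true positive at threshold θ if its temporal IoU with a not-yet-matched ground-truth instance of the same class is at least θ; AP is computed per class, averaged over classes, and the per-threshold mAPs are then averaged over 0.1 to 0.9. This is a simplified illustration, not the official MMA-52 evaluation code: the tuple formats, greedy highest-score-first matching, interpolated-precision AP, and skipping classes without ground truth are all assumptions. Benchmark tables typically report these values as percentages.

```python
from collections import defaultdict


def t_iou(a, b):
    """Temporal IoU of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, thr):
    """AP for one class at one tIoU threshold.

    preds: [(video_id, start, end, score)]; gts: [(video_id, start, end)].
    Predictions are matched greedily in descending score order, and each
    ground-truth instance can be matched at most once.
    """
    if not gts:
        return 0.0
    gt_by_video = defaultdict(list)
    for vid, s, e in gts:
        gt_by_video[vid].append((s, e))
    used = {vid: [False] * len(segs) for vid, segs in gt_by_video.items()}

    hits = []
    for vid, s, e, _ in sorted(preds, key=lambda p: p[3], reverse=True):
        best_iou, best_j = -1.0, -1
        for j, seg in enumerate(gt_by_video.get(vid, [])):
            iou = t_iou((s, e), seg)
            if not used[vid][j] and iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= thr:
            used[vid][best_j] = True
            hits.append(1)
        else:
            hits.append(0)  # no unmatched ground truth overlaps enough: false positive

    precisions, recalls, cum_tp = [], [], 0
    for i, hit in enumerate(hits, start=1):
        cum_tp += hit
        precisions.append(cum_tp / i)
        recalls.append(cum_tp / len(gts))
    # Interpolate: precision at recall r is the max precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def detection_map(preds, gts, num_classes, thresholds):
    """Per-threshold mAP (mean over classes with ground truth) and its average.

    preds: [(video_id, start, end, label, score)]; gts: [(video_id, start, end, label)].
    """
    per_thr = []
    for thr in thresholds:
        aps = []
        for c in range(num_classes):
            c_gts = [(v, s, e) for v, s, e, lbl in gts if lbl == c]
            if not c_gts:
                continue  # assumption: classes absent from ground truth are skipped
            c_preds = [(v, s, e, sc) for v, s, e, lbl, sc in preds if lbl == c]
            aps.append(average_precision(c_preds, c_gts, thr))
        per_thr.append(sum(aps) / len(aps) if aps else 0.0)
    return per_thr, sum(per_thr) / len(per_thr)


THRESHOLDS = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
```

For example, with one prediction exactly matching a class-0 instance and another overlapping a class-5 instance at tIoU 0.75, both classes score AP 1.0 up to threshold 0.7, the class-5 prediction becomes a false positive at 0.8 and 0.9, and the threshold-averaged mAP is 8/9 ≈ 0.889. Averaging over many thresholds rewards detectors that localize tightly, not just ones that roughly find the action.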

Experimental Results

Research Questions

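RQ1: How can multiple temporally co-occurring micro-actions in short videos be accurately detected and localized?
RQ2: What dataset and benchmark quality is needed to study multi-label micro-actions?
RQ3: How do state-of-the-art multi-label action detectors perform on MMA-52, and where is there room for improvement?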

Key Results

Detection-mAP (%) on MMA-52 at each tIoU threshold
Method	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9	Avg.
MS-TCT	6.58	5.72	4.83	4.21	3.91	2.66	2.16	0.99	0.43	3.51
PointTAD	10.69	9.46	7.32	5.21	3.79	2.52	1.02	1.02	0.52	4.51

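PointTAD achieves a higher average Detection-mAP on MMA-52 than MS-TCT (4.51 vs. 3.51), leading clearly at low thresholds (0.1 to 0.4), while MS-TCT is slightly ahead at 0.5 to 0.7.
Both baselines perform poorly in absolute terms (average Detection-mAP below 5%), suggesting substantial room for progress in MMAD.
With 52 micro-action categories, 6,528 videos, and 19,782 instances, MMA-52 supports fine-grained analysis of micro-actions.
The dataset uses a cross-subject split to encourage generalization across subjects.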
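A cross-subject split assigns whole subjects, not individual videos, to each partition, so the test set contains only people the model never saw during training. The sketch below assumes a hypothetical `video_subjects` mapping from video ID to subject ID and a random two-way split; the actual MMA-52 subject assignment, and any validation partition, come from the dataset release rather than from this code.

```python
import random


def cross_subject_split(video_subjects, test_fraction=0.2, seed=0):
    """Split video IDs so that every subject lands in exactly one partition.

    video_subjects: dict mapping video_id -> subject_id (hypothetical format).
    Returns (train_ids, test_ids), each sorted.
    """
    subjects = sorted(set(video_subjects.values()))
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = sorted(v for v, s in video_subjects.items() if s not in test_subjects)
    test = sorted(v for v, s in video_subjects.items() if s in test_subjects)
    return train, test


# Usage: 20 videos recorded from 5 hypothetical subjects.
videos = {f"vid{i:02d}": f"subj{i % 5}" for i in range(20)}
train_ids, test_ids = cross_subject_split(videos, test_fraction=0.2, seed=1)
train_subjects = {videos[v] for v in train_ids}
test_subjects = {videos[v] for v in test_ids}
print(train_subjects.isdisjoint(test_subjects))  # True
```

Because the split is made over subject IDs, checking that the two partitions' subject sets are disjoint is a quick sanity test for identity leakage between training and evaluation.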

This review was generated by AI and reviewed by a human editor.