QUICK REVIEW

[Paper Review] SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition

Carlos Caetano, Jessica Sena|arXiv (Cornell University)|2019. 07. 30.

Human Pose and Action Recognition | References: 42 | Citations: 212

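One-Line Summary

SkeleMotion encodes the temporal dynamics of skeleton joints across multiple temporal scales as motion magnitude and orientation, feeds the result to a small CNN, and achieves state-of-the-art results on NTU RGB+D 120 when fused with a spatial skeleton representation.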

ABSTRACT

Due to the availability of large-scale skeleton datasets, 3D human action recognition has recently called the attention of computer vision community. Many works have focused on encoding skeleton data as skeleton image representations based on spatial structure of the skeleton joints, in which the temporal dynamics of the sequence is encoded as variations in columns and the spatial structure of each frame is represented as rows of a matrix. To further improve such representations, we introduce a novel skeleton image representation to be used as input of Convolutional Neural Networks (CNNs), named SkeleMotion. The proposed approach encodes the temporal dynamics by explicitly computing the magnitude and orientation values of the skeleton joints. Different temporal scales are employed to compute motion values to aggregate more temporal dynamics to the representation making it able to capture long-range joint interactions involved in actions as well as filtering noisy motion values. Experimental results demonstrate the effectiveness of the proposed representation on 3D action recognition outperforming the state-of-the-art on NTU RGB+D 120 dataset.

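Motivation and Objectives

Advance and improve skeleton-based 3D action recognition by explicitly modeling joint motion information.
Propose a new skeleton image representation (SkeleMotion) that encodes the magnitude and orientation of joint motion.
Use multi-scale temporal aggregation to capture long-range interactions and reduce noise.
Provide a lightweight CNN classifier that trains quickly on a small, simple representation.
Show state-of-the-art or competitive results on NTU RGB+D 60/120 with SkeleMotion, including when it is fused with spatial representations.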

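Proposed Method

Build a predefined joint chain C via a depth-first traversal of the skeleton to preserve spatial relations.
Compute the per-frame joint coordinates S, then derive the motion structure D by frame differencing with a temporal lag d: D_{c,t} = S_{c,t+d} - S_{c,t}.
Derive the magnitude M and orientation θ from D; θ is computed from the xy, yz, and zx components and filtered with a magnitude threshold m to suppress noise.
Normalize and resize the resulting M and θ representations to form the SkeleMotion image (C x T x channels).
Use a small CNN (3 convolutional layers, 2 fully connected layers) trained from scratch for action classification.
Introduce Temporal Scale Aggregation (TSA), which computes D, M, and θ for multiple temporal lags d and stacks the results to enrich the temporal dynamics.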
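
The motion steps above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names, the use of `arctan2` for the per-plane angles, the lag values, and cropping all lags to a common length (in place of the resize step) are assumptions made for the example.

```python
import numpy as np

def motion_structure(S, d):
    """S has shape (C, T, 3): joints along the chain, frames, xyz.
    Returns D with D[c, t] = S[c, t + d] - S[c, t], shape (C, T - d, 3)."""
    return S[:, d:, :] - S[:, :-d, :]

def magnitude(D):
    """Euclidean length of each displacement vector, shape (C, T - d)."""
    return np.linalg.norm(D, axis=-1)

def orientation(D, m=0.0):
    """Angles in the xy, yz and zx planes, shape (C, T - d, 3).
    Angles of displacements with magnitude below m are set to 0
    (assumed form of the noise filter)."""
    x, y, z = D[..., 0], D[..., 1], D[..., 2]
    theta = np.stack(
        [np.arctan2(y, x), np.arctan2(z, y), np.arctan2(x, z)], axis=-1
    )
    theta[magnitude(D) < m] = 0.0
    return theta

def tsa_magnitude(S, lags=(1, 5, 10)):
    """Temporal Scale Aggregation for magnitude: one channel per lag.
    Each channel is cropped to the shortest length (a simplification)."""
    t_out = S.shape[1] - max(lags)
    channels = [magnitude(motion_structure(S, d))[:, :t_out] for d in lags]
    return np.stack(channels, axis=-1)  # (C, t_out, len(lags))

# Usage: 25 joints, 60 frames of random coordinates.
S = np.random.default_rng(0).normal(size=(25, 60, 3))
image = tsa_magnitude(S)          # shape (25, 50, 3)
angles = orientation(motion_structure(S, 1), m=0.1)
```

Stacking one channel per lag gives the CNN an image-like tensor in which short lags keep fast motion and long lags respond to slower, longer-range movement.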

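Experimental Results

Research Questions

RQ1: Can explicit motion information (magnitude and orientation) at multiple temporal scales improve skeleton-based action recognition over existing skeleton image representations?
RQ2: Does multi-scale temporal aggregation help capture long-range interactions and reduce noisy motion signals?
RQ3: How does SkeleMotion perform against state-of-the-art skeleton-image-based methods on the NTU RGB+D 60 and 120 datasets, including when it is fused with spatial representations?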

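Key Results

With Magnitude (TSA), SkeleMotion performs strongly on NTU RGB+D 60, outperforming several baselines in cross-view accuracy.
With Magnitude (TSA), it reaches 69.6% cross-subject and 80.1% cross-view accuracy on NTU RGB+D 60.
Orientation (TSA) alone is competitive, but Magnitude (TSA) generally performs better, and combining Magnitude+Orientation (TSA) improves accuracy further.
Fusion with the method of Yang et al. (TSSI) gives better results, outperforming several baselines on NTU RGB+D 60 in both early-fusion and late-fusion settings.
On NTU RGB+D 120, Magnitude+Orientation (TSA) is competitive with state-of-the-art LSTM-based approaches; fused with Yang et al., it reaches state-of-the-art performance and outperforms several prior skeleton-based methods.
The study shows that explicit motion modeling and TSA offer a marked advantage over skeleton representations that ignore motion and over basic motion encodings.
The SkeleMotion code is publicly available for reproducibility at https://github.com/carloscaetano/skeleton-images.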
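
The early-fusion and late-fusion settings mentioned above can be illustrated with a short sketch. The review does not state the exact fusion rule, so the channel concatenation, the weighted average of class probabilities, the weight `w`, and all function names here are assumptions chosen for illustration.

```python
import numpy as np

def early_fusion(image_a, image_b):
    """Stack two skeleton images of equal height and width along the
    channel axis so that a single network sees both representations."""
    return np.concatenate([image_a, image_b], axis=-1)

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def late_fusion(logits_a, logits_b, w=0.5):
    """Combine the class probabilities of two separately trained
    networks with weight w and return the predicted class indices."""
    probs = w * softmax(logits_a) + (1.0 - w) * softmax(logits_b)
    return probs.argmax(axis=-1)

# Usage: one sample, three classes, two networks that disagree.
motion_logits = np.array([[2.0, 0.0, 0.0]])
spatial_logits = np.array([[0.0, 0.0, 5.0]])
equal_vote = late_fusion(motion_logits, spatial_logits)           # class 2
motion_heavy = late_fusion(motion_logits, spatial_logits, w=0.9)  # class 0
```

Early fusion lets one network learn joint features from both inputs, while late fusion keeps the two networks independent and only merges their scores.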


This review was generated by AI and reviewed by a human editor.