QUICK REVIEW

[Paper Review] Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

Hong Liu, Juanhui Tu | arXiv (Cornell University) | 2017-05-23

Human Pose and Action Recognition | References: 24 | Citations: 111

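One-Line Summary

Proposes a two-stream 3D CNN architecture for skeleton-based action recognition and shows that separating the spatial and temporal streams and applying a multi-temporal extension outperforms most RNN-based methods on the NTU RGB-D and SmartHome datasets.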

ABSTRACT

It remains a challenge to efficiently extract spatial-temporal information from skeleton sequences for 3D human action recognition. Although most recent action recognition methods are based on Recurrent Neural Networks, which present outstanding performance, one of the shortcomings of these methods is the tendency to overemphasize the temporal information. Since the 3D convolutional neural network (3D CNN) is a powerful tool to simultaneously learn features from both spatial and temporal dimensions through capturing the correlations between three-dimensional signals, this paper proposes a novel two-stream model using 3D CNN. To our best knowledge, this is the first application of 3D CNN in skeleton-based action recognition. Our method consists of three stages. First, skeleton joints are mapped into a 3D coordinate space, and the spatial and temporal information is then encoded, respectively. Second, 3D CNN models are separately adopted to extract deep features from the two streams. Third, to enhance the ability of deep features to capture global relationships, we extend every stream into a multi-temporal version. Extensive experiments on the SmartHome dataset and the large-scale NTU RGB-D dataset demonstrate that our method outperforms most RNN-based methods, which verifies the complementary property between spatial and temporal information and the robustness to noise.

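Research Motivation and Goals

Motivated by the need to efficiently extract spatial-temporal information from skeleton sequences for 3D action recognition.
Proposes a novel two-stream 3D CNN framework applied to skeleton data.
Extends each stream into a multi-temporal version to strengthen the deep feature representation.
Demonstrates robustness to noise and the complementary benefit of spatial and temporal information.
Validates the approach on the large-scale NTU RGB-D dataset and on the SmartHome dataset.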

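Proposed Method

Maps skeleton joints into a 3D coordinate space to capture spatial information.
Encodes spatial and temporal information as two independent streams.
Applies a 3D CNN model to each stream independently to extract deep features.
Extends each stream into a multi-temporal version to capture global relationships.
Demonstrates, through experiments, robustness to noise and the complementary property of the two streams.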
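
The input-encoding steps listed above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' encoding: this review does not give the grid resolution, the exact temporal encoding, or how the multi-temporal versions are built. Here the spatial input is assumed to be a joint-occupancy count over a voxel grid, the temporal input is assumed to be per-joint displacement between consecutive frames, and "multi-temporal" is assumed to mean contiguous sub-sequences. The names `voxelize`, `frame_differences`, and `multi_temporal_clips` are hypothetical.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) of one skeleton joint
Frame = List[Joint]                  # all joints of one frame


def voxelize(frames: List[Frame], grid: int = 8) -> List[List[List[int]]]:
    """Count joints per cell of a grid x grid x grid volume (assumed spatial encoding)."""
    points = [p for frame in frames for p in frame]
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    volume = [[[0] * grid for _ in range(grid)] for _ in range(grid)]
    for p in points:
        idx = []
        for a in range(3):
            span = (hi[a] - lo[a]) or 1.0            # guard against a flat axis
            cell = int((p[a] - lo[a]) / span * grid)
            idx.append(min(cell, grid - 1))          # clamp the maximum into the last cell
        volume[idx[0]][idx[1]][idx[2]] += 1
    return volume


def frame_differences(frames: List[Frame]) -> List[Frame]:
    """Per-joint displacement between consecutive frames (assumed temporal encoding)."""
    return [
        [(b[0] - a[0], b[1] - a[1], b[2] - a[2]) for a, b in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[1:])
    ]


def multi_temporal_clips(frames: list, num_clips: int) -> List[list]:
    """Split a sequence into num_clips contiguous sub-sequences (assumed reading of 'multi-temporal')."""
    n = len(frames)
    bounds = [i * n // num_clips for i in range(num_clips + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(num_clips)]
```

In such a setup, each clip would be voxelized separately and the resulting volumes passed to the 3D CNN of the corresponding stream; the network itself is left out of this sketch.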

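Experimental Results

Research Questions

RQ1: Can a two-stream 3D CNN effectively learn spatial and temporal features from skeleton sequences to perform action recognition?
RQ2: Does separating the spatial and temporal streams and extending them to multiple temporal scales improve recognition performance over single-stream or RNN-based methods?
RQ3: Are the two streams complementary, and is the method robust to noise in skeleton data?
RQ4: How does the proposed method perform on a large-scale dataset such as NTU RGB-D and on SmartHome?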

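Key Findings

The proposed two-stream 3D CNN approach outperforms most RNN-based methods on the evaluated datasets.
Separating spatial and temporal information and processing each with a 3D CNN yields complementary representations.
Extending each stream into a multi-temporal version captures global relationships in the data more effectively.
The method shows robustness to noise in skeleton sequences.
Experiments on the SmartHome and NTU RGB-D datasets show strong performance relative to competing methods.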
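
To make the complementarity finding concrete, the sketch below shows one common way to combine two streams: weighted averaging of per-class softmax probabilities (late fusion). This is an assumption for illustration; the review does not state how the paper fuses its streams. The function names `softmax` and `fuse` and the `weight` parameter are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Convert raw class scores to probabilities (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(spatial_scores: List[float],
         temporal_scores: List[float],
         weight: float = 0.5) -> Tuple[int, List[float]]:
    """Late fusion (assumed, not confirmed by the source): weighted mean of stream probabilities.

    Returns the predicted class index and the fused probability vector.
    """
    ps = softmax(spatial_scores)
    pt = softmax(temporal_scores)
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(ps, pt)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# The spatial stream alone prefers class 0, but the temporal stream is far more
# confident about class 2, so the fused prediction follows the temporal stream.
label, probs = fuse([2.0, 1.0, 0.0], [0.0, 1.0, 5.0])
```

When the two streams make different kinds of errors, a combination like this can correct a mistake that either stream would make on its own, which is the sense in which the streams are described as complementary.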


This review was generated by AI and reviewed by a human editor.