QUICK REVIEW

[Paper Review] Video-based Human Action Recognition using Deep Learning: A Review

Hieu H. Pham, Louahdi Khoudour|arXiv (Cornell University)|2022. 08. 07.

Human Pose and Action Recognition | References: 231 | Citations: 24

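One-line summary

A comprehensive survey of deep learning techniques for video-based human action recognition, covering architectures (CNNs, RNN-LSTMs, DBNs, SDAs), datasets, quantitative benchmarks, and current challenges.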

ABSTRACT

Human action recognition is an important application domain in computer vision. Its primary aim is to accurately describe human actions and their interactions from a previously unseen data sequence acquired by sensors. The ability to recognize, understand, and predict complex human actions enables the construction of many important applications such as intelligent surveillance systems, human-computer interfaces, health care, security, and military applications. In recent years, deep learning has been given particular attention by the computer vision community. This paper presents an overview of the current state-of-the-art in action recognition using video analysis with deep learning techniques. We present the most important deep learning models for recognizing human actions, and analyze them to provide the current progress of deep learning algorithms applied to solve human action recognition problems in realistic videos highlighting their advantages and disadvantages. Based on the quantitative analysis using recognition accuracies reported in the literature, our study identifies state-of-the-art deep architectures in action recognition and then provides current trends and open problems for future works in this field.

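Research motivation and goals

Evaluate state-of-the-art deep learning models for video-based action recognition.
Analyze the strengths and limitations of CNNs, RNN-LSTMs, DBNs, and SDAs in realistic video settings.
Summarize benchmark datasets and their influence on progress in deep action recognition.
Identify open problems and potential directions for future research on deep learning-based action recognition.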

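Proposed method

Reviews the main deep learning architectures used for action recognition (CNNs, RNN-LSTMs, DBNs, SDAs).
Explains the core ideas and mathematical foundations of each architecture (convolution, pooling, LSTM gates, RBMs, autoencoders).
Provides qualitative and quantitative comparisons of deep learning approaches on standard datasets.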
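
To make the convolution and pooling operations named above concrete, here is a minimal pure-Python sketch. It is an illustration only, not code from the paper: the function names `conv2d_valid` and `max_pool2d`, the toy 5x5 frame, and the hand-picked edge kernel are all assumptions chosen for readability. In a real CNN the kernel weights are learned, and a deep learning framework would replace these loops.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (stride 1, no padding).

    Each output value depends only on a small local patch (local
    connectivity), and the same kernel is reused at every position
    (weight sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out


def max_pool2d(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out


# Hypothetical 5x5 grayscale frame containing a vertical bright stripe.
frame = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
]
# Hand-picked kernel that responds to a dark-to-bright step from left to right.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d_valid(frame, edge_kernel)  # 4x4 response map
pooled = max_pool2d(feature_map)                # 2x2 map after pooling
print(feature_map[0])  # [0.0, 2.0, 0.0, -2.0]
print(pooled)          # [[2.0, 0.0], [2.0, 0.0]]
```

The response peaks where the stripe begins, and pooling keeps that peak while halving the spatial resolution, which is the kind of small translation tolerance pooling is generally used for.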

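Experimental results

Research questions

RQ1: What are the main deep learning architectures applied to video-based action recognition?
RQ2: How do these architectures perform on widely used action recognition benchmarks?
RQ3: What are the current challenges and open problems in applying deep learning to realistic video action recognition?
RQ4: How do large-scale datasets and RGB-D/skeleton data affect model development and evaluation?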

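Key findings

CNNs introduced feature learning directly from raw video frames through local connectivity, weight sharing, and pooling, enabling end-to-end representation learning for action recognition.
RNN-LSTMs (including bidirectional LSTMs) model the temporal dynamics and context of video sequences for action classification.
DBNs and SDAs provide deep hierarchical feature representations through layer-wise pre-training; DBNs use stacked RBMs, while SDAs use denoising autoencoders for unsupervised pre-training.
Reported state-of-the-art HMDB-51 results include 62.0% with RGB + optical flow fusion (Wang et al., 2016) and 59.4% with a two-stream CNN + SVM (Simonyan et al., 2014).
The progression from lab-controlled datasets (KTH, Weizmann) to large-scale, real-world datasets (Sports-1M, ActivityNet, NTU RGB+D) highlights a shift toward realistic action recognition challenges.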
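
The RGB + optical flow fusion mentioned in the findings can be illustrated with a small late-fusion sketch. This is a simplified assumption, not the method of any cited paper: it averages the softmax scores of two streams, whereas the results above also involve SVM-based fusion. The names `softmax` and `late_fusion`, the `rgb_weight` parameter, the class labels, and the logit values are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits, flow_logits, rgb_weight=0.5):
    """Weighted average of per-stream class probabilities.

    Returns the fused probabilities and the index of the top class.
    """
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)
    fused = [rgb_weight * a + (1.0 - rgb_weight) * b
             for a, b in zip(p_rgb, p_flow)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, best


# Made-up scores for one clip: the appearance (RGB) stream slightly prefers
# "walk", while the motion (optical flow) stream clearly prefers "run".
labels = ["walk", "run", "wave"]
rgb_logits = [2.0, 1.9, 0.1]
flow_logits = [0.5, 3.0, 0.2]

fused, best = late_fusion(rgb_logits, flow_logits)
print(labels[best])  # run
```

In this toy case the motion stream resolves an ambiguity that appearance alone leaves open, which is the usual motivation for combining the two streams.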

This review was generated by AI and reviewed by a human editor.