QUICK REVIEW

[Paper Review] A Survey on 3D Skeleton-Based Action Recognition Using Learning Method

Bin Ren, Mengyuan Liu | arXiv (Cornell University) | Feb 14, 2020

Human Pose and Action Recognition | References: 75 | Citations: 86

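One-sentence summary

This survey comprehensively reviews deep learning approaches to 3D skeleton-based action recognition, covering RNNs, CNNs, GCNs, and Transformers, and compares state-of-the-art methods on the NTU-RGB+D and NTU-RGB+D 120 datasets.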

ABSTRACT

3D skeleton-based action recognition (3D SAR) has gained significant attention within the computer vision community, owing to the inherent advantages offered by skeleton data. As a result, a plethora of impressive works, including those based on conventional handcrafted features and learned feature extraction methods, have been conducted over the years. However, prior surveys on action recognition have primarily focused on video or RGB data-dominated approaches, with limited coverage of reviews related to skeleton data. Furthermore, despite the extensive application of deep learning methods in this field, there has been a notable absence of research that provides an introductory or comprehensive review from the perspective of deep learning architectures. To address these limitations, this survey first underscores the importance of action recognition and emphasizes the significance of 3D skeleton data as a valuable modality. Subsequently, we provide a comprehensive introduction to mainstream action recognition techniques based on four fundamental deep architectures, i.e., Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Graph Convolutional Network (GCN), and Transformers. All methods with the corresponding architectures are then presented in a data-driven manner with detailed discussion. Finally, we offer insights into the current largest 3D skeleton dataset, NTU-RGB+D, and its new edition, NTU-RGB+D 120, along with an overview of several top-performing algorithms on these datasets. To the best of our knowledge, this research represents the first comprehensive discussion of deep learning-based action recognition using 3D skeleton data.

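Motivation and objectives

Motivate the use of 3D skeleton data as a robust modality for action recognition.
Systematically summarize the deep learning architectures used for 3D SAR (RNNs, CNNs, GCNs, and Transformers).
Analyze data representations, spatio-temporal modeling, and co-occurrence features in skeleton-based methods.
Provide benchmarks and insights on NTU-RGB+D and NTU-RGB+D 120 to guide future research.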
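
To make the notion of a skeleton data representation concrete, here is a minimal sketch of how bone vectors can be derived from 3D joint coordinates. The five-joint layout, the joint names, and the parent links are illustrative assumptions, not the NTU-RGB+D joint set and not a method taken from the survey.

```python
# Toy skeleton: joint names and parent links are hypothetical, chosen for illustration.
PARENT = {
    "spine": "hip",
    "head": "spine",
    "left_hand": "spine",
    "right_hand": "spine",
}


def bone_vectors(frame):
    """Return one bone vector per child joint: child position minus parent position.

    `frame` maps a joint name to its (x, y, z) coordinates for a single time step.
    """
    bones = {}
    for child, parent in PARENT.items():
        cx, cy, cz = frame[child]
        px, py, pz = frame[parent]
        bones[child] = (cx - px, cy - py, cz - pz)
    return bones


# One frame of made-up joint positions.
frame = {
    "hip": (0.0, 0.0, 0.0),
    "spine": (0.0, 0.5, 0.0),
    "head": (0.0, 1.0, 0.0),
    "left_hand": (-0.5, 0.5, 0.0),
    "right_hand": (0.5, 0.5, 0.25),
}
bones = bone_vectors(frame)
```

Joint positions describe where body parts are, while bone vectors describe the length and direction of each limb segment. Multi-stream models typically feed the two as separate inputs.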

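Proposed approach

Introduce the four fundamental deep learning architectures (RNNs, CNNs, GCNs, and Transformers) and compare their characteristics for 3D SAR.
Discuss data representations and preprocessing strategies for skeleton data (joint/bone graphs, skeleton images, co-occurrence features).
Survey representative methods within each architecture, focusing on spatio-temporal modeling and attention mechanisms.
Highlight graph-structured approaches (ST-GCN, 2s-AGCN, MS-G3D, and others) and Transformer-based variants (self-attention, decoupled attention) as core techniques.
Present a data-driven dataset analysis and performance trends on NTU-RGB+D and NTU-RGB+D 120.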
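
The graph-structured approaches listed above share one basic operation: each joint aggregates features from its skeletal neighbours before a learned projection is applied. The sketch below shows a single spatial graph-convolution layer using the common symmetric normalization D^(-1/2) (A + I) D^(-1/2). It is a simplified illustration under that assumption, not the exact formulation of ST-GCN, 2s-AGCN, or MS-G3D; the three-joint chain, the features, and the identity weight are made up.

```python
import math


def normalized_adjacency(num_joints, edges):
    """Build D^(-1/2) (A + I) D^(-1/2) for an undirected skeleton graph."""
    # Start from the identity so every joint keeps its own features (self-loops).
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    degree = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_joints)]
        for i in range(num_joints)
    ]


def matmul(left, right):
    """Plain matrix product on nested lists."""
    return [[sum(l * r for l, r in zip(row, col)) for col in zip(*right)] for row in left]


def graph_conv(features, adjacency, weight):
    """One spatial layer: ReLU(adjacency @ features @ weight)."""
    mixed = matmul(matmul(adjacency, features), weight)
    return [[max(0.0, value) for value in row] for row in mixed]


# Hypothetical 3-joint chain 0 - 1 - 2 with 2 feature channels per joint.
EDGES = [(0, 1), (1, 2)]
adjacency = normalized_adjacency(3, EDGES)
features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, stands in for learned weights
output = graph_conv(features, adjacency, weight)
```

A full spatio-temporal model would apply such a layer to every frame and add a temporal operation (for example a convolution along the time axis). Adaptive variants additionally learn or refine the adjacency rather than fixing it to the skeleton.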

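Experimental results

Research questions

RQ1: What are the main deep learning architectures used for 3D skeleton-based action recognition, and how do they compare?
RQ2: How do RNNs, CNNs, GCNs, and Transformers handle spatio-temporal modeling and skeleton data representation?
RQ3: Which methods currently lead on NTU-RGB+D and NTU-RGB+D 120, and which architectures do they use?
RQ4: What are the future directions and open challenges for 3D SAR using skeleton data?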

Key findings

Dataset	Rank	Paper	Year	Accuracy (%): C-View (NTU-RGB+D) / C-Subject (NTU-RGB+D 120)	Accuracy (%): C-Subject (NTU-RGB+D) / C-Setup (NTU-RGB+D 120)	Method
NTU-RGB+D	1	Wang et al. [109]	2023	98.7	94.8	Two-stream Transformer
NTU-RGB+D	2	Duan et al. [23]	2022	n/a	93.2	Dynamic group GCN
NTU-RGB+D	3	Liu et al. [68]	2023	96.8	92.8	Temporal decoupling GCN
NTU-RGB+D	4	Zhou et al. [150]	2022	n/a	92.9	Transformer
NTU-RGB+D	5	Chen et al. [14]	2021	96.8	92.4	Topology refinement GCN
NTU-RGB+D	6	Zeng et al. [135]	2021	96.7	91.6	Skeletal GCN
NTU-RGB+D	7	Liu et al. [74]	2020	96.2	91.5	Disentangling and unifying GCN
NTU-RGB+D	8	Ye et al. [130]	2020	96.0	91.5	Dynamic GCN
NTU-RGB+D	9	Shi et al. [87]	2019	96.1	89.9	Directed graph neural networks
NTU-RGB+D	10	Shi et al. [88]	2018	95.1	88.5	Two-stream adaptive GCN
NTU-RGB+D	11	Zhang et al. [140]	2018	95.0	89.2	LSTM based RNN
NTU-RGB+D	12	Si et al. [91]	2019	95.0	89.2	AGC-LSTM (Joints & Part)
NTU-RGB+D	13	Hu et al. [33]	2018	94.9	89.1	Non-local S-T + frequency attention
NTU-RGB+D	14	Li et al. [51]	2019	94.2	86.8	GCN
NTU-RGB+D	15	Liang et al. [57]	2019	93.7	88.6	3S-CNN + multi-task ensemble learning
NTU-RGB+D	16	Song et al. [94]	2019	93.5	85.9	Richly activated GCN
NTU-RGB+D	17	Zhang et al. [141]	2019	93.4	86.6	Semantics-guided GCN
NTU-RGB+D	18	Xie et al. [49]	2018	93.2	82.7	RNN+CNN+Attention
NTU-RGB+D 120	1	Wang et al. [109]	2023	92.0	93.8	Two-stream Transformer
NTU-RGB+D 120	2	Xu et al. [124]	2023	n/a	91.8	Language Knowledge-Assisted
NTU-RGB+D 120	3	Zhou et al. [150]	2022	89.9	91.3	Transformer
NTU-RGB+D 120	4	Duan et al. [23]	2022	89.6	91.3	Dynamic group GCN
NTU-RGB+D 120	5	Chen et al. [14]	2021	88.9	90.6	Topology refinement GCN
NTU-RGB+D 120	6	Chen et al. [13]	2021	88.2	89.3	Spatial-Temporal GCN
NTU-RGB+D 120	7	Liu et al. [74]	2020	86.9	88.4	Disentangling and unifying GCN
NTU-RGB+D 120	8	Cheng et al. [16]	2020	85.9	87.6	Shift GCN
NTU-RGB+D 120	9	Caetano et al. [6]	2019	67.9	62.8	Tree Structure + CNN
NTU-RGB+D 120	10	Caetano et al. [7]	2019	67.7	66.9	SkeleMotion
NTU-RGB+D 120	11	Liu et al. [69]	2018	64.6	66.9	Body Pose Evolution Map
NTU-RGB+D 120	12	Ke et al. [40]	2018	62.2	61.8	Multi-Task CNN with RotClips
NTU-RGB+D 120	13	Liu et al. [64]	2017	61.2	63.3	Two-Stream Attention LSTM
NTU-RGB+D 120	14	Liu et al. [71]	2017	60.3	63.2	Skeleton Visualization (Single Stream)
NTU-RGB+D 120	15	Jun et al. [67]	2019	59.9	62.4	Online+Dilated CNN
NTU-RGB+D 120	16	Ke et al. [39]	2017	58.4	57.9	Multi-Task Learning CNN
NTU-RGB+D 120	17	Jun et al. [65]	2017	58.3	59.2	Global Context-Aware Attention LSTM
NTU-RGB+D 120	18	Jun et al. [63]	2016	55.7	57.9	Spatio-Temporal LSTM

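GCN-based methods generally achieve leading results among skeleton-based approaches.
Transformer-based methods show strong potential, and hybrid models that combine them with GCNs or CNNs are increasingly common.
The more recent dataset (NTU-RGB+D 120) is more difficult and leaves room for further progress across architectures.
Representations that capture joint-bone structure, spatio-temporal graphs, and adaptive topology contribute to performance gains.
Datasets and evaluation protocols (Cross-Subject, Cross-View, Cross-Setup) are essential for fair comparison of 3D SAR models.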
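
The evaluation protocols named above all work the same way: samples are split into training and test sets by one attribute (performer, camera, or capture setup), so that the test set contains only values never seen in training. The sketch below assumes sample names of the form `S001C002P003R001A010` (setup, camera, performer, replication, action), which is an assumption about the file naming. The training ID sets in the usage lines are hypothetical placeholders, not the official split lists.

```python
import re

# Assumed naming scheme: S = setup, C = camera, P = performer (subject),
# R = replication, A = action class, each followed by a 3-digit number.
NAME_PATTERN = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")
FIELDS = ("setup", "camera", "subject", "replication", "action")


def parse_sample_name(name):
    """Extract the integer IDs encoded in a sample name."""
    match = NAME_PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    return dict(zip(FIELDS, (int(group) for group in match.groups())))


def assign_split(name, field, train_ids):
    """Return 'train' if the sample's value for `field` is in `train_ids`, else 'test'."""
    info = parse_sample_name(name)
    return "train" if info[field] in train_ids else "test"


# Hypothetical ID sets; take the real ones from the dataset documentation.
sample = "S001C002P003R001A010"
cross_subject = assign_split(sample, "subject", {1, 2, 4})
cross_view = assign_split(sample, "camera", {2, 3})
```

Keeping the split attribute as a parameter makes it explicit that Cross-Subject, Cross-View, and Cross-Setup numbers measure different kinds of generalization and should not be compared with each other.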


This review was generated by AI and checked by a human editor.