QUICK REVIEW

[Paper Review] Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

Hong Liu, Juanhui Tu|arXiv (Cornell University)|May 23, 2017

Human Pose and Action Recognition | References: 24 | Cited by: 111

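One-Sentence Summary

Introduces a two-stream 3D convolutional network architecture for skeleton-based action recognition and shows that separating the spatial and temporal streams, each with a multi-temporal extension, outperforms most RNN-based methods on the NTU RGB-D and SmartHome datasets.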

ABSTRACT

It remains a challenge to efficiently extract spatial-temporal information from skeleton sequences for 3D human action recognition. Although most recent action recognition methods are based on Recurrent Neural Networks, which present outstanding performance, one shortcoming of these methods is the tendency to overemphasize temporal information. Since the 3D convolutional neural network (3D CNN) is a powerful tool for simultaneously learning features from both spatial and temporal dimensions by capturing the correlations between three-dimensional signals, this paper proposes a novel two-stream model using 3D CNN. To the best of our knowledge, this is the first application of 3D CNN to skeleton-based action recognition. Our method consists of three stages. First, skeleton joints are mapped into a 3D coordinate space, and the spatial and temporal information are then encoded, respectively. Second, 3D CNN models are separately adopted to extract deep features from the two streams. Third, to enhance the ability of deep features to capture global relationships, we extend every stream into a multi-temporal version. Extensive experiments on the SmartHome dataset and the large-scale NTU RGB-D dataset demonstrate that our method outperforms most RNN-based methods, which verifies the complementary property between spatial and temporal information and the robustness to noise.

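Motivation and Objectives

Motivate the efficient extraction of spatial-temporal information from skeleton sequences for 3D action recognition.
Propose a novel two-stream 3D CNN framework applied to skeleton data.
Strengthen the deep feature representation by extending each stream into a multi-temporal version.
Demonstrate robustness to noise and the complementary strengths of spatial and temporal information.
Validate the method on the large-scale NTU RGB-D dataset and the SmartHome dataset.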

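Proposed Method

Map skeleton joints into a 3D coordinate space to capture spatial information.
Encode the spatial and temporal information into two separate streams.
Apply a 3D CNN model to each stream independently to extract deep features.
Extend each stream into a multi-temporal version to capture global relationships.
Show experimentally that the method is robust to noise and that the two streams are complementary.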
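
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the occupancy-grid encoding in `voxelize`, the frame-to-frame `displacements` used as a temporal cue, the grid size, and the coordinate range are all hypothetical choices, and `conv3d_valid` is a naive single-channel 3D cross-correlation standing in for one layer of a 3D CNN.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]      # (x, y, z) position of one joint
Frame = List[Joint]                     # all joints in one frame
Volume = List[List[List[float]]]        # indexed as volume[z][y][x]


def make_volume(d: int, h: int, w: int) -> Volume:
    """Create a zero-filled d x h x w volume."""
    return [[[0.0] * w for _ in range(h)] for _ in range(d)]


def voxelize(frame: Frame, size: int, lo: float, hi: float) -> Volume:
    """Assumed spatial encoding: mark each joint in a size^3 occupancy grid."""
    vol = make_volume(size, size, size)
    scale = (size - 1) / (hi - lo)
    for x, y, z in frame:
        # Clamp each coordinate to the grid so out-of-range joints stay inside.
        ix, iy, iz = (
            min(size - 1, max(0, round((c - lo) * scale))) for c in (x, y, z)
        )
        vol[iz][iy][ix] = 1.0
    return vol


def displacements(prev: Frame, curr: Frame) -> Frame:
    """Assumed temporal cue: per-joint motion between two consecutive frames."""
    return [
        (cx - px, cy - py, cz - pz)
        for (px, py, pz), (cx, cy, cz) in zip(prev, curr)
    ]


def conv3d_valid(vol: Volume, kernel: Volume) -> Volume:
    """Naive 3D cross-correlation with no padding and stride 1."""
    d, h, w = len(vol), len(vol[0]), len(vol[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = make_volume(d - kd + 1, h - kh + 1, w - kw + 1)
    for z in range(d - kd + 1):
        for y in range(h - kh + 1):
            for x in range(w - kw + 1):
                out[z][y][x] = sum(
                    vol[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                    for dz in range(kd)
                    for dy in range(kh)
                    for dx in range(kw)
                )
    return out


# Toy sequence: two frames, two joints each (made-up coordinates).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.5, 0.0, 0.0), (1.0, 1.0, 0.5)],
]

spatial = voxelize(frames[0], size=3, lo=0.0, hi=1.0)   # spatial-stream input
motion = displacements(frames[0], frames[1])            # temporal-stream cue
ones = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
features = conv3d_valid(spatial, ones)                  # 2 x 2 x 2 feature map
```

In practice each stream would pass through a trained multi-layer 3D CNN from a deep learning framework; the hand-written kernel of ones here only shows how a 3D filter aggregates neighbouring voxels.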

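Experimental Results

Research Questions

RQ1: Can a two-stream 3D CNN effectively learn spatial-temporal features for action recognition from skeleton sequences?
RQ2: Does separating the spatial and temporal streams and extending them to multiple temporal scales yield better recognition performance than single-stream or RNN-based methods?
RQ3: Are the two streams complementary on skeleton data, and are they robust to noise?
RQ4: How does the proposed method perform on large-scale datasets such as NTU RGB-D, as well as on SmartHome?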

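Key Findings

The two-stream 3D CNN method outperforms most RNN-based methods on the evaluated datasets.
Separating spatial and temporal information and processing each with a 3D CNN yields complementary representations.
Extending each stream into a multi-temporal version helps capture global relationships in the data.
The method is robust to noise in skeleton sequences.
Experiments on the SmartHome and NTU RGB-D datasets show strong performance relative to competing methods.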

This review was generated by AI and checked by a human editor.