QUICK REVIEW

[Paper Review] Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition

Pengfei Zhang, Cuiling Lan | arXiv (Cornell University) | Apr 2, 2019

Human Pose and Action Recognition | References: 61 | Cited by: 42

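One-Sentence Summary

SGN introduces joint-type and frame-index semantics into a hierarchical model, using a GCN at the joint level and a CNN at the frame level, and reaches state-of-the-art accuracy on NTU60, NTU120, and SYSU with significantly fewer parameters.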

ABSTRACT

Skeleton-based human action recognition has attracted great interest thanks to the easy accessibility of the human skeleton data. Recently, there is a trend of using very deep feedforward neural networks to model the 3D coordinates of joints without considering the computational efficiency. In this paper, we propose a simple yet effective semantics-guided neural network (SGN) for skeleton-based action recognition. We explicitly introduce the high level semantics of joints (joint type and frame index) into the network to enhance the feature representation capability. In addition, we exploit the relationship of joints hierarchically through two modules, i.e., a joint-level module for modeling the correlations of joints in the same frame and a frame-level module for modeling the dependencies of frames by taking the joints in the same frame as a whole. A strong baseline is proposed to facilitate the study of this field. With an order of magnitude smaller model size than most previous works, SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets. The source code is available at https://github.com/microsoft/SGN.

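Motivation and Objectives

Advance skeleton-based action recognition, building on easily accessible 3D joint data.
Explicitly incorporate high-level joint semantics (joint type and frame index) to improve the feature representation.
Model joint-level correlations with a GCN and frame-level dependencies with a CNN within a hierarchical framework.
Provide a lightweight strong baseline and achieve state-of-the-art performance with fewer parameters.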

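Proposed Method

Embed joint positions and velocities into a unified dynamics representation, then fuse it with joint semantics.
Use a joint-level GCN built on a content-adaptive graph, drawing on joint dynamics and joint-type semantics to model relations within a frame.
Build a frame-level module that incorporates frame-index semantics and spatially pools over joints, then captures inter-frame dynamics with a temporal CNN.
Use frame-index and joint-type embeddings to enrich the node and frame representations.
Develop a lightweight strong baseline without semantics, including data augmentation and pooling strategies, as a reference point for evaluation.
Compare SGN with recent methods on the NTU60, NTU120, and SYSU datasets, and report parameter efficiency.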
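
The joint-level steps above (dynamics embedding, joint-type semantics, content-adaptive graph, GCN message passing) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the released implementation: the layer widths, the single-layer embeddings, the residual form of the graph convolution, and every name (`joint_level`, `W_theta`, `W_phi`, and so on) are hypothetical choices made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init(in_dim, out_dim):
    # Small random weights and zero bias; illustrative only, nothing is trained here.
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)

T, J, C = 20, 25, 64  # frames, joints, embedding width (assumed values)

W_pos, b_pos = init(3, C)          # position embedding
W_vel, b_vel = init(3, C)          # velocity embedding
W_type, b_type = init(J, C)        # joint-type embedding (input is a one-hot vector)
W_theta, b_theta = init(2 * C, C)  # projections used to build the adaptive graph
W_phi, b_phi = init(2 * C, C)
W_msg, b_msg = init(2 * C, C)      # transform applied to aggregated neighbour features
W_self, b_self = init(2 * C, C)    # transform applied to each joint's own features

def joint_level(pos):
    """pos: (T, J, 3) joint coordinates -> (features (T, J, C), adjacency (T, J, J))."""
    # Velocity as the frame-to-frame difference; the first frame gets zero velocity.
    vel = np.zeros_like(pos)
    vel[1:] = pos[1:] - pos[:-1]
    # Dynamics representation: embed position and velocity, then sum them.
    dyn = relu(linear(pos, W_pos, b_pos)) + relu(linear(vel, W_vel, b_vel))
    # Joint-type semantics: embed a one-hot joint index and attach it to every frame.
    jtype = relu(linear(np.eye(J), W_type, b_type))                        # (J, C)
    z = np.concatenate([dyn, np.broadcast_to(jtype, (T, J, C))], axis=-1)  # (T, J, 2C)
    # Content-adaptive graph: per-frame joint-to-joint affinities, row-normalised.
    theta = linear(z, W_theta, b_theta)
    phi = linear(z, W_phi, b_phi)
    adj = softmax(theta @ phi.transpose(0, 2, 1), axis=-1)                 # (T, J, J)
    # One graph-convolution layer: aggregated messages plus a self term.
    out = relu(adj @ linear(z, W_msg, b_msg) + linear(z, W_self, b_self))
    return out, adj

skeleton = rng.normal(size=(T, J, 3))  # random stand-in for a skeleton sequence
feats, adj = joint_level(skeleton)
print(feats.shape, adj.shape)  # (20, 25, 64) (20, 25, 25)
```

Because the adjacency is computed from the joint features themselves, each frame gets its own graph rather than a fixed skeleton topology, which is what "content-adaptive" refers to in the list above.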

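Experimental Results

Research Questions

RQ1: Does explicitly modeling joint-type semantics improve graph construction and GCN message passing on skeleton data?
RQ2: Does introducing frame-index semantics at the frame level improve temporal modeling and action classification accuracy?
RQ3: Is a hierarchical joint-level and frame-level architecture more effective for skeleton-based action recognition than non-hierarchical or global approaches?
RQ4: How does SGN, built on a lightweight baseline, perform compared with heavier state-of-the-art models?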

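Key Findings

SGN achieves state-of-the-art results on NTU60 under both the CS (89.0%) and CV (94.5%) settings.
SGN improves over its semantics-free baseline by 2.1% on CS and 1.7% on CV.
Frame-index semantics help most when the temporal convolution is restricted, and bring additional gains when used together with temporal convolution kernels.
Hierarchical modeling of intra-frame joint correlations (joint level) and inter-frame correlations (frame level) is more accurate than non-hierarchical or global graph approaches.
SGN with semantics uses an order of magnitude fewer parameters than most methods while achieving competitive or even higher accuracy.
The strong lightweight baseline without semantics benefits noticeably from data augmentation and from max pooling over joints, which underlines the efficiency gains.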
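
The frame-level ingredients mentioned in these findings (frame-index semantics, max pooling over joints, temporal convolution) can be sketched as below. This is a rough NumPy illustration under assumptions: the two-layer temporal CNN with kernel sizes 3 and 1, the channel widths, the 60-class output, and all names (`frame_level`, `temporal_conv`, `params`) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def temporal_conv(x, w, b):
    """1D convolution over time with zero 'same' padding (odd kernel sizes).
    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,) -> (T, C_out)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along the time axis only
    rows = [np.einsum("kc,kcd->d", xp[t:t + k], w) for t in range(x.shape[0])]
    return np.stack(rows) + b

def frame_level(joint_feats, params):
    """joint_feats: (T, J, C) joint-level features -> class probabilities."""
    T = joint_feats.shape[0]
    # Frame-index semantics: embed a one-hot frame index and add it to every joint.
    frame_idx = relu(np.eye(T) @ params["W_idx"] + params["b_idx"])  # (T, C)
    x = joint_feats + frame_idx[:, None, :]
    # Spatial max pooling: treat the joints of one frame as a whole.
    x = x.max(axis=1)                                                # (T, C)
    # Temporal CNN: kernel 3 mixes neighbouring frames, kernel 1 mixes channels.
    x = relu(temporal_conv(x, params["W_t1"], params["b_t1"]))
    x = relu(temporal_conv(x, params["W_t2"], params["b_t2"]))
    # Temporal max pooling, then a linear classifier with softmax.
    x = x.max(axis=0)
    logits = x @ params["W_fc"] + params["b_fc"]
    e = np.exp(logits - logits.max())
    return e / e.sum()

T, J, C, C2, num_classes = 20, 25, 64, 128, 60  # assumed sizes
params = {
    "W_idx": rng.normal(0.0, 0.1, (T, C)), "b_idx": np.zeros(C),
    "W_t1": rng.normal(0.0, 0.1, (3, C, C2)), "b_t1": np.zeros(C2),
    "W_t2": rng.normal(0.0, 0.1, (1, C2, C2)), "b_t2": np.zeros(C2),
    "W_fc": rng.normal(0.0, 0.1, (C2, num_classes)), "b_fc": np.zeros(num_classes),
}
joint_feats = rng.normal(size=(T, J, C))  # stand-in for joint-level module output
probs = frame_level(joint_feats, params)
print(probs.shape)  # (60,)
```

With a kernel size of 1 in every temporal layer the convolution would be blind to frame order, so the frame-index embedding would be the only source of ordering information; that is one way to read the finding that frame-index semantics matter most when temporal convolution is restricted.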

This review was generated by AI and checked by a human editor.