QUICK REVIEW

[Paper Review] Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition

Lei Shi, Yifan Zhang | arXiv (Cornell University) | May 20, 2018

Human Pose and Action Recognition | References: 38 | Cited by: 23

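One-Sentence Summary

This paper proposes a two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. It learns the graph topology for different layers and input samples jointly with the rest of the network through backpropagation, and its two-stream architecture explicitly models first-order (joint coordinates) and second-order (bone lengths and directions) skeleton information. The method reports state-of-the-art results on NTU-RGBD (95.1% top-1 accuracy on the cross-view benchmark) and Kinetics-Skeleton (36.1% top-1 accuracy), clearly outperforming previous methods.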

ABSTRACT

In skeleton-based action recognition, graph convolutional networks (GCNs), which model the human body skeletons as spatiotemporal graphs, have achieved remarkable performance. However, in existing GCN-based methods, the topology of the graph is set manually, and it is fixed over all layers and input samples. This may not be optimal for the hierarchical GCN and diverse samples in action recognition tasks. In addition, the second-order information (the lengths and directions of bones) of the skeleton data, which is naturally more informative and discriminative for action recognition, is rarely investigated in existing methods. In this work, we propose a novel two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. The topology of the graph in our model can be either uniformly or individually learned by the BP algorithm in an end-to-end manner. This data-driven method increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. Moreover, a two-stream framework is proposed to model both the first-order and the second-order information simultaneously, which shows notable improvement for the recognition accuracy. Extensive experiments on the two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art with a significant margin.

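Motivation and Objectives

Address the limitation of fixed, hand-designed graph topologies in existing GCN-based skeleton action recognition models, which cannot adapt to hierarchical feature learning or to diverse action patterns.
Improve recognition performance by explicitly modeling second-order information (bone lengths and directions) alongside first-order joint coordinates.
Develop a data-driven graph learning mechanism that adapts the topology per layer and per sample, increasing the model's flexibility and generality.
Demonstrate the advantage of the proposed two-stream framework through extensive experiments on large-scale benchmark datasets.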

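Proposed Method

The model uses a two-stream architecture: one stream takes first-order features (joint coordinates) as input, and the other takes second-order features (vectors encoding the length and direction of the bone between two joints).
The graph topology is parameterized by differentiable parameters and learned end to end through backpropagation. It includes two kinds of graph: a global graph that captures structural patterns shared across samples, and an individual graph that captures sample-specific relations.
The adaptive graph convolutional layer adjusts the adjacency matrix per layer and per sample, so the learned topology can evolve with the level of feature abstraction.
The two streams are combined by late fusion (in the paper, by summing the softmax scores of the two streams) to improve discriminative power.
The model is trained end to end with a standard cross-entropy loss; the graph parameters are optimized jointly with the convolution weights.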
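The two inputs and the adaptive adjacency described in this list can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `parents` list, and the embedding weights `W_theta` and `W_phi` are hypothetical, the per-sample graph is assumed to be a row-wise softmax over dot-product similarities of embedded joint features, and adding a fixed skeleton adjacency `A` alongside the learned graphs is an assumption of this sketch. Temporal convolution, graph partitioning, and training are omitted.

```python
import numpy as np


def bone_vectors(joints, parents):
    """Second-order input: each bone is a joint minus the joint it attaches to.

    joints:  (V, C) array, V joints with C coordinates each.
    parents: length-V sequence; parents[v] is the joint that v attaches to.
             The root points to itself, which yields a zero-length bone.
    """
    joints = np.asarray(joints, dtype=float)
    return joints - joints[np.asarray(parents)]


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_adjacency(A, B, feats, W_theta, W_phi):
    """Sum a fixed graph, a learned global graph, and a per-sample graph.

    A:     (V, V) fixed skeleton adjacency (assumed third term).
    B:     (V, V) learnable matrix shared by all samples (global graph).
    feats: (V, C) features of one sample at the current layer.
    W_theta, W_phi: (C, D) embedding weights (hypothetical names).
    """
    theta = feats @ W_theta              # (V, D) embedded features
    phi = feats @ W_phi                  # (V, D) embedded features
    C = softmax(theta @ phi.T, axis=-1)  # (V, V) sample-specific graph
    return A + B + C


def graph_conv(feats, adj, W):
    """Aggregate neighbour features with adj, then project with W."""
    return adj @ feats @ W


# Toy usage: a 3-joint chain 0 -> 1 -> 2 with 3D coordinates.
joints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
bones = bone_vectors(joints, parents=[0, 0, 1])

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]])
B = np.zeros((3, 3))  # would be updated by backpropagation during training
adj = adaptive_adjacency(A, B, joints,
                         rng.normal(size=(3, 4)), rng.normal(size=(3, 4)))
out = graph_conv(joints, adj, rng.normal(size=(3, 8)))
print(bones.shape, adj.shape, out.shape)
```

Because `C` is recomputed from the features of each sample at each layer, the effective adjacency differs per sample and per layer, while `B` changes only through training.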

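Experimental Results

Research Questions

RQ1: Does learning the graph topology end to end improve skeleton-based action recognition compared with a fixed, hand-designed graph?
RQ2: Does adding second-order bone features (bone lengths and directions) to first-order features yield a significant performance gain?
RQ3: Do individualized, data-dependent graph structures for each sample and each layer capture hierarchical semantic representations better than a single fixed topology?
RQ4: How does two-stream fusion of first-order and second-order features compare with single-stream baselines in recognition accuracy?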

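Key Findings

The proposed 2s-AGCN reaches 95.1% top-1 accuracy on NTU-RGBD (cross-view benchmark), surpassing previous state-of-the-art methods.
On Kinetics-Skeleton, the model reaches 36.1% top-1 accuracy, 5.4 percentage points above the previous best method.
Ablation experiments show that the two-stream framework combining first-order and second-order features achieves the highest accuracy (95.1%), ahead of the single-stream baselines (93.7% and 93.2%).
Visualizations of the learned graphs show that higher layers develop non-local connections (for example, between the left and right hands), which indicates task-aware adaptation of the topology.
The individual graph component learns a different topology for each sample, which indicates that the best graph structure varies across actions rather than being fixed.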
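The gain of the two-stream model over either single stream in the findings above comes from combining the two streams' predictions. A minimal sketch of score-level late fusion follows, assuming each stream outputs one logit per class and that fusion sums the softmax scores; the function names and toy logits are illustrative only and not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_scores(joint_logits, bone_logits):
    """Late fusion: add per-class softmax scores of both streams.

    Both inputs have shape (N, num_classes). Returns the fused scores
    and the predicted class index for each of the N samples.
    """
    scores = softmax(np.asarray(joint_logits, dtype=float)) \
        + softmax(np.asarray(bone_logits, dtype=float))
    return scores, scores.argmax(axis=-1)


# Toy usage: the joint stream weakly prefers class 0,
# the bone stream strongly prefers class 1.
joint_logits = np.array([[2.0, 1.0, 0.0]])
bone_logits = np.array([[0.0, 3.0, 0.0]])
scores, pred = fuse_scores(joint_logits, bone_logits)
print(scores, pred)
```

Summing normalized scores rather than raw logits keeps the two streams on a comparable scale, so a confident stream can override a hesitant one without either dominating merely because its logits are larger in magnitude.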

This review was generated by AI and checked by a human editor.