QUICK REVIEW

[Paper Review] Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching

Wei Peng, Xiaopeng Hong|arXiv (Cornell University)|Nov 11, 2019

Human Pose and Action Recognition|References: 37|Cited by: 27

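One-Sentence Summary

This paper proposes the first graph convolutional network (GCN) designed by neural architecture search (NAS) for skeleton-based human action recognition, automatically discovering the optimal graph structure and higher-order connections. Using a memory- and sample-efficient evolution strategy, it realizes dynamic spatio-temporal graph learning and multi-hop Chebyshev approximation, and achieves state-of-the-art performance on the NTU RGB+D and Kinetics-Skeleton datasets.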

ABSTRACT

Human action recognition from skeleton data, fueled by the Graph Convolutional Network (GCN), has attracted much attention due to its powerful capability of modeling non-Euclidean structured data. However, many existing GCN methods provide a pre-defined graph and fix it through the entire network, which can lose implicit joint correlations. Besides, the mainstream spectral GCN is approximated by a first-order hop, so higher-order connections are not well captured. Therefore, huge efforts are required to explore a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for skeleton-based action recognition. Specifically, we enrich the search space by providing multiple dynamic graph modules after fully exploring the spatial-temporal correlations between nodes. Besides, we introduce multiple-hop modules and expect to break the limitation of representational capacity caused by the first-order approximation. Moreover, a sampling- and memory-efficient evolution strategy is proposed to search for an optimal architecture for this task. The resulting architecture proves the effectiveness of the higher-order approximation and the dynamic graph modeling mechanism with temporal interactions, which has barely been discussed before. To evaluate the performance of the searched model, we conduct extensive experiments on two very large-scale datasets, and the results show that our model achieves state-of-the-art results.

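Research Motivation and Objectives

Overcome the limitation of fixed, pre-defined graph topologies in existing GCN methods for skeleton-based action recognition.
Address the representational bottleneck caused by the first-order Chebyshev approximation used in mainstream spectral GCNs.
Reduce manual architecture-design effort by running automated neural architecture search over a GCN search space tailored to the task.
Improve performance by modeling layer-specific spatio-temporal correlations with dynamic graph modules.
Develop an efficient search strategy suited to large-scale non-Euclidean graph data such as human skeletons.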
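
For reference, the first-order bottleneck named above can be written out with the standard Chebyshev formulation of spectral graph convolution. This is the textbook form rather than notation taken from the paper; the symbols $L$ (normalized graph Laplacian), $\tilde{L}$, $\lambda_{\max}$, $\theta_k$, and $K$ are assumptions introduced here for illustration.

```latex
g_\theta \star x \;\approx\; \sum_{k=0}^{K} \theta_k \, T_k(\tilde{L}) \, x,
\qquad
\tilde{L} = \frac{2}{\lambda_{\max}} L - I_N,
\\[4pt]
T_0(\tilde{L}) = I_N, \quad
T_1(\tilde{L}) = \tilde{L}, \quad
T_k(\tilde{L}) = 2\,\tilde{L}\,T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}),
\\[4pt]
K = 1: \quad g_\theta \star x \;\approx\; \theta_0 \, x + \theta_1 \, \tilde{L} \, x .
```

Because $T_k(\tilde{L})$ is a degree-$k$ polynomial in $\tilde{L}$, a node's output only depends on nodes at most $k$ hops away. Truncating at $K = 1$ therefore limits each layer to direct neighbors, which is the restriction that higher-order terms are meant to lift.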

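Proposed Method

Proposes a new NAS framework designed for GCNs on skeleton data, whose search space is enriched with multiple dynamic graph modules built on spatial, temporal, and spatio-temporal node correlations.
Introduces a fourth-order Chebyshev polynomial approximation to perform higher-order graph convolution, extending the receptive field beyond first-order neighbors.
Designs a sampling- and memory-efficient evolution strategy (CEIM) that combines the cross-entropy method with importance mixing to optimize the architecture search over continuous and discrete spaces.
Uses a layer-specific dynamic graph learning mechanism that selects a different graph-generation method for each network layer, capturing the semantic information that evolves across layers.
Uses a neuroevolution approach to estimate the architecture distribution and guide the search, without backpropagating through the architecture parameters.
Applies score-level fusion of the joint and bone modalities to improve performance on the NTU RGB+D and Kinetics-Skeleton datasets.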
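
To make the higher-order convolution and the dynamic graph idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`path_graph`, `cheb_graph_conv`, `dynamic_adjacency`), the common approximation lambda_max = 2, and the softmax-similarity form of the dynamic graph are all choices made here, and the sketch handles a single frame with no temporal convolution.

```python
import numpy as np


def path_graph(n):
    """Adjacency of a simple chain of n joints (hypothetical toy skeleton)."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj


def normalized_laplacian(adj):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric, non-negative adjacency."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    mask = deg > 0  # guard against isolated nodes
    d_inv_sqrt[mask] = 1.0 / np.sqrt(deg[mask])
    return np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def cheb_graph_conv(x, adj, weights):
    """K-th order Chebyshev graph convolution on one frame.

    x:       (N, C_in) joint features
    adj:     (N, N) symmetric adjacency
    weights: (K + 1, C_in, C_out), one weight matrix per polynomial order
    """
    n = adj.shape[0]
    # Rescaled Laplacian 2L/lambda_max - I, assuming lambda_max = 2.
    l_tilde = normalized_laplacian(adj) - np.eye(n)
    t_prev = x  # T_0(L~) x = x
    out = t_prev @ weights[0]
    if len(weights) > 1:
        t_curr = l_tilde @ x  # T_1(L~) x = L~ x
        out = out + t_curr @ weights[1]
        for k in range(2, len(weights)):
            # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
            t_prev, t_curr = t_curr, 2.0 * (l_tilde @ t_curr) - t_prev
            out = out + t_curr @ weights[k]
    return out


def dynamic_adjacency(x, w_theta, w_phi):
    """Data-dependent graph from embedded dot-product similarity (assumed form)."""
    sim = (x @ w_theta) @ (x @ w_phi).T
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(sim)
    attn = attn / attn.sum(axis=1, keepdims=True)  # row-wise softmax
    return 0.5 * (attn + attn.T)  # symmetrize so the Laplacian above applies


if __name__ == "__main__":
    adj = path_graph(6)
    impulse = np.zeros((6, 1))
    impulse[0, 0] = 1.0  # signal on the first joint only
    for order in (1, 4):
        out = cheb_graph_conv(impulse, adj, np.ones((order + 1, 1, 1)))
        print(order, np.flatnonzero(np.abs(out[:, 0]) > 1e-9))
```

On the chain graph, an impulse placed on the first joint spreads to joints at most `order` hops away, which is the receptive-field difference between a first-order and a fourth-order layer. Passing the output of `dynamic_adjacency` as `adj` gives a layer whose graph depends on the input features instead of a fixed skeleton.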

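Experimental Results

Research Questions

RQ1: Can neural architecture search effectively discover an optimal GCN architecture for skeleton-based action recognition that surpasses manual design?
RQ2: Does dynamic, layer-specific graph learning improve performance compared with fixed or shared graph structures?
RQ3: To what extent does higher-order graph convolution via Chebyshev approximation improve representational capacity and recognition accuracy?
RQ4: Can a memory- and sample-efficient evolution strategy effectively support NAS on large-scale non-Euclidean graph data such as human skeletons?
RQ5: What role do temporal correlations and spatio-temporal interactions play in the performance of the final searched GCN architecture?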
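
As background for RQ4, the sketch below shows a plain cross-entropy-method search over a discrete, per-layer choice of module. It is a simplified stand-in and not the paper's CEIM: it omits importance mixing (reusing samples across generations), and the function name `cem_search`, its parameters, and the toy fitness in the demo are hypothetical. In a real search the fitness would be something like the validation accuracy of a trained candidate.

```python
import numpy as np


def cem_search(fitness, n_layers, n_choices, pop_size=64, elite_frac=0.25,
               n_iters=30, smoothing=0.5, seed=0):
    """Cross-entropy method over one categorical distribution per layer.

    fitness: callable mapping an int array of shape (n_layers,) to a score
             (higher is better). No gradients of the fitness are required.
    Returns the most probable architecture and the final distribution.
    """
    rng = np.random.default_rng(seed)
    # Start from a uniform distribution over module choices at every layer.
    probs = np.full((n_layers, n_choices), 1.0 / n_choices)
    n_elite = max(1, int(pop_size * elite_frac))

    for _ in range(n_iters):
        # Sample a population of architectures, one module index per layer.
        pop = np.stack(
            [rng.choice(n_choices, size=pop_size, p=probs[layer])
             for layer in range(n_layers)],
            axis=1,
        )
        scores = np.array([fitness(arch) for arch in pop])
        # Keep the best-scoring fraction of the population.
        elite = pop[np.argsort(scores)[-n_elite:]]
        # Empirical frequency of each choice among the elite, per layer.
        freq = np.stack(
            [np.bincount(elite[:, layer], minlength=n_choices)
             for layer in range(n_layers)]
        ) / n_elite
        # Smoothed update keeps every choice at non-zero probability.
        probs = smoothing * freq + (1.0 - smoothing) * probs

    return probs.argmax(axis=1), probs


if __name__ == "__main__":
    # Toy fitness: number of layers that match a hidden target architecture.
    target = np.array([0, 3, 1, 2, 2, 0])
    best, probs = cem_search(lambda arch: float((arch == target).sum()),
                             n_layers=len(target), n_choices=4)
    print(best, probs.round(2))
```

Only sampled architectures and their scores are needed to update the distribution, which is why this family of methods can guide a search without backpropagating through architecture parameters.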

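Key Findings

On NTU RGB+D, the NAS-optimized GCN reaches 94.6% accuracy on the joint modality, 0.9 percentage points above the previous state of the art (2S-AGCN, 93.7%).
On the bone modality, the model reaches 94.7% accuracy, 1.5 percentage points above the previous state of the art.
With the joint and bone modalities fused, the model reaches 95.7% accuracy on NTU RGB+D, setting a new state-of-the-art result.
On Kinetics-Skeleton, the model reaches 37.1% top-1 accuracy with joint+bone fusion, 1.0 percentage point above the previous state of the art (36.1%).
Ablation experiments show that temporal-correlation modeling and the higher-order Chebyshev approximation significantly improve performance; the Ours(T+Cheb) variant reaches 94.0% accuracy on the joint modality and 95.2% with the modalities combined.
The full NAS architecture (Ours(NAS)) consistently outperforms all ablation variants, supporting the effectiveness of jointly searching over dynamic graph and higher-order modules.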
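
The fused joint+bone results above come from score-level fusion. A minimal sketch of that idea follows, assuming the two streams' softmax scores are averaged with a tunable weight; the exact fusion rule is not stated in this review, and `fuse_scores` and `joint_weight` are hypothetical names.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def fuse_scores(joint_logits, bone_logits, joint_weight=0.5):
    """Combine per-class scores of two streams and pick the top class.

    joint_logits, bone_logits: (batch, n_classes) outputs of the two models
    joint_weight: weight on the joint stream (assumed equal weighting by default)
    """
    fused = (joint_weight * softmax(joint_logits)
             + (1.0 - joint_weight) * softmax(bone_logits))
    return fused.argmax(axis=-1), fused


if __name__ == "__main__":
    joint = np.array([[2.0, 1.0, 0.0]])  # joint stream prefers class 0
    bone = np.array([[0.0, 3.0, 0.0]])   # bone stream strongly prefers class 1
    pred, fused = fuse_scores(joint, bone)
    print(pred, fused.round(3))
```

Fusing at the score level means the two streams are trained independently and only their class probabilities are combined, so a confident stream can override a weakly confident one.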

This review was generated by AI and reviewed by human editors.