
[Paper Review] Skeleton-based Human Action Recognition via Convolutional Neural Networks (CNN)

Ayman Ali, Ekkasit Pinyoanuntapong|arXiv (Cornell University)|Jan 31, 2023
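Human Pose and Action Recognition | Cited by 9
One-Sentence Summary

This paper shows that, with appropriate training techniques, data augmentation, and a margin-based cosine loss, CNNs can match state-of-the-art GCNs for skeleton-based action recognition, reaching 95% on NTU-60.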

ABSTRACT

Recently, there has been a remarkable increase in the interest towards skeleton-based action recognition within the research community, owing to its various advantageous features, including computational efficiency, representative features, and illumination invariance. Despite this, researchers continue to explore and investigate the most optimal way to represent human actions through skeleton representation and the extracted features. As a result, the growth and availability of human action recognition datasets have risen substantially. In addition, deep learning-based algorithms have gained widespread popularity due to the remarkable advancements in various computer vision tasks. Most state-of-the-art contributions in skeleton-based action recognition incorporate a Graph Convolutional Network (GCN) architecture for representing the human body and extracting features. Our research demonstrates that Convolutional Neural Networks (CNNs) can attain comparable results to GCN, provided that the proper training techniques, augmentations, and optimizers are applied. Our approach has been rigorously validated, and we have achieved a score of 95% on the NTU-60 dataset.

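Motivation and Objectives

  • Position skeleton-based action recognition as a computationally efficient modality with discriminative features.
  • Investigate whether CNNs can achieve performance competitive with GCN-based methods in this domain.
  • Evaluate the impact of diverse data augmentation and optimization strategies on generalization and robustness.
  • Show that a margin-based cosine loss can improve discriminative feature learning compared with conventional cross-entropy.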

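Proposed Method

  • Encode skeleton sequences as skeleton map images so that a CNN can process spatio-temporal information.
  • Apply diverse image-based and skeleton-based data augmentations inspired by RandAugment and pose-specific techniques.
  • Replace standard cross-entropy with a margin-based cosine loss (AAML-inspired) to increase inter-class separation.
  • Experiment with an optimizer (MadGrad) and learning-rate schedulers (Cosine Annealing + ReducedLR) to improve convergence and generalization.
  • Apply regularization techniques (label smoothing, dropout, batch normalization, early stopping) to mitigate overfitting.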
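A margin-based cosine loss can be sketched in a few lines. The version below follows the additive angular margin formulation (ArcFace-style): the angle between a feature and its target-class weight is enlarged by a margin before a scaled softmax cross-entropy is applied. The function names and the values `scale=30.0` and `margin=0.5` are illustrative assumptions, not settings taken from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def margin_cosine_loss(cosines, target, scale=30.0, margin=0.5):
    """Cross-entropy over scaled cosine logits, with an additive angular
    margin on the target class. scale and margin are assumed values."""
    logits = []
    for k, cos_k in enumerate(cosines):
        cos_k = max(-1.0, min(1.0, cos_k))  # guard acos against rounding
        if k == target:
            theta = math.acos(cos_k)
            # Enlarge the target angle; cap at pi so the cosine stays monotonic.
            cos_k = math.cos(min(theta + margin, math.pi))
        logits.append(scale * cos_k)
    # Numerically stable log-sum-exp.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[target]

# Toy example: one 2-D feature and two class weight vectors.
feature = [0.6, 0.4]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
cosines = [cosine_similarity(feature, w) for w in class_weights]
loss_plain = margin_cosine_loss(cosines, target=0, margin=0.0)
loss_margin = margin_cosine_loss(cosines, target=0, margin=0.5)
```

With the margin enabled, the same feature incurs a larger loss than it would under plain scaled cross-entropy, so training has to pull features closer to their class direction; that is the intuition behind the increased inter-class separation.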
Figure 1: Action representation from the NTU-D 60 dataset. (A) -45° skeleton visualization, (B) 0° skeleton visualization, (C) 45° skeleton visualization. (D, E, F) are the transformed skeletons for the same skeletons in (A, B, C).
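Figure 1 shows the same action at -45°, 0°, and 45°. One plausible way to simulate such view changes as a skeleton-based augmentation is to rotate the 3D joints about the vertical axis. The sketch below assumes joints are `(x, y, z)` tuples with `y` pointing up; the function name is hypothetical, and this is not confirmed to be the transformation the paper uses.

```python
import math

def rotate_about_vertical_axis(joints, angle_deg):
    """Rotate (x, y, z) joints about the y axis (assumed vertical)."""
    angle = math.radians(angle_deg)
    c, s = math.cos(angle), math.sin(angle)
    # y is unchanged; x and z are mixed by a standard 2-D rotation.
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]

# Two toy joints viewed from the three angles shown in Figure 1.
joints = [(1.0, 2.0, 0.0), (0.0, 1.0, 1.0)]
views = {deg: rotate_about_vertical_axis(joints, deg) for deg in (-45, 0, 45)}
```

Because a rotation preserves distances between joints, bone lengths are unchanged and only the apparent viewpoint differs.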

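Experimental Results

Research Questions

  • RQ1: When equipped with strong training and augmentation strategies, can CNNs achieve accuracy competitive with GCN-based methods in skeleton-based action recognition?
  • RQ2: How do different augmentation techniques affect the generalization and robustness of CNN-based skeleton action recognizers?
  • RQ3: Does a margin-based cosine loss improve discriminative performance over cross-entropy in skeleton-based action recognition?
  • RQ4: Which combinations of optimizer and learning-rate schedule yield the best performance for CNN-based skeleton action models?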

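Key Findings

  • CNN-based models with appropriate training techniques achieve results close to the state of the art on skeleton-based action recognition and are comparable to GCN methods.
  • Data augmentation (image-based and skeleton-based) substantially improves generalization and robustness across variations.
  • Using a margin-based cosine loss (similar to ArcFace) yields a notable performance gain over cross-entropy loss.
  • The MadGrad optimizer combined with cosine annealing and the ReducedLR scheduler improves training stability and accuracy.
  • Regularization techniques contribute to better generalization on unseen data.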
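For readers unfamiliar with cosine annealing, the schedule lowers the learning rate from a base value to a minimum along half a cosine wave. The sketch below uses the standard closed form; the function name and the values of `base_lr`, `min_lr`, and the step count are illustrative assumptions, and the plateau-based reduction that the review pairs with it is not shown.

```python
import math

def cosine_annealing_lr(step, total_steps, base_lr=1e-3, min_lr=1e-5):
    """Learning rate at a given step under cosine annealing (assumed values)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Learning rates over a toy 10-step run.
schedule = [cosine_annealing_lr(step, total_steps=10) for step in range(11)]
```

The rate changes slowly at the start and end of the run and fastest in the middle, which is often cited as a reason for the smoother convergence of this schedule.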
Figure 2: The pipeline of generating the skeleton map image
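One common way to build a skeleton map is sketched below: each joint becomes a row, each frame becomes a column, and the `(x, y, z)` coordinates are min-max scaled into the three 8-bit color channels. The row/column layout, the global normalization, and the function name are assumptions made for illustration; the paper's actual pipeline may differ.

```python
def skeleton_to_map(sequence):
    """Convert sequence[frame][joint] = (x, y, z) into
    image[joint][frame] = (r, g, b) with integer values in 0..255."""
    coords = [c for frame in sequence for joint in frame for c in joint]
    low, high = min(coords), max(coords)
    span = (high - low) or 1.0  # avoid division by zero for constant input
    num_frames, num_joints = len(sequence), len(sequence[0])
    return [
        [
            tuple(int(round(255 * (c - low) / span)) for c in sequence[t][j])
            for t in range(num_frames)
        ]
        for j in range(num_joints)
    ]

# Toy sequence: 2 frames, 3 joints each.
sequence = [
    [(0, 0, 0), (1, 2, 3), (2, 4, 6)],
    [(0.5, 0.5, 0.5), (1, 1, 1), (3, 3, 3)],
]
image = skeleton_to_map(sequence)
```

The result is a small RGB image, 3 rows by 2 columns here, that a standard 2D CNN can consume once it is resized to the network's input resolution.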


This review was generated by AI and reviewed by human editors.