QUICK REVIEW

[Paper Review] Pyramid Self-attention Polymerization Learning for Semi-supervised Skeleton-based Action Recognition

Binqian Xu, Xiangbo Shu|arXiv (Cornell University)|Feb 5, 2023

Human Pose and Action Recognition | Cited by 41

One-Sentence Summary

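PSP Learning uses pyramid self-attention polymerization and coarse-to-fine contrastive learning to jointly learn body-level, part-level, and joint-level skeleton representations for semi-supervised action recognition. It achieves competitive performance on the NTU RGB+D and NW-UCLA datasets.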

ABSTRACT

Most semi-supervised skeleton-based action recognition approaches aim to learn the skeleton action representations only at the joint level, but neglect the crucial motion characteristics at the coarser-grained body (e.g., limb, trunk) level that provide rich additional semantic information, though the number of labeled data is limited. In this work, we propose a novel Pyramid Self-attention Polymerization Learning (dubbed as PSP Learning) framework to jointly learn body-level, part-level, and joint-level action representations of joint and motion data containing abundant and complementary semantic information via contrastive learning covering coarse-to-fine granularity. Specifically, to complement semantic information from coarse to fine granularity in skeleton actions, we design a new Pyramid Polymerizing Attention (PPA) mechanism that firstly calculates the body-level attention map, part-level attention map, and joint-level attention map, as well as polymerizes these attention maps in a level-by-level way (i.e., from body level to part level, and further to joint level). Moreover, we present a new Coarse-to-fine Contrastive Loss (CCL) including body-level contrast loss, part-level contrast loss, and joint-level contrast loss to jointly measure the similarity between the body/part/joint-level contrasting features of joint and motion data. Finally, extensive experiments are conducted on the NTU RGB+D and North-Western UCLA datasets to demonstrate the competitive performance of the proposed PSP Learning in the semi-supervised skeleton-based action recognition task. The source codes of PSP Learning are publicly available at https://github.com/1xbq1/PSP-Learning.

Motivation and Objectives

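Motivation: go beyond joint-level-only representations of skeleton data and fully exploit coarse-to-fine semantic information.
Propose a Pyramid Polymerizing Attention mechanism that fuses body, part, and joint attention top-down, from coarse to fine levels.
Introduce a Coarse-to-fine Contrastive Loss that aligns body-, part-, and joint-level features between the joint and motion modalities.
Develop an end-to-end semi-supervised framework trained jointly on labeled and unlabeled skeleton data.
Validate the method with ablation studies and comparisons on the public NTU RGB+D and Northwestern-UCLA datasets.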

Proposed Method

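Convert raw skeleton sequences into joint data and motion data, which feed a joint encoder and a motion encoder, respectively.
Build a Skeleton Pyramid to obtain body-level, part-level, and joint-level features from the joint and motion representations.
Apply Pyramid Polymerizing Attention, which polymerizes the attention maps from body level to part level to joint level and produces the corresponding polymerized features.
Define a Coarse-to-fine Contrastive Loss with body, part, and joint branches that contrasts the joint representations with the motion representations.
Train with a combination of the contrastive losses on unlabeled data and a classification (cross-entropy) loss on labeled data.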
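The first step can be illustrated with a minimal sketch. The review does not say how motion data is derived from joint data; here we assume, as is common in skeleton-based contrastive learning, that motion is the frame-to-frame displacement of each joint. The (C, T, V) array layout and the function name `joint_to_motion` are hypothetical.

```python
import numpy as np

def joint_to_motion(joints: np.ndarray) -> np.ndarray:
    """Derive motion data from joint data of shape (C, T, V).

    Assumption: motion is the displacement of each joint between
    consecutive frames, zero-padded at the last frame so T is preserved.
    """
    motion = np.zeros_like(joints)
    motion[:, :-1, :] = joints[:, 1:, :] - joints[:, :-1, :]
    return motion

# Toy sequence: 3 coordinates, 4 frames, 25 joints (NTU RGB+D layout).
rng = np.random.default_rng(0)
joints = rng.standard_normal((3, 4, 25))
motion = joint_to_motion(joints)
print(motion.shape)  # (3, 4, 25)
```

Keeping T unchanged lets the joint and motion encoders share the same input shape.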
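A sketch of the Skeleton Pyramid step follows. It assumes that the coarser levels come from average-pooling joint features within predefined joint groups. The grouping of the 25 NTU RGB+D joints used here (five body-level groups for the trunk and four limbs, ten part-level groups) and the names `BODY_GROUPS`, `PART_GROUPS`, `pool_groups`, and `skeleton_pyramid` are illustrative assumptions, not the paper's partition.

```python
import numpy as np

# Hypothetical part-level partition of the 25 NTU RGB+D joints (0-based).
PART_GROUPS = [
    [2, 3], [0, 1, 20],          # head/neck, torso
    [4, 5], [6, 7, 21, 22],      # left upper arm, left forearm + hand
    [8, 9], [10, 11, 23, 24],    # right upper arm, right forearm + hand
    [12, 13], [14, 15],          # left thigh, left shank + foot
    [16, 17], [18, 19],          # right thigh, right shank + foot
]
# Hypothetical body-level groups (trunk, arms, legs), each a union of parts.
BODY_GROUPS = [
    [2, 3, 0, 1, 20],
    [4, 5, 6, 7, 21, 22],
    [8, 9, 10, 11, 23, 24],
    [12, 13, 14, 15],
    [16, 17, 18, 19],
]

def pool_groups(x, groups):
    """Average-pool joint features x of shape (N, V, D) into (N, len(groups), D)."""
    return np.stack([x[:, g, :].mean(axis=1) for g in groups], axis=1)

def skeleton_pyramid(x):
    """Return (body, part, joint) level features from joint-level features."""
    return pool_groups(x, BODY_GROUPS), pool_groups(x, PART_GROUPS), x

feats = np.random.default_rng(0).standard_normal((2, 25, 16))  # N=2, V=25, D=16
body, part, joint = skeleton_pyramid(feats)
print(body.shape, part.shape, joint.shape)  # (2, 5, 16) (2, 10, 16) (2, 25, 16)
```

The same pooling would be applied to the motion stream so both modalities share one pyramid layout.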
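The level-by-level polymerization in Pyramid Polymerizing Attention can be sketched as follows. This is not the authors' implementation. It assumes plain scaled dot-product self-attention at each level, and it assumes that a coarser attention map is lifted to the next finer level through a binary membership matrix and added to that level's logits before the softmax. The toy hierarchy (2 body groups, 4 parts, 8 joints) and all function names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_logits(x):
    """Scaled dot-product self-attention logits for node features x of shape (nodes, D)."""
    return x @ x.T / np.sqrt(x.shape[-1])

def membership(groups, n_fine):
    """Binary (n_fine, n_groups) matrix with M[i, g] = 1 if fine node i belongs to group g."""
    m = np.zeros((n_fine, len(groups)))
    for g, idx in enumerate(groups):
        m[idx, g] = 1.0
    return m

def pyramid_polymerize(body, part, joint, body_of_part, part_of_joint):
    """Polymerize attention from body level to part level to joint level.

    Assumption: a coarse logit map A is lifted to the finer level as M @ A @ M.T
    and added to the finer level's own logits before the softmax.
    Returns polymerized features per level and the final joint-level map.
    """
    m_bp = membership(body_of_part, part.shape[0])    # (parts, bodies)
    m_pj = membership(part_of_joint, joint.shape[0])  # (joints, parts)
    a_body = attention_logits(body)
    a_part = attention_logits(part) + m_bp @ a_body @ m_bp.T
    a_joint = attention_logits(joint) + m_pj @ a_part @ m_pj.T
    feats = {
        "body": softmax(a_body) @ body,
        "part": softmax(a_part) @ part,
        "joint": softmax(a_joint) @ joint,
    }
    return feats, softmax(a_joint)

# Toy hierarchy: 8 joints -> 4 parts -> 2 body groups.
rng = np.random.default_rng(0)
part_of_joint = [[0, 1], [2, 3], [4, 5], [6, 7]]
body_of_part = [[0, 1], [2, 3]]
joint = rng.standard_normal((8, 16))
part = np.stack([joint[g].mean(axis=0) for g in part_of_joint])
body = np.stack([part[g].mean(axis=0) for g in body_of_part])
feats, attn = pyramid_polymerize(body, part, joint, body_of_part, part_of_joint)
print({k: v.shape for k, v in feats.items()})  # {'body': (2, 16), 'part': (4, 16), 'joint': (8, 16)}
```

Adding lifted logits is only one plausible fusion rule; it lets every joint inherit the attention pattern of the part and body groups it belongs to.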
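The Coarse-to-fine Contrastive Loss can be approximated with one contrastive term per level. In this sketch each term is an InfoNCE loss with in-batch negatives, where the joint-stream and motion-stream features of the same sample form the positive pair. The temperature `tau`, the mean pooling over nodes, the equal level weights, and the one-directional loss are assumptions rather than details taken from the paper.

```python
import numpy as np

def l2_normalize(z, eps=1e-8):
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

def info_nce(z_a, z_b, tau=0.07):
    """InfoNCE over a batch: row i of z_a is the positive of row i of z_b."""
    logits = l2_normalize(z_a) @ l2_normalize(z_b).T / tau  # (N, N) cosine / tau
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def coarse_to_fine_loss(joint_feats, motion_feats, weights=(1.0, 1.0, 1.0), tau=0.07):
    """Weighted sum of body-, part-, and joint-level contrast losses.

    joint_feats / motion_feats: dicts mapping 'body', 'part', 'joint' to arrays
    of shape (N, nodes, D); nodes are mean-pooled into one vector per sample.
    """
    total = 0.0
    for w, level in zip(weights, ("body", "part", "joint")):
        z_joint = joint_feats[level].mean(axis=1)
        z_motion = motion_feats[level].mean(axis=1)
        total += w * info_nce(z_joint, z_motion, tau)
    return total

rng = np.random.default_rng(0)
sizes = {"body": 5, "part": 10, "joint": 25}
joint_stream = {k: rng.standard_normal((4, n, 32)) for k, n in sizes.items()}
# Motion features that nearly match the joint features vs. unrelated ones.
motion_aligned = {k: x + 0.01 * rng.standard_normal(x.shape) for k, x in joint_stream.items()}
motion_random = {k: rng.standard_normal((4, n, 32)) for k, n in sizes.items()}
aligned_loss = coarse_to_fine_loss(joint_stream, motion_aligned)
random_loss = coarse_to_fine_loss(joint_stream, motion_random)
print(aligned_loss, random_loss)
```

Aligned joint and motion features should yield a much lower loss than unrelated ones, which is the behavior the loss is meant to reward at every granularity.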
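The training objective in the last step can be written as a weighted sum. This sketch assumes the contrastive term has already been computed on the unlabeled clips and that a single hypothetical weight `lam` balances it against the cross-entropy on labeled clips; the review does not state the actual weighting.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits of shape (N, K), integer labels of shape (N,)."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(log_prob[np.arange(len(labels)), labels])

def semi_supervised_loss(labeled_logits, labels, contrastive_loss, lam=1.0):
    """Supervised cross-entropy on labeled clips plus a weighted contrastive term.

    contrastive_loss: assumed to be the coarse-to-fine contrastive loss already
    computed on unlabeled clips; lam is a hypothetical balancing weight.
    """
    return cross_entropy(labeled_logits, labels) + lam * contrastive_loss

# Uniform logits over 3 classes give a cross-entropy of log(3).
logits = np.zeros((2, 3))
labels = np.array([0, 2])
total = semi_supervised_loss(logits, labels, contrastive_loss=0.5, lam=2.0)
print(total)
```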

Experimental Results

Research Questions

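RQ1: Compared with joint-only representations, do coarse-to-fine (body/part/joint) representations improve semi-supervised skeleton-based action recognition?
RQ2: Can Pyramid Polymerizing Attention effectively fuse multi-level semantic information and thereby produce features better suited to contrastive learning?
RQ3: How does the Coarse-to-fine Contrastive Loss affect the alignment of joint and motion features at multiple granularities?
RQ4: Is the proposed method robust and competitive under partial-label settings on the standard skeleton datasets NTU RGB+D and NW-UCLA?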

Key Findings

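PSP Learning achieves competitive performance on NTU RGB+D and NW-UCLA in the semi-supervised setting.
The Pyramid Polymerizing Attention mechanism effectively combines body-, part-, and joint-level information from coarse to fine granularity.
The Coarse-to-fine Contrastive Loss jointly imposes strong contrastive constraints on body-, part-, and joint-level features across the joint and motion modalities.
The framework benefits from multi-granularity contrastive learning in semi-supervised skeleton-based action recognition.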


This review was generated by AI and reviewed by human editors.