QUICK REVIEW

[Paper Review] UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition

Di Yang, Yaohui Wang|arXiv (Cornell University)|Jul 19, 2021

Human Pose and Action Recognition | References: 44 | Cited by: 24

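One-sentence summary

UNIK learns spatio-temporal dependencies on skeleton data using uniformly initialized dependency matrices and multi-head attention, which yields strong cross-dataset generalization, especially when the model is pre-trained on Posetics, a skeleton dataset built from real-world Kinetics-400 videos.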

ABSTRACT

Action recognition based on skeleton data has recently witnessed increasing attention and progress. State-of-the-art approaches adopting Graph Convolutional networks (GCNs) can effectively extract features on human skeletons relying on the pre-defined human topology. Despite associated progress, GCN-based methods have difficulties to generalize across domains, especially with different human topological structures. In this context, we introduce UNIK, a novel skeleton-based action recognition method that is not only effective to learn spatio-temporal features on human skeleton sequences but also able to generalize across datasets. This is achieved by learning an optimal dependency matrix from the uniform distribution based on a multi-head attention mechanism. Subsequently, to study the cross-domain generalizability of skeleton-based action recognition in real-world videos, we re-evaluate state-of-the-art approaches as well as the proposed UNIK in light of a novel Posetics dataset. This dataset is created from Kinetics-400 videos by estimating, refining and filtering poses. We provide an analysis on how much performance improves on smaller benchmark datasets after pre-training on Posetics for the action classification task. Experimental results show that the proposed UNIK, with pre-training on Posetics, generalizes well and outperforms state-of-the-art when transferred onto four target action classification datasets: Toyota Smarthome, Penn Action, NTU-RGB+D 60 and NTU-RGB+D 120.

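Motivation and objectives

Make skeleton-based action recognition robust to differing human topologies and to real-world noise.
Propose a topology-agnostic framework that generalizes across datasets with different joint configurations.
Study how well skeleton-based models transfer across domains to real-world videos.
Introduce Posetics as a large-scale real-world skeleton dataset for pre-training.
Show that pre-training on Posetics improves performance on downstream real-world benchmarks.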

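Proposed method

Replace the fixed skeleton topology with dependency matrices initialized from a uniform distribution.
Apply multi-head aggregation to learn multiple dependency maps from the uniform initialization.
Use a Spatial Long-short Dependency Unit (S-LSU) and a Temporal Long-short Dependency Unit (T-LSU) to capture multi-scale spatio-temporal features.
Introduce a self-attention mechanism that dynamically adjusts the dependency matrices for each action.
Use two-stream fusion (joint features and bone features) for the final action prediction.
Pre-train the UNIK backbone on Posetics, then fine-tune it on the target datasets to evaluate transfer.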
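
The first, second and fourth steps above can be illustrated with a small sketch. It is not the authors' implementation: it handles a single frame, omits the per-head channel projections and all training, and the function names (`init_dependency`, `attention_map`, `spatial_unit`) and the uniform bound `sqrt(6 / V)` are assumptions made for illustration. What it shows is that each head starts from a uniformly sampled V x V dependency matrix instead of a skeleton adjacency, and that a data-dependent attention map is added to it before joint features are aggregated.

```python
import math
import random


def init_dependency(num_heads, num_joints, rng):
    """One V x V dependency matrix per head, sampled uniformly.

    No skeleton adjacency is used. The bound sqrt(6 / V) is an assumption
    for this sketch, not a value taken from the paper.
    """
    bound = math.sqrt(6.0 / num_joints)
    return [[[rng.uniform(-bound, bound) for _ in range(num_joints)]
             for _ in range(num_joints)]
            for _ in range(num_heads)]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(x):
    """Data-dependent V x V map from scaled dot products of joint features.

    x is a list of V feature vectors (one per joint) for a single frame.
    Each row of the result sums to 1.
    """
    scale = math.sqrt(len(x[0]))
    return [softmax([sum(a * b for a, b in zip(xi, xj)) / scale for xj in x])
            for xi in x]


def spatial_unit(x, heads):
    """Aggregate joint features with (dependency + attention), summed over heads.

    out[w][c] = sum over heads and joints v of (W[v][w] + attn[v][w]) * x[v][c]
    """
    num_joints, channels = len(x), len(x[0])
    attn = attention_map(x)
    out = [[0.0] * channels for _ in range(num_joints)]
    for W in heads:
        for w in range(num_joints):
            for v in range(num_joints):
                d = W[v][w] + attn[v][w]
                for c in range(channels):
                    out[w][c] += d * x[v][c]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # The same code runs for any joint count, since no adjacency is required.
    for num_joints in (17, 25):
        heads = init_dependency(3, num_joints, rng)
        frame = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(num_joints)]
        y = spatial_unit(frame, heads)
        print(num_joints, len(y), len(y[0]))
```

Because nothing in the sketch refers to which joints are physically connected, switching between skeleton layouts with different joint counts only changes the size of the sampled matrices.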

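Experimental results

Research questions

RQ1: Can UNIK achieve effective skeleton-based action recognition without relying on a pre-defined human topology?
RQ2: Does pre-training on a large-scale real-world skeleton dataset (Posetics) improve cross-dataset transfer to other real-world benchmarks?
RQ3: How does UNIK compare with state-of-the-art GCN-based methods in the cross-domain transfer setting?
RQ4: How do multi-head attention and the number of heads affect generalization and accuracy?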

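Key findings

On cross-subject benchmarks, UNIK with uniform dependency initialization and multi-head attention outperforms the fixed-graph ST-GCN baseline.
Pre-training UNIK on Posetics substantially improves performance when transferring to Toyota Smarthome and Penn Action, and the model remains competitive on NTU-RGB+D 60/120.
With Posetics pre-training, UNIK achieves state-of-the-art or competitive results on several real-world datasets, demonstrating strong generalization.
Increasing the number of heads (N) can improve dataset-specific performance but may hurt cross-dataset generalization; N=3 is chosen as a balanced setting.
Joint+bone two-stream fusion further improves performance, especially with Posetics pre-training.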
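
The joint+bone fusion in the last finding can be sketched as follows. The summary does not say how the two streams are combined, so this sketch assumes the weighted sum of per-class scores that is common in two-stream skeleton models. The `parents` list, the weight `alpha`, and the function names are hypothetical and chosen only for illustration.

```python
def bone_features(joints, parents):
    """Bone vector for each joint: its coordinates minus its parent's.

    joints  : list of V coordinate tuples for one frame
    parents : parents[v] is the index of joint v's parent; the root points
              to itself, so its bone vector is all zeros (hypothetical layout)
    """
    return [tuple(j - p for j, p in zip(joints[v], joints[parents[v]]))
            for v in range(len(joints))]


def fuse_scores(joint_scores, bone_scores, alpha=1.0):
    """Weighted sum of per-class scores from the two streams (assumed rule)."""
    return [j + alpha * b for j, b in zip(joint_scores, bone_scores)]


def predict(joint_scores, bone_scores, alpha=1.0):
    """Index of the class with the highest fused score."""
    fused = fuse_scores(joint_scores, bone_scores, alpha)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Toy 3-joint chain: joint 0 is the root, 1 hangs off 0, 2 hangs off 1.
    joints = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
    print(bone_features(joints, parents=[0, 0, 1]))
    # The joint stream alone prefers class 0; the bone stream shifts it to class 1.
    print(predict([2.0, 1.0, 0.0], [0.0, 3.0, 0.0]))
```

In a real pipeline each stream would be a separately trained network and the scores would be its class logits or probabilities; here they are hand-written lists so the fusion step can be read in isolation.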

This review was generated by AI and checked by a human editor.