QUICK REVIEW

[Paper Review] PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

Hehe Fan, Xin Yu|arXiv (Cornell University)|May 27, 2022

Human Pose and Action Recognition | References: 58 | Cited by: 69

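One-Sentence Summary

PSTNet introduces a point spatio-temporal (PST) convolution that disentangles space and time in point cloud sequences, and stacks it into a hierarchical network for 3D action recognition and 4D semantic segmentation on dynamic point clouds.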

ABSTRACT

Point cloud sequences are irregular and unordered in the spatial dimension while exhibiting regularities and order in the temporal dimension. Therefore, existing grid based convolutions for conventional video processing cannot be directly applied to spatio-temporal modeling of raw point cloud sequences. In this paper, we propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences. The proposed PST convolution first disentangles space and time in point cloud sequences. Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension. Furthermore, we incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner. Extensive experiments on widely-used 3D action recognition and 4D semantic segmentation datasets demonstrate the effectiveness of PSTNet to model point cloud sequences.

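Motivation and Objectives

Model dynamic, irregular point clouds directly, without voxelization or point tracking.
Propose the PST convolution to disentangle spatial structure from temporal dynamics in point cloud sequences.
Build the PSTNet architecture for sequence-level classification and point-level prediction.
Demonstrate effectiveness on 3D action recognition and 4D semantic segmentation benchmarks.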

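Proposed Method

Disentangle space and time in point cloud sequences and define the PST convolution.
Apply a spatial convolution over local 3D neighborhoods, using a learned displacement-based kernel function f(delta; theta).
Apply a temporal convolution across a local window of frames to capture dynamics.
Construct point tubes from temporal anchor frames and spatial anchor points chosen by farthest point sampling (FPS), so that the spatio-temporal convolution can be carried out.
Introduce a PST transposed convolution that upsamples and interpolates features for dense point-level prediction.
Assemble the PSTNet architecture from multiple PST layers (plus transposed layers) for action recognition and semantic segmentation.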
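
The steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the purely linear kernel f(delta; theta_d) = delta . theta_d, the sum aggregation, and the requirement that the temporal window fits inside the sequence (no padding) are all hypothetical choices made for readability. It shows FPS picking spatial anchors in the anchor frame, the same anchors being reused in neighboring frames (a point tube), a spatial convolution inside each frame, and a temporal convolution that mixes the per-frame results.

```python
import math

def farthest_point_sampling(points, m):
    """Greedy FPS: return indices of m points that are far from one another."""
    chosen = [0]  # assumption: seed with the first point so the result is deterministic
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < m:
        idx = max(range(len(points)), key=dist.__getitem__)
        chosen.append(idx)
        # Track each point's distance to its nearest already-chosen anchor.
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return chosen

def spatial_conv(anchor, points, feats, radius, theta_d):
    """Sum neighbor features, scaled per channel by a displacement-based weight.

    theta_d is a 3 x C matrix; the linear kernel is a hypothetical stand-in
    for the learned function f(delta; theta).
    """
    channels = len(feats[0])
    out = [0.0] * channels
    for p, f in zip(points, feats):
        delta = [pi - ai for pi, ai in zip(p, anchor)]
        if math.sqrt(sum(d * d for d in delta)) > radius:
            continue  # outside the local 3D neighborhood
        for c in range(channels):
            weight = sum(delta[d] * theta_d[d][c] for d in range(3))
            out[c] += weight * f[c]
    return out

def pst_conv(frames_xyz, frames_feat, t, l, m, radius, theta_d, T):
    """Disentangled space-time convolution around anchor frame t.

    T holds (2l + 1) temporal weight matrices, each C_out x C.
    """
    assert 0 <= t - l and t + l < len(frames_xyz), "window must fit (no padding here)"
    anchors = [frames_xyz[t][i] for i in farthest_point_sampling(frames_xyz[t], m)]
    c_out = len(T[0])
    out = [[0.0] * c_out for _ in anchors]
    for k in range(-l, l + 1):
        for i, anchor in enumerate(anchors):
            # Point tube: the anchor position is reused in frame t + k.
            s = spatial_conv(anchor, frames_xyz[t + k], frames_feat[t + k], radius, theta_d)
            for o in range(c_out):
                out[i][o] += sum(T[k + l][o][c] * s[c] for c in range(len(s)))
    return anchors, out

# Toy usage: three identical frames, one feature channel, one anchor.
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
xyz = [frame, frame, frame]
feat = [[[1.0], [1.0], [1.0]]] * 3
theta_d = [[1.0], [0.0], [0.0]]      # weight equals the x-displacement
T = [[[1.0]], [[1.0]], [[1.0]]]      # temporal kernel size 2l + 1 = 3
anchors, out = pst_conv(xyz, feat, t=1, l=1, m=1, radius=1.5, theta_d=theta_d, T=T)
print(anchors, out)  # [(0.0, 0.0, 0.0)] [[3.0]]
```

Because the spatial weights depend only on displacements from the anchor, shifting every point by the same offset leaves the output unchanged. A real layer would use learned parameters, batched tensor operations, and some handling of sequence boundaries.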

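Experimental Results

Research Questions

RQ1: Does disentangling spatial structure from temporal dynamics improve learning on dynamic point clouds?
RQ2: Compared with prior methods, is PSTNet more accurate and more efficient on 3D action recognition and 4D semantic segmentation?
RQ3: How do the temporal kernel size and the spatial radius affect performance on point cloud sequence tasks?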

Key Findings

Method	Input	Frames	Accuracy (%)
Vieira et al.	depth	20	78.20
Kläser et al.	depth	18	81.43
Actionlet	skeleton	all	88.21
PointNet++	point	1	61.61
MeteorNet	point	4	78.11
MeteorNet	point	8	81.14
MeteorNet	point	12	86.53
MeteorNet	point	16	88.21
MeteorNet	point	24	88.50
PSTNet (ours)	point	4	81.14
PSTNet (ours)	point	8	83.50
PSTNet (ours)	point	12	87.88
PSTNet (ours)	point	16	89.90
PSTNet (ours)	point	24	91.20

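PSTNet achieves state-of-the-art results on MSR-Action3D, outperforming prior methods at clip lengths of up to 24 frames (91.20% at 24 frames).
On NTU RGB+D 60/120, PSTNet shows significant improvements over skeleton-, depth-, and voxel-based baselines.
For 4D semantic segmentation on Synthia 4D, PSTNet with temporal modeling (l=3) outperforms the baselines while using fewer parameters than some competitors.
Ablations indicate that longer clips and a suitable temporal kernel size improve action recognition, while the spatial radius trades off local-structure capture against discriminativeness.
Visualizations show that PSTNet activates more strongly on moving regions, suggesting that it models motion effectively.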
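
To make the MSR-Action3D comparison concrete, the short script below computes PSTNet's accuracy margin over MeteorNet at each clip length. The numbers are copied from the table above; the variable names are illustrative only.

```python
# Accuracy (%) on MSR-Action3D by number of frames, taken from the table above.
meteornet = {4: 78.11, 8: 81.14, 12: 86.53, 16: 88.21, 24: 88.50}
pstnet = {4: 81.14, 8: 83.50, 12: 87.88, 16: 89.90, 24: 91.20}

# Margin in percentage points at each matching clip length.
gains = {n: round(pstnet[n] - meteornet[n], 2) for n in meteornet}

for n, g in gains.items():
    print(f"{n:>2} frames: +{g:.2f} points")
```

The margin is positive at every clip length, ranging from 1.35 points at 12 frames to 3.03 points at 4 frames.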

This review was generated by AI and checked by human editors.