QUICK REVIEW

[Paper Review] PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

Hehe Fan, Xin Yu|arXiv (Cornell University)|May 27, 2022

Human Pose and Action Recognition | References: 58 | Citations: 69

One-Line Summary

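PSTNet introduces a point-based spatio-temporal convolution that disentangles space and time in dynamic point clouds, and stacks it into a hierarchical network for 3D action recognition and 4D semantic segmentation.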

ABSTRACT

Point cloud sequences are irregular and unordered in the spatial dimension while exhibiting regularities and order in the temporal dimension. Therefore, existing grid based convolutions for conventional video processing cannot be directly applied to spatio-temporal modeling of raw point cloud sequences. In this paper, we propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences. The proposed PST convolution first disentangles space and time in point cloud sequences. Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension. Furthermore, we incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner. Extensive experiments on widely-used 3D action recognition and 4D semantic segmentation datasets demonstrate the effectiveness of PSTNet to model point cloud sequences.

Research Motivation and Objectives

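Motivate modeling dynamic, irregular point clouds directly, without voxelization or point tracking.
Propose a PST convolution that disentangles spatial structure from the temporal dynamics of point cloud sequences.
Build the PSTNet architecture for sequence-level classification and point-level prediction.
Demonstrate effectiveness on 3D action recognition and 4D semantic segmentation benchmarks.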

Proposed Method

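Disentangle space and time in point cloud sequences and define the PST convolution on that decomposition.
Apply a spatial convolution over local 3D neighborhoods, using a learned displacement-based kernel function f(delta; theta).
Apply a temporal convolution over the resulting sequence of local spatial features across frames to capture dynamics.
Construct "point tubes" from temporal anchor frames and spatial anchors chosen by farthest point sampling (FPS), which makes the spatio-temporal convolution possible on irregular points.
Introduce a PST transposed convolution that upsamples and interpolates features for dense point-level prediction.
Build the PSTNet architecture from multiple PST layers (plus transposed layers) for action recognition and semantic segmentation.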
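The steps above can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not the authors' implementation: the function names are hypothetical, the spatial kernel is simplified to one scalar weight per channel (theta times the displacement), the temporal kernel is one small matrix per frame offset, and frames outside the clip are simply skipped rather than padded.

```python
import math

def farthest_point_sampling(points, m):
    """Greedily pick m well-spread anchor indices from one frame."""
    chosen = [0]  # assumption: start from the first point
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        # keep, for every point, its distance to the nearest chosen anchor
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen

def spatial_conv(anchor, frame_xyz, frame_feat, theta, radius):
    """Aggregate neighbor features around `anchor` within `radius`.

    Simplified kernel (assumption): channel c is weighted by
    theta[c] . delta, where delta is the neighbor's displacement.
    """
    out = [0.0] * len(theta)
    for p, feat in zip(frame_xyz, frame_feat):
        delta = [pi - ai for pi, ai in zip(p, anchor)]
        if math.hypot(*delta) > radius:
            continue  # outside the local spatial region
        for c, row in enumerate(theta):
            w = sum(t * d for t, d in zip(row, delta))
            out[c] += w * feat[c]
    return out

def pst_conv(frames_xyz, frames_feat, t, anchor, theta, temporal_w, radius):
    """Spatial conv in each frame of the tube, then a temporal conv.

    The same anchor position is reused in neighboring frames, which
    forms the "point tube". temporal_w[k] is a C_out x C_in matrix.
    """
    half = len(temporal_w) // 2
    out = [0.0] * len(temporal_w[0])
    for k in range(-half, half + 1):
        if not 0 <= t + k < len(frames_xyz):
            continue  # assumption: skip frames beyond the clip
        s = spatial_conv(anchor, frames_xyz[t + k], frames_feat[t + k],
                         theta, radius)
        for o, row in enumerate(temporal_w[k + half]):
            out[o] += sum(w * v for w, v in zip(row, s))
    return out

# Toy usage: three identical frames, two feature channels.
xyz = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
feat = [[1.0, 1.0], [2.0, 3.0], [9.0, 9.0]]
theta = [[1, 0, 0], [0, 1, 0]]
temporal_w = [[[1, 0], [0, 1]], [[2, 0], [0, 2]], [[1, 0], [0, 1]]]
print(pst_conv([xyz] * 3, [feat] * 3, 1, (0.0, 0.0, 0.0),
               theta, temporal_w, 2.5))
```

Separating the two stages means the irregular, unordered part of the problem (neighbor search around an anchor) is handled only by the spatial step, while the temporal step operates on an ordered sequence of fixed-size feature vectors.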

Experimental Results

Research Questions

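RQ1: Does disentangling spatial structure from temporal dynamics improve learning on dynamic point clouds?
RQ2: Does PSTNet outperform prior methods in accuracy and efficiency on 3D action recognition and 4D semantic segmentation?
RQ3: How do the temporal kernel size and the spatial radius affect performance on point cloud sequence tasks?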

Key Findings

Method	Input	# Frames	Accuracy (%)
Vieira et al.	depth	20	78.20
Kläser et al.	depth	18	81.43
Actionlet	skeleton	all	88.21
PointNet++	point	1	61.61
MeteorNet	point	4	78.11
MeteorNet	point	8	81.14
MeteorNet	point	12	86.53
MeteorNet	point	16	88.21
MeteorNet	point	24	88.50
PSTNet (ours)	point	4	81.14
PSTNet (ours)	point	8	83.50
PSTNet (ours)	point	12	87.88
PSTNet (ours)	point	16	89.90
PSTNet (ours)	point	24	91.20
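The point-based rows of the table can be compared directly. The short snippet below only restates the table's numbers and computes the per-clip-length gap between PSTNet and MeteorNet; the variable names are illustrative.

```python
# Accuracy (%) by number of frames, copied from the table above.
meteornet = {4: 78.11, 8: 81.14, 12: 86.53, 16: 88.21, 24: 88.50}
pstnet = {4: 81.14, 8: 83.50, 12: 87.88, 16: 89.90, 24: 91.20}

# Gain of PSTNet over MeteorNet at each clip length, in percentage points.
gains = {n: round(pstnet[n] - meteornet[n], 2) for n in meteornet}

for n, g in gains.items():
    print(f"{n:>2} frames: +{g:.2f} points")
```

The gap is positive at every clip length, ranging from 1.35 points (12 frames) to 3.03 points (4 frames).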

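PSTNet achieves state-of-the-art results on MSR-Action3D, outperforming MeteorNet at every clip length from 4 to 24 frames and reaching 91.20% with 24 frames.
On NTU RGB+D 60/120, it shows strong improvements over skeleton-, depth-, and voxel-based baselines.
For 4D semantic segmentation on Synthia 4D, PSTNet with temporal modeling (l=3) outperforms the baselines while using fewer parameters than competing methods.
Ablations show that longer clips and an appropriately chosen temporal kernel size improve action recognition, while the spatial radius trades off capturing local structure against discriminativeness.
Visualizations show stronger activations in moving regions, indicating that the motion modeling is effective.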


This review was generated by AI and checked by a human editor.