QUICK REVIEW

[Paper Review] PointPWC-Net: A Coarse-to-Fine Network for Supervised and Self-Supervised Scene Flow Estimation on 3D Point Clouds

Wenxuan Wu, Zhiyuan Wang|arXiv (Cornell University)|Nov 27, 2019

Advanced Vision and Imaging|References 86|Cited by 66

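One-line summary

PointPWC-Net estimates scene flow from two consecutive 3D point clouds using a learnable point-based cost volume inside a coarse-to-fine framework. It supports both supervised and self-supervised training and generalizes strongly to KITTI.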

ABSTRACT

We propose a novel end-to-end deep scene flow model, called PointPWC-Net, on 3D point clouds in a coarse-to-fine fashion. Flow computed at the coarse level is upsampled and warped to a finer level, enabling the algorithm to accommodate for large motion without a prohibitive search space. We introduce novel cost volume, upsampling, and warping layers to efficiently handle 3D point cloud data. Unlike traditional cost volumes that require exhaustively computing all the cost values on a high-dimensional grid, our point-based formulation discretizes the cost volume onto input 3D points, and a PointConv operation efficiently computes convolutions on the cost volume. Experiment results on FlyingThings3D outperform the state-of-the-art by a large margin. We further explore novel self-supervised losses to train our model and achieve comparable results to state-of-the-art trained with supervised loss. Without any fine-tuning, our method also shows great generalization ability on KITTI Scene Flow 2015 dataset, outperforming all previous methods.
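The coarse-to-fine step described in the abstract (upsample the coarse flow, then warp the finer point cloud) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `knn_upsample_flow` and `warp` are hypothetical, and inverse-distance weighting over the `k` nearest coarse points is an assumed interpolation scheme.

```python
import math

def knn_upsample_flow(fine_pts, coarse_pts, coarse_flow, k=3, eps=1e-8):
    """Interpolate a coarse flow field onto finer points.

    Assumption: inverse-distance weighting over the k nearest coarse points.
    """
    fine_flow = []
    for p in fine_pts:
        # Brute-force nearest neighbours: (distance, index) pairs, closest first.
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(coarse_pts))[:k]
        weights = [1.0 / (d + eps) for d, _ in nearest]
        total = sum(weights)
        flow = tuple(
            sum(w * coarse_flow[j][axis] for w, (_, j) in zip(weights, nearest)) / total
            for axis in range(3)
        )
        fine_flow.append(flow)
    return fine_flow

def warp(points, flow):
    """Move each point by its estimated flow vector."""
    return [tuple(c + f for c, f in zip(p, v)) for p, v in zip(points, flow)]

# Toy example: two coarse points that both move 0.5 along x.
coarse_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
coarse_flow = [(0.5, 0.0, 0.0), (0.5, 0.0, 0.0)]
fine_pts = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]

upsampled = knn_upsample_flow(fine_pts, coarse_pts, coarse_flow, k=2)
warped = warp(fine_pts, upsampled)
```

After warping, the first cloud already sits close to the second one, so the finer level only has to search a small neighbourhood for the residual motion. That is why large motion does not require a large search space.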

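Motivation and objectives

Estimate scene flow directly and accurately on 3D point clouds, including scenes with large motion.
Develop a learnable cost volume that operates on point clouds without dense 4D tensors.
Use a coarse-to-fine architecture with warping and upsampling to handle large motion efficiently.
Introduce self-supervised losses that train the model without ground-truth scene flow labels.
Demonstrate state-of-the-art performance on FlyingThings3D and KITTI Scene Flow 2015, and strengthen zero-shot generalization.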

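Proposed method

Introduces a novel learnable cost volume layer that computes point-to-patch costs with an MLP applied to direction vectors and concatenated features.
Discretizes the cost volume onto the input points and aggregates costs with PointConv in a patch-to-patch manner.
Builds a 4-level feature pyramid for each point cloud using farthest point sampling and PointConv.
Implements a coarse-to-fine framework with upsampling and warping: at each level, the flow from the coarser level is upsampled, the first point cloud is warped with it, a cost volume is computed, and a refined flow is predicted.
Trains without ground-truth flow labels using self-supervised losses that combine Chamfer distance, smoothness, and Laplacian regularization.
Provides a flow predictor that consumes the features of the first point cloud, the cost volume, and the upsampled flow to estimate a finer scene flow.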
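To make the point-based cost volume more concrete, here is a rough sketch under stated assumptions. The helper names `make_mlp` and `point_cost_volume` are hypothetical, the MLP uses random untrained weights, and a plain mean over the `k` nearest neighbours stands in for the PointConv aggregation mentioned above. It only illustrates the data flow: one cost vector per input point, built from both feature vectors and the direction between the points.

```python
import math
import random

def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Return a tiny 2-layer ReLU MLP with random (untrained) weights."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def mlp(x):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return mlp

def point_cost_volume(pts1, feats1, pts2, feats2, mlp, k=2):
    """For each point in cloud 1, aggregate matching costs over its k nearest
    neighbours in cloud 2. Returns one cost vector per point in cloud 1."""
    costs = []
    for p, f in zip(pts1, feats1):
        nearest = sorted(range(len(pts2)), key=lambda j: math.dist(p, pts2[j]))[:k]
        per_neighbour = []
        for j in nearest:
            direction = [b - a for a, b in zip(p, pts2[j])]
            # MLP input: features of both points plus the direction vector.
            per_neighbour.append(mlp(list(f) + list(feats2[j]) + direction))
        # Assumption: plain mean instead of learned aggregation weights.
        costs.append([sum(col) / len(col) for col in zip(*per_neighbour)])
    return costs

# Toy example with 2-dimensional features and 3-dimensional costs.
mlp = make_mlp(in_dim=2 + 2 + 3, hidden_dim=4, out_dim=3)
pts1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
feats1 = [(1.0, 0.0), (0.0, 1.0)]
pts2 = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
feats2 = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
cost_volume = point_cost_volume(pts1, feats1, pts2, feats2, mlp)
```

The output has one row per point in the first cloud, so its size grows with the number of points rather than with the resolution of a dense grid.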

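Experimental results

Research questions

RQ1: Does a learnable point-based cost volume outperform traditional or grid-based cost volumes for scene flow estimation on 3D point clouds?
RQ2: Can a coarse-to-fine warping approach robustly handle large motion in point clouds without exhaustive search?
RQ3: Can self-supervised losses based on Chamfer distance, smoothness, and Laplacian regularization train a competitive point cloud scene flow model without ground-truth labels?
RQ4: How well does PointPWC-Net generalize to real-world KITTI Scene Flow data without fine-tuning?
RQ5: What do ablations show about the individual contributions of the cost volume design, warping, and upsampling?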

Key findings

Dataset	Method	Supervision	EPE3D (m)↓	Acc3DS↑	Acc3DR↑	Outliers3D↓	EPE2D (px)↓	Acc2D↑
FlyingThings3D	ICP(rigid)	Self	0.4062	0.1614	0.3038	0.8796	23.2280	0.2913
FlyingThings3D	FGR(rigid)	Self	0.4016	0.1291	0.3461	0.8755	28.5165	0.3037
FlyingThings3D	CPD(non-rigid)	Self	0.4887	0.0538	0.1694	0.9063	26.2015	0.0966
FlyingThings3D	PointPWC-Net	Self	0.1213	0.3239	0.6742	0.6878	6.5493	0.4756
FlyingThings3D	FlowNet3D	Full	0.1136	0.4125	0.7706	0.6016	5.9740	0.5692
FlyingThings3D	SPLATFlowNet	Full	0.1205	0.4197	0.7180	0.6187	6.9759	0.5512
FlyingThings3D	original BCL	Full	0.1111	0.4279	0.7551	0.6054	6.3027	0.5669
FlyingThings3D	HPLFlowNet	Full	0.0804	0.6144	0.8555	0.4287	4.6723	0.6764
FlyingThings3D	PointPWC-Net	Full	0.0588	0.7379	0.9276	0.3424	3.2390	0.7994
KITTI	ICP(rigid)	Self	0.5181	0.0669	0.1667	0.8712	27.6752	0.1056
KITTI	FGR(rigid)	Self	0.4835	0.1331	0.2851	0.7761	18.7464	0.2876
KITTI	CPD(non-rigid)	Self	0.4144	0.2058	0.4001	0.7146	27.0583	0.1980
KITTI	PointPWC-Net(w/o ft)	Self	0.2549	0.2379	0.4957	0.6863	8.9439	0.3299
KITTI	PointPWC-Net(w/ ft)	Self+Self	0.0461	0.7951	0.9538	0.2275	2.0417	0.8645
KITTI	FlowNet3D	Full	0.1767	0.3738	0.6677	0.5271	7.2141	0.5093
KITTI	SPLATFlowNet	Full	0.1988	0.2174	0.5391	0.6575	8.2306	0.4189
KITTI	original BCL	Full	0.1729	0.2516	0.6011	0.6215	7.3476	0.4411
KITTI	HPLFlowNet	Full	0.1169	0.4783	0.7776	0.4103	4.8055	0.5938
KITTI	PointPWC-Net(w/o ft)	Full	0.0694	0.7281	0.8884	0.2648	3.0062	0.7673
KITTI	PointPWC-Net(w/ ft)	Full+Self	0.0430	0.8175	0.9680	0.2072	1.9022	0.8669
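For readers unfamiliar with the 3D metrics in the table, the sketch below shows one way to compute them from predicted and ground-truth flow vectors. This review does not define the thresholds, so the values here (0.05 m or 5% for Acc3DS, 0.1 m or 10% for Acc3DR, 0.3 m or 10% for Outliers3D) are assumptions based on definitions commonly used in the scene flow literature and should be checked against the paper. The function name `scene_flow_metrics` is hypothetical.

```python
import math

def scene_flow_metrics(pred, gt, eps=1e-4):
    """Compute EPE3D, Acc3DS, Acc3DR and Outliers3D for lists of 3D flow vectors.

    Thresholds are assumed, not taken from this review.
    """
    n = len(gt)
    # Per-point end-point error in metres.
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    # Error relative to the ground-truth flow magnitude.
    rel = [e / (math.hypot(*g) + eps) for e, g in zip(errors, gt)]
    pairs = list(zip(errors, rel))
    return {
        "EPE3D": sum(errors) / n,
        "Acc3DS": sum(e < 0.05 or r < 0.05 for e, r in pairs) / n,
        "Acc3DR": sum(e < 0.1 or r < 0.1 for e, r in pairs) / n,
        "Outliers3D": sum(e > 0.3 or r > 0.1 for e, r in pairs) / n,
    }

# Toy example: one perfect prediction and one that is off by 0.5 m.
gt = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
metrics = scene_flow_metrics(pred, gt)
```

Lower is better for EPE3D and Outliers3D, and higher is better for the two accuracy scores, matching the arrows in the table header.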

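Trained with self-supervised losses, PointPWC-Net reaches EPE3D 0.1213 m on FlyingThings3D without ground-truth supervision, close to the fully supervised FlowNet3D (0.1136 m) and SPLATFlowNet (0.1205 m).
On FlyingThings3D, PointPWC-Net (Full) achieves EPE3D 0.0588 m and beats every listed baseline on EPE3D, Acc3DS, Acc3DR, and Outliers3D; the next best method, HPLFlowNet, reaches EPE3D 0.0804 m.
On KITTI Scene Flow 2015 without fine-tuning, PointPWC-Net (Self) reaches EPE3D 0.2549 m, ahead of the ICP, FGR, and CPD baselines, which indicates strong generalization.
With pretraining on FlyingThings3D followed by fine-tuning on KITTI (Self+Self or Full+Self in the table), PointPWC-Net brings EPE3D below 5 cm (0.0461 m and 0.0430 m) and outperforms prior methods on every listed metric.
Ablation studies show that the learnable cost volume and warping markedly improve performance over a traditional cost volume and a baseline without warping.
Self-supervised training with the proposed losses yields competitive results on KITTI without ground-truth labels, in some cases approaching supervised performance.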
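The self-supervised objective behind these results can be sketched as below. This is a simplified illustration, not the authors' loss: it combines a symmetric Chamfer term between the warped first cloud and the second cloud with a neighbourhood smoothness term, omits the Laplacian regularization, and uses a hypothetical weight `w_smooth`. All function names are illustrative.

```python
import math

def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance: squared nearest-neighbour distances, both directions."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src)
    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)

def smoothness(points, flow, k=2):
    """Mean squared flow difference between each point and its k nearest neighbours."""
    if len(points) < 2:
        return 0.0  # No neighbours to compare against.
    total = 0.0
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        total += sum(math.dist(flow[i], flow[j]) ** 2 for j in others) / len(others)
    return total / len(points)

def self_supervised_loss(pts1, pts2, flow, w_smooth=1.0):
    """Chamfer term on the warped cloud plus a weighted smoothness term."""
    warped = [tuple(c + f for c, f in zip(p, v)) for p, v in zip(pts1, flow)]
    return chamfer_distance(warped, pts2) + w_smooth * smoothness(pts1, flow)

# Toy example: the second cloud is the first one shifted by 0.5 along x.
pts1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pts2 = [(x + 0.5, y, z) for x, y, z in pts1]
loss_true_flow = self_supervised_loss(pts1, pts2, [(0.5, 0.0, 0.0)] * 3)
loss_zero_flow = self_supervised_loss(pts1, pts2, [(0.0, 0.0, 0.0)] * 3)
```

In the toy example the true flow gives a lower loss than a zero flow, which is the signal that lets a network learn without flow labels. Note that the Chamfer term alone only rewards landing near some point of the second cloud, which is one reason regularizers such as smoothness are added.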


This review was generated by AI and checked by a human editor.