QUICK REVIEW

[Paper Review] Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation

Hui Zhou, Xinge Zhu|arXiv (Cornell University)|Aug 4, 2020

Advanced Neural Network Applications | References: 33 | Citations: 138

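One-line summary

Cylinder3D processes LiDAR point clouds directly in 3D, combining a cylinder-based partition, 3D convolutions, and specialized blocks, and is shown to achieve state-of-the-art driving-scene segmentation on SemanticKITTI.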

ABSTRACT

State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space. The projection methods include spherical projection, bird's-eye view projection, etc. Although this process makes the point cloud suitable for the 2D CNN-based networks, it inevitably alters and abandons the 3D topology and geometric relations. A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space. In this work, we first perform an in-depth analysis for different representations and backbones in 2D and 3D spaces, and reveal the effectiveness of 3D representations and networks on LiDAR segmentation. Then, we develop a 3D cylinder partition and a 3D cylinder convolution based framework, termed as Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds. Moreover, a dimension-decomposition based context modeling module is introduced to explore the high-rank context information in point clouds in a progressive manner. We evaluate the proposed model on a large-scale driving-scene dataset, i.e. SemanticKITTI. Our method achieves state-of-the-art performance and outperforms existing methods by 6% in terms of mIoU.

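Motivation and objectives

Assess the limitations of 2D projection for driving-scene LiDAR segmentation and quantify the benefits of 3D processing.
Investigate point representations and backbones in 2D and 3D space to identify effective configurations.
Develop a cylinder-based voxelization and a 3D CNN backbone suited to driving-scene point clouds.
Introduce an asymmetric residual block and a dimension-decomposition based context modeling module to improve efficiency and context modeling.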

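Proposed method

Introduce a 3D cylinder partition that voxelizes LiDAR points in a cylindrical coordinate system, producing a 3D representation that preserves topology.
Process the cylinder-based representation with a 3D U-Net backbone built on sparse 3D convolutions.
Replace the conventional residual block with an asymmetric residual block, which better fits cuboid-shaped driving objects and reduces computation.
Attach a dimension-decomposition based context modeling (DDCM) module that decomposes high-rank context into three low-rank components (3x1x1, 1x3x1, 1x1x3) and aggregates them.
Train with a weighted cross-entropy loss and a Lovász-softmax loss to optimize both point-wise accuracy and mIoU.
Use the Adam optimizer with an initial learning rate of 0.001.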
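
The cylinder partition in the first step can be illustrated with a minimal, standard-library Python sketch. It converts each Cartesian point to cylindrical coordinates (radius, azimuth, height) and bins it into a voxel grid. The function names `cylinder_voxel_index` and `voxelize`, the grid resolution, and the radius and height ranges are illustrative assumptions, not values stated in this review, and the sketch does not reproduce the authors' implementation.

```python
import math


def cylinder_voxel_index(point, grid=(480, 360, 32),
                         rho_range=(0.0, 50.0), z_range=(-4.0, 2.0)):
    """Map a Cartesian point (x, y, z) to a (rho, phi, z) voxel index.

    grid, rho_range and z_range are assumed, illustrative defaults.
    """
    x, y, z = point
    rho = math.hypot(x, y)   # radial distance from the vertical sensor axis
    phi = math.atan2(y, x)   # azimuth angle in [-pi, pi]

    def to_bin(value, low, high, n_bins):
        # Clamp into [low, high], then scale to an integer bin in [0, n_bins - 1].
        clamped = min(max(value, low), high)
        idx = int((clamped - low) / (high - low) * n_bins)
        return min(idx, n_bins - 1)

    return (
        to_bin(rho, rho_range[0], rho_range[1], grid[0]),
        to_bin(phi, -math.pi, math.pi, grid[1]),
        to_bin(z, z_range[0], z_range[1], grid[2]),
    )


def voxelize(points, **kwargs):
    """Group point indices by the cylindrical voxel they fall into."""
    voxels = {}
    for i, p in enumerate(points):
        voxels.setdefault(cylinder_voxel_index(p, **kwargs), []).append(i)
    return voxels
```

Because the azimuth bins have a fixed angular width, cells grow physically larger with distance from the sensor. Far-away regions, where LiDAR returns are sparse, are therefore covered by bigger cells than a uniform Cartesian grid would give them.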

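Experimental results

Research questions

RQ1: Does processing LiDAR data in 3D via a cylinder partition outperform 2D projection-based representations for driving-scene segmentation?
RQ2: How does cylinder-based voxelization compare with Cartesian voxelization in capturing the 3D topology of outdoor scenes?
RQ3: How much do the asymmetric residual block and dimension-decomposition context modeling contribute to overall performance?
RQ4: What performance gain does Cylinder3D achieve on SemanticKITTI over prior methods?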

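Key findings

Cylinder3D achieves state-of-the-art performance on SemanticKITTI, outperforming existing methods by 6% in mIoU.
The 3D cylinder partition and 3D convolutions markedly improve results over 2D backbones applied to projection-based representations.
Replacing the standard residual block with the asymmetric residual block yields an mIoU gain of about 1.5%.
Adding the dimension-decomposition context modeling module improves mIoU further; the ablation results show a notable gain.
Flip testing gives a small additional mIoU gain when predictions from multiple augmented inputs are ensembled.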
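
Since every finding above is stated in terms of mIoU, a short sketch of how the metric can be computed may help. This is a simplified pure-Python illustration: the function name `mean_iou` and the `ignore_label` parameter are hypothetical, and it averages only over classes that appear in the predictions or the labels, which may differ from the official SemanticKITTI evaluation protocol.

```python
def mean_iou(pred, target, num_classes, ignore_label=None):
    """Mean intersection-over-union over classes seen in pred or target."""
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # skip points whose label is excluded from evaluation
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    ious = []
    for c in range(num_classes):
        union = tp[c] + fp[c] + fn[c]
        if union > 0:  # assumption: classes absent from both are not averaged
            ious.append(tp[c] / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with predictions `[0, 0, 1, 1]` and labels `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.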


This review was generated by AI and checked by a human editor.