QUICK REVIEW

[Paper Review] StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views

Jianfei Guo, Nianchen Deng | arXiv (Cornell University) | Jun 8, 2023

Computer Graphics and Visualization Techniques | Cited by 21

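One-sentence summary

StreetSurf performs multi-view implicit surface reconstruction on unbounded street-view data without requiring LiDAR. It disentangles close-range and distant-view geometry using cuboid hash-grids and monocular priors, and reaches state-of-the-art geometry and appearance at a modest compute cost (one to two hours of training on a single RTX3090 per sequence).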

ABSTRACT

We present a novel multi-view implicit surface reconstruction technique, termed StreetSurf, that is readily applicable to street view images in widely-used autonomous driving datasets, such as Waymo-perception sequences, without necessarily requiring LiDAR data. As neural rendering research expands rapidly, its integration into street views has started to draw interests. Existing approaches on street views either mainly focus on novel view synthesis with little exploration of the scene geometry, or rely heavily on dense LiDAR data when investigating reconstruction. Neither of them investigates multi-view implicit surface reconstruction, especially under settings without LiDAR data. Our method extends prior object-centric neural surface reconstruction techniques to address the unique challenges posed by the unbounded street views that are captured with non-object-centric, long and narrow camera trajectories. We delimit the unbounded space into three parts, close-range, distant-view and sky, with aligned cuboid boundaries, and adapt cuboid/hyper-cuboid hash-grids along with road-surface initialization scheme for finer and disentangled representation. To further address the geometric errors arising from textureless regions and insufficient viewing angles, we adopt geometric priors that are estimated using general purpose monocular models. Coupled with our implementation of efficient and fine-grained multi-stage ray marching strategy, we achieve state of the art reconstruction quality in both geometry and appearance within only one to two hours of training time with a single RTX3090 GPU for each street view sequence. Furthermore, we demonstrate that the reconstructed implicit surfaces have rich potential for various downstream tasks, including ray tracing and LiDAR simulation.
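The three-way partition described in the abstract (close-range, distant-view, sky) can be illustrated with a standard ray/cuboid slab test. This is a minimal sketch, not the authors' implementation: the function names, the box extents, and the assumption that the camera sits inside the close-range cuboid are all hypothetical choices made for illustration.

```python
import math

def ray_cuboid_interval(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) along the ray inside the box, or None."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far

def split_ray(origin, direction, box_min, box_max):
    """Split a ray into a close-range segment and a distant-view segment.

    Assumption: the camera is inside (or in front of) the close-range cuboid,
    so everything beyond the exit point belongs to the distant-view model.
    Whatever transmittance remains after both segments would be filled by
    the sky model.
    """
    hit = ray_cuboid_interval(origin, direction, box_min, box_max)
    if hit is None:
        return {"close_range": None, "distant_view": (0.0, math.inf)}
    t_near, t_far = hit
    return {"close_range": (t_near, t_far), "distant_view": (t_far, math.inf)}

# Hypothetical long, narrow close-range cuboid in meters (x forward, z up).
BOX_MIN = (-10.0, -5.0, -2.0)
BOX_MAX = (100.0, 5.0, 8.0)

forward = split_ray((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), BOX_MIN, BOX_MAX)
upward = split_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), BOX_MIN, BOX_MAX)
outside = split_ray((0.0, 20.0, 0.0), (1.0, 0.0, 0.0), BOX_MIN, BOX_MAX)
```

With an axis-aligned cuboid the split is a handful of comparisons per ray, which is one reason aligned boundaries are convenient for long, narrow trajectories.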

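Motivation and objectives

Reconstruct 3D geometry from unbounded street views captured along non-object-centric, long and narrow camera trajectories.
Separate close-range geometry from the distant view and the sky so that model capacity is allocated effectively.
Stabilize optimization with geometric priors from monocular cues and road-surface initialization.
Improve efficiency with a multi-stage, occupancy-guided ray marching strategy.
Demonstrate reconstruction quality on real-world autonomous driving datasets such as Waymo without relying on dense LiDAR.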

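Proposed method

The scene is split into three parts, close-range (cr), distant-view (dv), and sky, modeled by a cuboid NeuS for cr, a hyper-cuboid NeRF++ for dv, and a directional MLP for the sky.
Aligned cuboid boundaries and cuboid/hyper-cuboid hash-grids are used so that samples are placed appropriately along each ray and geometry and appearance are represented well.
A multi-shell ray marching strategy that combines an occupancy grid with multi-stage hierarchical sampling balances efficiency and fine detail.
To encourage correct disentanglement, the close-range SDF is initialized from a road-surface prior, and entropy regularization is applied to prevent degenerate solutions.
When LiDAR is unavailable, monocular priors (normals and depth) are used, with optional sky masks as auxiliary supervision, to guide geometry and appearance.
Training is end-to-end with photometric, geometric, mask, entropy, Eikonal, and sparsity losses; camera poses can optionally be refined, and per-frame appearance embeddings are used.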
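Two of the ingredients listed above, road-surface SDF initialization and the Eikonal term, can be sketched in a few lines. This is a simplified illustration under loud assumptions: the road is treated as a flat plane at a known height (the paper's scheme is not specified at this level of detail in this review), the gradient is taken by finite differences rather than autograd, and the loss weights are placeholders rather than values from the paper.

```python
import numpy as np

def road_plane_sdf(points, road_height=0.0):
    """Signed height above a flat road: positive above, negative below.

    Assumption: a flat road at z = road_height, used here as a stand-in
    target for initializing the close-range SDF.
    """
    return np.asarray(points, dtype=float)[..., 2] - road_height

def eikonal_loss(sdf_fn, points, eps=1e-3):
    """Mean squared deviation of the SDF gradient norm from 1.

    The gradient is estimated with central finite differences.
    """
    points = np.asarray(points, dtype=float)
    grads = []
    for axis in range(3):
        offset = np.zeros(3)
        offset[axis] = eps
        grads.append((sdf_fn(points + offset) - sdf_fn(points - offset)) / (2 * eps))
    grad_norm = np.linalg.norm(np.stack(grads, axis=-1), axis=-1)
    return float(np.mean((grad_norm - 1.0) ** 2))

def total_loss(terms, weights):
    """Weighted sum of named loss terms (weights are hypothetical placeholders)."""
    return sum(weights[name] * value for name, value in terms.items())

rng = np.random.default_rng(0)
samples = rng.uniform(-10.0, 10.0, size=(256, 3))

# A plane is an exact signed distance function, so its Eikonal residual vanishes.
plane_eikonal = eikonal_loss(road_plane_sdf, samples)

# Placeholder values only; they show the combination, not the paper's settings.
example_total = total_loss(
    {"photometric": 0.5, "eikonal": 0.2},
    {"photometric": 1.0, "eikonal": 0.1},
)
```

Starting from a surface that already satisfies the Eikonal constraint gives the optimizer a valid SDF near the road from the first iteration, which is the intuition behind initializing from a road prior instead of a sphere.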

(a) NeRF-360: Unbounded scene with object-centric camera trajectories. Using spherical bounds and spherical/cubic model.

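Experimental results

Research questions

RQ1: How should street-view data be partitioned and modeled to handle non-object-centric, nearly unbounded camera trajectories?
RQ2: Can close-range and distant-view geometry be disentangled without explicit supervision, and how do initialization and priors affect this disentanglement?
RQ3: In textureless or sparsely observed regions, which priors (monocular cues, sky information) and training strategies improve geometry?
RQ4: Can state-of-the-art geometry and appearance be achieved on real-world street-view datasets without dense LiDAR data?
RQ5: What practical training-efficiency and scalability benefits do cuboid/hyper-cuboid hash-grids bring to street-view reconstruction?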

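Key findings

StreetSurf achieves state-of-the-art geometry and appearance quality on street views with one to two hours of training on a single RTX3090, without requiring LiDAR data.
Separating the close-range and distant-view models, combined with road-surface initialization, yields robust optimization on non-object-centric street-view data.
Monocular cues and optional sky masks contribute substantially to geometry in textureless regions and under limited viewpoints.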
A cuboid-shaped space with cuboid hash-grid representations outperforms a cubic space in PSNR on long, narrow street scenes.
The multi-stage, occupancy-grid-guided ray marching strategy improves sampling efficiency and the capture of fine detail on challenging street-view sequences.
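The advantage of a cuboid space over a cubic one on long, narrow scenes can be made concrete with simple arithmetic on cell sizes. The sketch below compares a single dense grid under a fixed cell budget; it is only an illustration, since the paper uses multi-resolution hash-grids rather than one dense grid, and the scene extent and budget are hypothetical numbers.

```python
import numpy as np

def cubic_cell_size(extent, num_cells):
    """Cell edge length when a cube covering the longest side holds num_cells cells."""
    resolution = round(num_cells ** (1.0 / 3.0))
    return max(extent) / resolution

def cuboid_grid(extent, num_cells):
    """Per-axis resolution proportional to extent, using roughly num_cells cells."""
    extent = np.asarray(extent, dtype=float)
    target_cell = (np.prod(extent) / num_cells) ** (1.0 / 3.0)
    resolution = np.maximum(1, np.round(extent / target_cell)).astype(int)
    return resolution, extent / resolution

# Hypothetical street segment: 200 m long, 40 m wide, 20 m tall.
EXTENT = (200.0, 40.0, 20.0)
BUDGET = 128 ** 3  # same number of cells for both layouts

cube_cell = cubic_cell_size(EXTENT, BUDGET)
cuboid_res, cuboid_cells = cuboid_grid(EXTENT, BUDGET)

print("cubic cell edge (m):", cube_cell)
print("cuboid resolution:", cuboid_res.tolist())
print("cuboid cell edges (m):", cuboid_cells.round(3).tolist())
```

Under the same budget, the cubic layout spends most of its cells on empty space beside and above the street, so each cell is several times coarser than in the cuboid layout, where resolution follows the scene's aspect ratio.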

(b) Street views: Unbounded scene captured with non-object-centric, long and narrow camera trajectories. We propose to use aligned cuboid bounds and cuboid model.


This review was generated by AI and checked by a human editor.