QUICK REVIEW

[Paper Review] StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views

Jianfei Guo, Nianchen Deng | arXiv (Cornell University) | 2023-06-08

Computer Graphics and Visualization Techniques | Citations: 21

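One-Line Summary

StreetSurf extends multi-view implicit surface reconstruction to unbounded street-view data without requiring LiDAR. By separating close-range and distant-view geometry with cuboid hash grids and monocular priors, it achieves state-of-the-art geometry and appearance with modest compute.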

ABSTRACT

We present a novel multi-view implicit surface reconstruction technique, termed StreetSurf, that is readily applicable to street view images in widely-used autonomous driving datasets, such as Waymo-perception sequences, without necessarily requiring LiDAR data. As neural rendering research expands rapidly, its integration into street views has started to draw interests. Existing approaches on street views either mainly focus on novel view synthesis with little exploration of the scene geometry, or rely heavily on dense LiDAR data when investigating reconstruction. Neither of them investigates multi-view implicit surface reconstruction, especially under settings without LiDAR data. Our method extends prior object-centric neural surface reconstruction techniques to address the unique challenges posed by the unbounded street views that are captured with non-object-centric, long and narrow camera trajectories. We delimit the unbounded space into three parts, close-range, distant-view and sky, with aligned cuboid boundaries, and adapt cuboid/hyper-cuboid hash-grids along with road-surface initialization scheme for finer and disentangled representation. To further address the geometric errors arising from textureless regions and insufficient viewing angles, we adopt geometric priors that are estimated using general purpose monocular models. Coupled with our implementation of efficient and fine-grained multi-stage ray marching strategy, we achieve state of the art reconstruction quality in both geometry and appearance within only one to two hours of training time with a single RTX3090 GPU for each street view sequence. Furthermore, we demonstrate that the reconstructed implicit surfaces have rich potential for various downstream tasks, including ray tracing and LiDAR simulation.

Research Motivation and Goals

Address the challenge of reconstructing 3D geometry from unbounded street views captured with non-object-centric, long, narrow trajectories.
Disentangle close-range geometry from distant-view and sky to allocate model capacity effectively.
Incorporate geometric priors from monocular cues and road-surface initialization to stabilize optimization.
Improve efficiency with a multi-stage, occupancy-guided ray marching strategy.
Demonstrate reconstruction quality on real-world autonomous-driving datasets (Waymo) without relying on dense LiDAR.

Proposed Method

Divide the scene into three parts: close-range (cr), distant-view (dv), and sky, each modeled by specialized networks (cuboid NeuS for cr, hyper-cuboid NeRF++ for dv, directional MLP for sky).
Use aligned cuboid boundaries and cuboid/hyper-cuboid hash-grids to represent geometry and appearance with appropriate sampling along rays.
Employ a multi-shell ray marching strategy that combines occupancy grids with multi-stage hierarchical sampling for efficiency and fine detail.
Initialize the close-range SDF with road-surface priors to encourage correct disentanglement and apply entropy regularization to prevent degenerate solutions.
Leverage monocular priors (normals and depths) when LiDAR is unavailable, plus optional sky masks, to guide geometry and appearance.
Train end-to-end with photometric, geometry, mask, entropy, eikonal, and sparsity losses; optionally refine camera poses and use per-frame appearance embeddings.
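
To make the scene partition more concrete, here is a minimal sketch (not the authors' implementation) of how a ray could be split at an axis-aligned cuboid boundary using the standard slab test. The function names, the box extents in meters, and the convention that the distant-view model takes over at the exit point are all illustrative assumptions.

```python
# Hypothetical close-range cuboid around a street segment (meters):
# 20 m wide (x), 200 m long (y), 10 m tall (z). These numbers are assumptions.
BOX_MIN = (-10.0, -100.0, -2.0)
BOX_MAX = (10.0, 100.0, 8.0)


def ray_cuboid_interval(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) for the part of the ray inside the
    axis-aligned cuboid, or None if the ray misses it. Only t >= 0 counts."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
    if t_near > t_far:
        return None
    return t_near, t_far


def split_ray(origin, direction, box_min=BOX_MIN, box_max=BOX_MAX):
    """Return (close_range_interval, distant_view_start).

    Assumed convention: samples inside the cuboid go to the close-range model,
    and everything beyond the exit point goes to the distant-view model."""
    hit = ray_cuboid_interval(origin, direction, box_min, box_max)
    if hit is None:
        return None, 0.0
    return hit, hit[1]


if __name__ == "__main__":
    camera = (0.0, 0.0, 0.0)
    print("across the street:", split_ray(camera, (1.0, 0.0, 0.0)))
    print("along the street: ", split_ray(camera, (0.0, 1.0, 0.0)))
```

With these hypothetical bounds, a ray pointing across the street leaves the close-range cuboid far sooner than one pointing along it, which is the asymmetry that aligned cuboid bounds are meant to capture.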

(a) NeRF-360: An unbounded scene with object-centric camera trajectories, modeled with spherical bounds and a spherical/cubic model.

Experimental Results

Research Questions

RQ1: How can street-view data be partitioned and modeled to handle unbounded, non-object-centric camera trajectories?
RQ2: Can close-range and distant-view geometries be disentangled without supervision, and how do initialization and priors impact this?
RQ3: What priors (monocular cues, sky information) and training strategies improve geometry in textureless or sparsely observed regions?
RQ4: Is it possible to achieve state-of-the-art geometry and appearance without dense LiDAR data on real-world street-view datasets?
RQ5: What are the practical training efficiency and scalability benefits of cuboid/hyper-cuboid hashing for street-view reconstruction?

Key Results

StreetSurf achieves state-of-the-art geometry and appearance quality on street views with 1–2 hours of training on a single RTX3090, without requiring LiDAR data.
Disentangling close-range and distant-view models, with road-surface initialization, yields robust optimization in non-object-centric street-view data.
Monocular cues and optional sky masks significantly aid geometry in textureless regions and under limited viewpoints.
A cuboid-shaped space with cuboid hash-grid representations achieves higher PSNR than a cubic space on long, narrow street scenes.
A multi-stage, occupancy-guided ray marching strategy improves sampling efficiency and detail capture in challenging street-view sequences.
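
One plausible intuition for the cuboid-versus-cubic result can be illustrated with simple arithmetic. The sketch below assumes a dense grid with a fixed cell budget and a hypothetical scene extent; real hash grids only approximate this behavior, and none of these numbers come from the paper.

```python
def cell_edge(extent, n_cells):
    """Edge length of the cubic cells obtained when n_cells of them tile a box
    with the given (x, y, z) extent."""
    volume = extent[0] * extent[1] * extent[2]
    return (volume / n_cells) ** (1.0 / 3.0)


# Hypothetical long, narrow street segment (meters) and cell budget.
scene = (200.0, 20.0, 10.0)
n_cells = 256 ** 3

# A cubic bound has to be as large as the longest axis of the scene.
cube = (max(scene),) * 3

cuboid_edge = cell_edge(scene, n_cells)
cube_edge = cell_edge(cube, n_cells)
ratio = cube_edge / cuboid_edge

if __name__ == "__main__":
    print(f"cell edge with cuboid bound: {cuboid_edge:.3f} m")
    print(f"cell edge with cubic bound:  {cube_edge:.3f} m")
    print(f"cubic cells are {ratio:.2f}x coarser")
```

Under these assumptions the cubic bound encloses 200 times more volume than the scene needs, so each cell edge is the cube root of 200 (roughly 5.8) times longer for the same budget.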

(b) Street views: An unbounded scene captured with non-object-centric, long and narrow camera trajectories. The authors propose aligned cuboid bounds and a cuboid model.


This review was generated by AI and checked by a human editor.