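[Paper Explained] StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views
StreetSurf extends multi-view implicit surface reconstruction to unbounded street-view data without requiring LiDAR. By disentangling close-range and distant-view geometry and using cuboid hash-grids with monocular priors, it reaches state-of-the-art geometry and appearance at a moderate computational cost.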
We present a novel multi-view implicit surface reconstruction technique, termed StreetSurf, that is readily applicable to street view images in widely-used autonomous driving datasets, such as Waymo-perception sequences, without necessarily requiring LiDAR data. As neural rendering research expands rapidly, its integration into street views has started to draw interest. Existing approaches on street views either mainly focus on novel view synthesis with little exploration of the scene geometry, or rely heavily on dense LiDAR data when investigating reconstruction. Neither of them investigates multi-view implicit surface reconstruction, especially under settings without LiDAR data. Our method extends prior object-centric neural surface reconstruction techniques to address the unique challenges posed by the unbounded street views that are captured with non-object-centric, long and narrow camera trajectories. We delimit the unbounded space into three parts, close-range, distant-view and sky, with aligned cuboid boundaries, and adapt cuboid/hyper-cuboid hash-grids along with a road-surface initialization scheme for finer and disentangled representation. To further address the geometric errors arising from textureless regions and insufficient viewing angles, we adopt geometric priors that are estimated using general-purpose monocular models. Coupled with our implementation of an efficient and fine-grained multi-stage ray marching strategy, we achieve state-of-the-art reconstruction quality in both geometry and appearance within only one to two hours of training time with a single RTX3090 GPU for each street view sequence. Furthermore, we demonstrate that the reconstructed implicit surfaces have rich potential for various downstream tasks, including ray tracing and LiDAR simulation.
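Motivation and Objectives
- Address the challenge of reconstructing 3D geometry in unbounded street views captured along non-object-centric, long and narrow trajectories.
- Disentangle close-range geometry from the distant view and the sky so that model capacity is allocated effectively.
- Integrate geometric priors from monocular cues and a road-surface initialization into the optimization to stabilize it.
- Improve efficiency with a multi-stage, occupancy-grid-guided ray marching strategy.
- Demonstrate reconstruction quality on a real autonomous driving dataset (Waymo) without relying on dense LiDAR.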
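Proposed Method
- Split the scene into three parts: close-range (cr), distant-view (dv), and sky, each modeled by a dedicated network (cuboid NeuS for close-range, hyper-cuboid NeRF++ for dv, and a directional MLP for the sky).
- Represent geometry and appearance with aligned cuboid boundaries and cuboid/hyper-cuboid hash-grids, sampling each part appropriately along every ray.
- Use a multi-shell ray marching strategy that combines an occupancy grid with multi-stage hierarchical sampling to improve efficiency and recover fine detail.
- Initialize the close-range SDF with a road-surface prior to encourage correct disentanglement, and apply entropy regularization to prevent degenerate solutions.
- When LiDAR is unavailable, use monocular priors (normals and depth) and optional sky masks to guide geometry and appearance.
- Train end to end with a combination of photometric, geometric, mask, entropy, eikonal, and sparsity losses; optionally refine camera poses and use per-frame appearance embeddings.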
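
The three-way split described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an axis-aligned close-range cuboid, uses the standard slab test to find where a ray leaves it, and substitutes plain inverse-distance sampling for the paper's hyper-cuboid distant-view parameterization. The function names, sample counts, `near`, and `dv_far` are all hypothetical.

```python
import numpy as np

def intersect_cuboid(origin, direction, box_min, box_max):
    """Slab-method ray/box test. Returns (t_near, t_far), or None on a miss."""
    # Nudge zero components so axis-parallel rays do not divide by zero.
    safe_dir = np.where(np.abs(direction) < 1e-12, 1e-12, direction)
    t0 = (box_min - origin) / safe_dir
    t1 = (box_max - origin) / safe_dir
    t_near = max(float(np.minimum(t0, t1).max()), 0.0)  # clamp to the camera
    t_far = float(np.maximum(t0, t1).min())
    return (t_near, t_far) if t_far > t_near else None

def partition_ray(origin, direction, box_min, box_max,
                  n_cr=8, n_dv=4, near=0.1, dv_far=1000.0):
    """Return (t_cr, t_dv): sample distances for close-range and distant-view."""
    hit = intersect_cuboid(origin, direction, box_min, box_max)
    if hit is None:
        # The ray never enters the cuboid: no close-range samples at all.
        t_cr, t_exit = np.empty(0), near
    else:
        # Uniform samples inside the close-range cuboid (assumed scheme).
        t_cr = np.linspace(hit[0], hit[1], n_cr)
        t_exit = hit[1]
    # Beyond the cuboid, sample uniformly in inverse distance so that
    # spacing grows with depth (assumed stand-in for the dv model's sampling).
    t_dv = 1.0 / np.linspace(1.0 / t_exit, 1.0 / dv_far, n_dv)
    return t_cr, t_dv

# Hypothetical cuboid stretched along the driving direction (x axis), in metres.
origin = np.array([0.0, 0.0, 0.0])
direction = np.array([1.0, 0.0, 0.0])
box_min = np.array([-10.0, -5.0, -2.0])
box_max = np.array([50.0, 5.0, 8.0])
t_cr, t_dv = partition_ray(origin, direction, box_min, box_max)
```

In this sketch, whatever transmittance remains after the last distant-view sample would be attributed to the sky model, which depends only on the ray direction.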

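Experimental Results
Research Questions
- RQ1: How should street-view data be partitioned and modeled to cope with unbounded, non-object-centric camera trajectories?
- RQ2: Can close-range and distant-view geometry be disentangled without supervision, and how do initialization and priors affect this process?
- RQ3: Which priors (monocular cues, sky information) and training strategies improve geometry in textureless or sparsely observed regions?
- RQ4: Is it possible to achieve state-of-the-art geometry and appearance on real street-view datasets without relying on dense LiDAR?
- RQ5: What concrete advantages do cuboid/hyper-cuboid hash-grids offer for street-view reconstruction in terms of training efficiency and scalability?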
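Key Findings
- StreetSurf achieves state-of-the-art geometry and appearance quality on street views, training for 1–2 hours on a single RTX3090 without requiring LiDAR data.
- Disentangling the close-range and distant-view models, starting from a road-surface initialization, enables robust optimization on non-object-centric street-view data.
- Monocular cues and optional sky masks substantially improve geometry in textureless regions and under limited viewing angles.
- A cuboid-shaped space with cuboid hash-grid representations achieves higher PSNR than a cubic space on long, narrow street-view scenes.
- The multi-stage, occupancy-grid-guided ray marching strategy improves sampling efficiency and detail capture on complex street-view sequences.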
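
One way to build intuition for the cuboid-versus-cube finding is a dense-grid analogy, sketched below. This is an illustration under stated assumptions, not a measurement from the paper: the scene extents and voxel budget are made up, `voxel_edge` is a hypothetical helper, and a real hash-grid behaves differently from a dense grid because of hash collisions.

```python
import numpy as np

def voxel_edge(extent, n_voxels, cuboid):
    """Edge length of one voxel when n_voxels cubic voxels tile the grid volume."""
    extent = np.asarray(extent, dtype=float)
    if cuboid:
        # Grid shaped like the scene: the budget covers only the scene box.
        volume = extent.prod()
    else:
        # Cubic grid: its side must match the longest scene extent.
        volume = extent.max() ** 3
    return (volume / n_voxels) ** (1.0 / 3.0)

# Hypothetical long, narrow street scene in metres: length, width, height.
extent = (200.0, 40.0, 20.0)
n_voxels = 256 ** 3  # assumed finest-level budget

edge_cube = voxel_edge(extent, n_voxels, cuboid=False)
edge_cuboid = voxel_edge(extent, n_voxels, cuboid=True)
ratio = edge_cube / edge_cuboid
# Share of the bounding cube that actually overlaps the scene box.
used_fraction = np.prod(extent) / max(extent) ** 3
```

With these made-up extents, only 2% of the bounding cube overlaps the scene box, and the cubic grid's voxels come out roughly 3.7 times coarser per axis than the cuboid grid's for the same budget.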

This explainer was generated by AI and reviewed by a human editor.