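[Paper Walkthrough] Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
Gaussian-SLAM introduces a dense RGBD SLAM system that uses 3D Gaussian splats as its scene representation to achieve photo-realistic rendering at interactive speeds, with online sub-map management and geometry encoding. On real-world data it delivers state-of-the-art rendering quality and competitive reconstruction.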
We present a dense simultaneous localization and mapping (SLAM) method that uses 3D Gaussians as a scene representation. Our approach enables interactive-time reconstruction and photo-realistic rendering from real-world single-camera RGBD videos. To this end, we propose a novel effective strategy for seeding new Gaussians for newly explored areas and their effective online optimization that is independent of the scene size and thus scalable to larger scenes. This is achieved by organizing the scene into sub-maps which are independently optimized and do not need to be kept in memory. We further accomplish frame-to-model camera tracking by minimizing photometric and geometric losses between the input and rendered frames. The Gaussian representation allows for high-quality photo-realistic real-time rendering of real-world scenes. Evaluation on synthetic and real-world datasets demonstrates competitive or superior performance in mapping, tracking, and rendering compared to existing neural dense SLAM methods.
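Motivation and Goals
- Use a Gaussian-splatting scene representation for dense SLAM, with high-fidelity rendering as the goal.
- Extend Gaussian splatting from the offline multi-view setting to online monocular RGBD SLAM.
- Encode geometry in the Gaussian splats to improve 3D reconstruction in the monocular setting.
- Develop an online sub-map seeding and optimization strategy that maintains interactive performance.
- Study frame-to-model tracking on the Gaussian scene representation and compare it with frame-to-frame tracking.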
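Proposed Method
- Represent the scene as a set of 3D Gaussians, each parameterized by mean, scale, rotation, opacity, and spherical harmonics.
- Split the input sequence into sub-maps to enable online learning and prevent catastrophic forgetting; optimize the active sub-map with depth and color losses.
- Seed new Gaussians from dense point clouds at keyframes, and anchor them behind the surface along the viewing ray to initialize geometry.
- Render with a differentiable rasterizer; the color loss combines L1 with SSIM, the depth loss is L1, and a regularization term prevents scale explosion.
- Track by initializing the pose with RGBD odometry and refining it with a frame-to-model re-rendering loss; the paper also notes the limitations of Gaussian splats in extrapolation.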
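The mapping objective described in the list above (L1 plus SSIM on color, L1 on depth, and a scale regularizer) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation. The names `global_ssim` and `mapping_loss` and the weights `lambda_ssim`, `w_depth`, and `w_reg` are hypothetical. The SSIM here is a single-window simplification computed from whole-image statistics, and the regularizer is a plain L1 penalty on the scales, because this summary does not give its exact form. Treating zero depth as invalid is also an assumption.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM from whole-image statistics.

    Simplification: practical SSIM averages over local windows.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def mapping_loss(rendered_rgb, gt_rgb, rendered_depth, gt_depth, scales,
                 lambda_ssim=0.2, w_depth=1.0, w_reg=1.0):
    """Color (L1 + SSIM) + depth (L1) + scale regularization.

    All weights are hypothetical placeholders, not values from the paper.
    """
    # Color term: blend of L1 and (1 - SSIM).
    l1_color = np.abs(rendered_rgb - gt_rgb).mean()
    ssim_term = 1.0 - global_ssim(rendered_rgb, gt_rgb)
    color = (1.0 - lambda_ssim) * l1_color + lambda_ssim * ssim_term

    # Depth term: L1 over pixels with a valid sensor reading
    # (assumption: a depth of zero marks a missing measurement).
    valid = gt_depth > 0
    depth = np.abs(rendered_depth - gt_depth)[valid].mean() if valid.any() else 0.0

    # Regularizer: stand-in L1 penalty that discourages very large scales.
    reg = np.abs(scales).mean()

    return color + w_depth * depth + w_reg * reg
```

In a real system these terms would be computed on the output of the differentiable rasterizer inside an autodiff framework, so that gradients flow back to the Gaussian parameters of the active sub-map; NumPy is used here only to make the structure of the loss explicit.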
Experimental Results
Research Questions
- RQ1: Can Gaussian splats be effectively extended to encode geometry for online monocular SLAM with RGBD input?
- RQ2: How can online sub-map seeding and optimization be designed to maintain interactive performance without catastrophic forgetting?
- RQ3: What is the impact of frame-to-model tracking using Gaussian splats compared to frame-to-frame tracking in dense SLAM?
- RQ4: What limits do Gaussian splats impose on geometry accuracy and extrapolation, and can these be mitigated in SLAM?
Key Findings
| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| NICE-SLAM | 17.54 | 0.621 | 0.548 |
| Vox-Fusion | 18.17 | 0.673 | 0.504 |
| ESLAM | 15.29 | 0.658 | 0.488 |
| Point-SLAM | 19.82 | 0.751 | 0.514 |
| Gaussian-SLAM (ours) | 37.45 | 0.984 | 0.068 |
- Gaussian-SLAM achieves state-of-the-art rendering quality on ScanNet and comparable reconstruction performance to dense neural SLAM methods.
- On ScanNet, Gaussian-SLAM attains PSNR 37.45, SSIM 0.984, LPIPS 0.068, outperforming NICE-SLAM, Vox-Fusion, ESLAM, and Point-SLAM in rendering metrics.
- On TUM-RGBD, Gaussian-SLAM again shows strong rendering metrics with substantial improvements over competing methods (Table 2 results).
- Sub-map based online seeding and optimization enable interactive-time reconstruction while preserving geometry obtained from depth sensors.
- Tracking with Gaussian splats shows limitations in frame-to-model tracking due to extrapolation; oracle experiments indicate potential improvements with better depth rendering.
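For readers unfamiliar with the PSNR column in the table above, the metric is derived from the mean squared error between a rendered image and the reference image. The helper below is a minimal sketch of the standard definition. The function name `psnr` and the assumption that images are floating-point arrays in the range [0, 1] are illustrative choices, not details taken from the paper's evaluation code.

```python
import numpy as np


def psnr(rendered, reference, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    diff = rendered.astype(np.float64) - reference.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Because the scale is logarithmic, each gain of 10 dB corresponds to a tenfold reduction in mean squared error.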
This walkthrough was generated by AI and reviewed by a human editor.