QUICK REVIEW

[Paper Review] SF3D-RGB: Scene Flow Estimation from Monocular Camera and Sparse LiDAR

Rajai Alhimdiat, Ramy Battrawy|arXiv (Cornell University)|Feb 25, 2026

Advanced Vision and Imaging | Cited by 0

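One-Sentence Summary

SF3D-RGB fuses monocular RGB features with sparse LiDAR point clouds end to end, then estimates sparse scene flow with an optimal-transport graph matching module followed by a refinement module.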

ABSTRACT

Scene flow estimation is an extremely important task in computer vision to support the perception of dynamic changes in the scene. For robust scene flow, learning-based approaches have recently achieved impressive results using either image-based or LiDAR-based modalities. However, these methods have tended to focus on the use of a single modality. To tackle these problems, we present a deep learning architecture, SF3D-RGB, that enables sparse scene flow estimation using 2D monocular images and 3D point clouds (e.g., acquired by LiDAR) as inputs. Our architecture is an end-to-end model that first encodes information from each modality into features and fuses them together. Then, the fused features enhance a graph matching module for better and more robust mapping matrix computation to generate an initial scene flow. Finally, a residual scene flow module further refines the initial scene flow. Our model is designed to strike a balance between accuracy and efficiency. Furthermore, experiments show that our proposed method outperforms single-modality methods and achieves better scene flow accuracy on real-world datasets while using fewer parameters compared to other state-of-the-art methods with fusion.

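Motivation and Objectives

Advance robust scene flow estimation using multiple modalities (RGB and LiDAR).
Propose a lightweight architecture that fuses monocular RGB features with sparse LiDAR point cloud features for sparse scene flow.
Use a graph matching (optimal transport) module to compute an initial flow from the fused features.
Refine the initial flow with a residual refinement module to improve accuracy.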

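Proposed Method

Extract multi-scale RGB features from consecutive RGB frames with a feature pyramid network (FPN).
Extract per-point LiDAR features with graph convolution layers applied to the raw point cloud.
Fuse the coarsest RGB features with the LiDAR features in a late-fusion step to form a per-point fused representation.
Compute the initial scene flow with an optimal-transport-based graph matching module, where the cost is based on cosine similarity and the mass constraints are relaxed with an occlusion-aware, KL-divergence-based term.
Correct the initial flow with a residual refinement network that uses learned correlations to improve the flow field.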
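
The late-fusion step above can be illustrated with a minimal NumPy sketch. The summary does not say how image features are associated with points, so this sketch assumes a common choice: project each point into the image with a pinhole camera model, read the coarsest feature map at the nearest cell, and concatenate. The function names, the `intrinsics` tuple, and the `stride` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def project_points(points, fx, fy, cx, cy):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    z = np.clip(points[:, 2], 1e-6, None)  # guard against division by zero
    u = fx * points[:, 0] / z + cx
    v = fy * points[:, 1] / z + cy
    return np.stack([u, v], axis=1)

def late_fuse(points, point_feats, image_feats, intrinsics, stride):
    """Concatenate each point's LiDAR feature with the image feature under its projection.

    points:      (N, 3) points in the camera frame (assumed already calibrated)
    point_feats: (N, C_pt) per-point LiDAR features
    image_feats: (H_c, W_c, C_img) coarsest image feature map
    intrinsics:  (fx, fy, cx, cy) of the full-resolution image
    stride:      downsampling factor between the image and the feature map
    """
    fx, fy, cx, cy = intrinsics
    uv = project_points(points, fx, fy, cx, cy) / stride
    h, w, _ = image_feats.shape
    # Nearest-neighbor lookup; points projecting outside the map are clamped to the border.
    cols = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    sampled = image_feats[rows, cols]  # (N, C_img)
    return np.concatenate([point_feats, sampled], axis=1)
```

Nearest-neighbor sampling keeps the sketch short; bilinear interpolation or a learned fusion layer would be equally plausible, and the paper's actual fusion operator may differ.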

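Experimental Results

Research Questions

RQ1: Does fusing RGB features at the feature level improve the accuracy of sparse LiDAR-based scene flow?
RQ2: Does Sinkhorn-based optimal-transport graph matching provide robust correspondences for sparse point clouds?
RQ3: How do late and early RGB-LiDAR fusion differ in accuracy and efficiency on the sparse scene flow task?
RQ4: How do entropy regularization and KL relaxation in the transport optimization affect the handling of occlusions?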
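
The Sinkhorn-based matching that RQ2 and RQ4 refer to can be sketched as follows. This is a generic entropy-regularized optimal transport solver with KL-relaxed marginals and a cosine-similarity cost, not the authors' implementation: the uniform point masses, the values of `epsilon`, `gamma`, and `n_iters`, and all function names are assumptions made for illustration.

```python
import numpy as np

def cosine_cost(feat_src, feat_tgt, eps=1e-8):
    """Cost C[i, j] = 1 - cosine similarity between source feature i and target feature j."""
    a = feat_src / (np.linalg.norm(feat_src, axis=1, keepdims=True) + eps)
    b = feat_tgt / (np.linalg.norm(feat_tgt, axis=1, keepdims=True) + eps)
    return 1.0 - a @ b.T

def unbalanced_sinkhorn(cost, epsilon=0.03, gamma=1.0, n_iters=50):
    """Entropic OT with KL-relaxed marginals (uniform masses assumed).

    epsilon: entropy regularization strength
    gamma:   weight of the KL penalty on the marginals; a finite value lets
             points without a good match (e.g. occluded ones) carry less mass
    """
    n, m = cost.shape
    mu = np.full(n, 1.0 / n)
    nu = np.full(m, 1.0 / m)
    kernel = np.exp(-cost / epsilon)
    power = gamma / (gamma + epsilon)  # power == 1 would recover balanced Sinkhorn
    a = np.ones(n)
    b = np.ones(m)
    for _ in range(n_iters):
        a = (mu / (kernel @ b + 1e-8)) ** power
        b = (nu / (kernel.T @ a + 1e-8)) ** power
    return a[:, None] * kernel * b[None, :]  # (n, m) transport plan

def initial_flow(points_src, points_tgt, transport, eps=1e-8):
    """Flow = transport-weighted average of target points minus the source point."""
    weights = transport / (transport.sum(axis=1, keepdims=True) + eps)
    return weights @ points_tgt - points_src
```

In this formulation a smaller `epsilon` gives sharper, closer to one-to-one correspondences, while a smaller `gamma` loosens the mass constraints further. How the paper sets or learns these quantities is not stated in this summary.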

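Key Findings

SF3D-RGB is more accurate than LiDAR-only baselines on FT3D and on the KITTI-based real-world dataset.
The RGB-LiDAR fusion outperforms early-fusion and LiDAR-only methods on the EPE3D and EPE2D metrics.
Compared with dense 3D scene flow methods on a standard GPU, the model uses fewer parameters and has a competitive runtime.
On sparse point clouds (2048 points), single-stage fusion with Sinkhorn-based graph matching achieves a strong accuracy-efficiency trade-off.
Fine-tuning on the KITTI-derived dataset further improves performance relative to the LiDAR-only baseline.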
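
For readers unfamiliar with the metrics named above, the sketch below shows how end-point error is commonly computed. EPE3D is the mean Euclidean distance between predicted and ground-truth 3D flow vectors. EPE2D is written here as the same error measured in pixels after pinhole projection. The intrinsics arguments and function names are assumptions, and the paper's exact evaluation protocol (masks, thresholds) is not given in this summary.

```python
import numpy as np

def epe3d(flow_pred, flow_gt, mask=None):
    """Mean L2 distance between (N, 3) predicted and ground-truth flow vectors."""
    err = np.linalg.norm(flow_pred - flow_gt, axis=1)
    if mask is not None:
        err = err[mask]  # e.g. keep only points with valid ground truth
    return float(err.mean())

def project(points, fx, fy, cx, cy):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    z = np.clip(points[:, 2], 1e-6, None)
    return np.stack([fx * points[:, 0] / z + cx,
                     fy * points[:, 1] / z + cy], axis=1)

def epe2d(points, flow_pred, flow_gt, fx, fy, cx, cy):
    """Mean pixel distance between the projections of the predicted and true end points."""
    end_pred = project(points + flow_pred, fx, fy, cx, cy)
    end_gt = project(points + flow_gt, fx, fy, cx, cy)
    return float(np.linalg.norm(end_pred - end_gt, axis=1).mean())
```

Both functions return a single scalar averaged over points, so lower values indicate better flow estimates.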

This review was generated by AI and reviewed by a human editor.