QUICK REVIEW

[Paper Review] RGB and LiDAR fusion based 3D Semantic Segmentation for Autonomous Driving

Khaled El Madawy, Hazem Rashed|arXiv (Cornell University)|Jun 1, 2019

Advanced Neural Network Applications | References: 16 | Cited by: 23

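One-Sentence Summary

This paper proposes early and mid-level fusion of RGB images and LiDAR point clouds for 3D semantic segmentation in autonomous driving, using a polar-grid mapping representation to bring the camera image into the LiDAR's geometry. On the KITTI dataset, with the SqueezeSeg and PointSeg architectures, the method improves mIoU by 10% relative to the LiDAR-only baselines.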

ABSTRACT

LiDAR has become a standard sensor for autonomous driving applications as it provides highly precise 3D point clouds. LiDAR is also robust in low-light scenarios, such as at night-time or in shadows, where the performance of cameras is degraded. LiDAR perception is gradually becoming mature for algorithms including object detection and SLAM. However, semantic segmentation remains relatively less explored. Motivated by the fact that semantic segmentation is a mature algorithm on image data, we explore sensor fusion based 3D segmentation. Our main contribution is to convert the RGB image to the polar-grid mapping representation used for LiDAR and to design early and mid-level fusion architectures. Additionally, we design a hybrid fusion architecture that combines both fusion algorithms. We evaluate our algorithm on the KITTI dataset, which provides segmentation annotation for cars, pedestrians and cyclists. We evaluate two state-of-the-art architectures, namely SqueezeSeg and PointSeg, and improve the mIoU score by 10% in both cases relative to the LiDAR-only baseline.
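The conversion described in the abstract, which brings camera colors into the LiDAR's polar grid, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the grid size (64 x 512), the vertical and horizontal fields of view, and the combined 3x4 projection matrix `P` are hypothetical placeholders, and each grid cell keeps only its nearest point. Every LiDAR point gets a polar cell from its azimuth and elevation, is colored by the camera pixel it projects onto (zero if it falls outside the image), and is written into an H x W x 8 tensor with channels X, Y, Z, D (depth), I (intensity), R, G, B.

```python
import numpy as np

# Hypothetical grid settings in the style of a SqueezeSeg front view;
# the paper's exact resolution and field of view are not given in this summary.
H, W = 64, 512                          # rows (elevation bins) x columns (azimuth bins)
FOV_UP_DEG, FOV_DOWN_DEG = 2.0, -24.8   # assumed vertical field of view
H_FOV_DEG = 90.0                        # assumed horizontal field of view, centered ahead


def polar_grid_indices(xyz: np.ndarray):
    """Map LiDAR points (N, 3) in a forward-x, left-y, up-z frame to (row, col) cells."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    depth = np.linalg.norm(xyz, axis=1)
    azimuth = np.degrees(np.arctan2(y, x))  # 0 deg = straight ahead
    elevation = np.degrees(np.arcsin(z / np.maximum(depth, 1e-6)))
    col = np.floor((H_FOV_DEG / 2 - azimuth) / H_FOV_DEG * W).astype(int)
    row = np.floor((FOV_UP_DEG - elevation) / (FOV_UP_DEG - FOV_DOWN_DEG) * H).astype(int)
    valid = (x > 0) & (col >= 0) & (col < W) & (row >= 0) & (row < H)
    return row, col, valid


def sample_rgb(xyz: np.ndarray, image: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Color each point with the camera pixel it projects to (zeros if not visible).

    P is an assumed 3x4 matrix taking homogeneous LiDAR coordinates to pixels,
    i.e. camera intrinsics and LiDAR-to-camera extrinsics already combined.
    """
    h_img, w_img, _ = image.shape
    homo = np.hstack([xyz, np.ones((xyz.shape[0], 1))])  # (N, 4)
    proj = homo @ P.T                                     # (N, 3)
    zc = proj[:, 2]
    safe = np.where(zc > 1e-6, zc, 1.0)                   # avoid dividing by <= 0
    u = np.floor(proj[:, 0] / safe).astype(int)
    v = np.floor(proj[:, 1] / safe).astype(int)
    visible = (zc > 1e-6) & (u >= 0) & (u < w_img) & (v >= 0) & (v < h_img)
    rgb = np.zeros((xyz.shape[0], 3), dtype=np.float32)
    rgb[visible] = image[v[visible], u[visible]]
    return rgb


def build_fused_grid(xyz, intensity, image, P) -> np.ndarray:
    """Return an (H, W, 8) grid with channels X, Y, Z, D, I, R, G, B."""
    row, col, valid = polar_grid_indices(xyz)
    depth = np.linalg.norm(xyz, axis=1)
    feats = np.column_stack([xyz, depth, intensity, sample_rgb(xyz, image, P)])
    # Keep only the nearest point per cell: sort by depth, then take the first hit.
    order = np.argsort(depth[valid], kind="stable")
    r, c, f = row[valid][order], col[valid][order], feats[valid][order]
    _, first = np.unique(r * W + c, return_index=True)
    grid = np.zeros((H, W, feats.shape[1]), dtype=np.float32)
    grid[r[first], c[first]] = f[first]
    return grid
```

Deduplicating with `np.unique` rather than relying on repeated fancy-index assignment keeps the "nearest point wins" rule deterministic.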

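Research Motivation and Objectives

Improve 3D semantic segmentation for autonomous driving by fusing complementary RGB and LiDAR data.
Address the limited performance of LiDAR-only semantic segmentation by adding the rich color information provided by the camera.
Systematically evaluate early and mid-level sensor fusion strategies for 3D segmentation.
Design a polar-grid representation that enables effective feature-level fusion of RGB and LiDAR data.
Significantly improve segmentation accuracy on the KITTI benchmark while keeping inference real-time.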

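Proposed Method

Convert the RGB image into the polar-grid mapping representation so that it aligns with the geometry of the LiDAR point cloud.
Implement early fusion in the CNN-based architecture by concatenating the raw RGB and LiDAR data before feature extraction.
Implement mid-level fusion by extracting features separately in an RGB branch and a LiDAR branch, then concatenating the features for segmentation.
Design a hybrid fusion strategy that combines early and mid-level fusion to exploit their complementary strengths.
Adapt two state-of-the-art networks, SqueezeSeg and PointSeg, to the proposed fusion framework for 3D semantic segmentation.
Train and evaluate the models on KITTI using the standard split and evaluation metrics, including mean Intersection over Union (mIoU).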
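The three fusion variants differ mainly in where the RGB channels join the LiDAR channels. The sketch below shows only that difference in tensor shapes; it is not SqueezeSeg or PointSeg. Each "encoder" is a hypothetical stand-in (a randomly initialized 1x1 convolution followed by ReLU), `feat_ch` is an arbitrary feature width, and the hybrid variant assumes that the "XYZDI+DIRGB" label used in the key findings means a LiDAR branch on X, Y, Z, D, I plus a second branch on D, I, R, G, B, with the two feature maps concatenated at mid level.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_encoder(in_ch: int, out_ch: int):
    """Hypothetical stand-in for a CNN branch: a 1x1 convolution (per-cell linear map) + ReLU."""
    weight = rng.standard_normal((in_ch, out_ch)) * 0.1
    return lambda x: np.maximum(x @ weight, 0.0)  # (H, W, in_ch) -> (H, W, out_ch)


def early_fusion(xyzdi: np.ndarray, rgb: np.ndarray, feat_ch: int = 32) -> np.ndarray:
    """Concatenate the raw inputs first, then run a single encoder."""
    x = np.concatenate([xyzdi, rgb], axis=-1)  # (H, W, 5 + 3)
    return make_encoder(x.shape[-1], feat_ch)(x)


def mid_fusion(xyzdi: np.ndarray, rgb: np.ndarray, feat_ch: int = 32) -> np.ndarray:
    """Encode each input separately, then concatenate the feature maps."""
    f_lidar = make_encoder(xyzdi.shape[-1], feat_ch)(xyzdi)
    f_rgb = make_encoder(rgb.shape[-1], feat_ch)(rgb)
    return np.concatenate([f_lidar, f_rgb], axis=-1)  # (H, W, 2 * feat_ch)


def hybrid_fusion(xyzdi: np.ndarray, rgb: np.ndarray, feat_ch: int = 32) -> np.ndarray:
    """Assumed reading of XYZDI+DIRGB: early-fuse D, I with RGB, then mid-fuse with the LiDAR branch."""
    dirgb = np.concatenate([xyzdi[..., 3:5], rgb], axis=-1)  # depth, intensity, R, G, B
    return mid_fusion(xyzdi, dirgb, feat_ch)
```

Early fusion keeps a single encoder, while mid-level and hybrid fusion run one encoder per branch and therefore cost more compute. In the actual networks, each branch would be a full convolutional encoder followed by a segmentation head that maps the fused features to per-cell class scores.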

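Experimental Results

Research Questions

RQ1: How much does fusing RGB and LiDAR data improve 3D semantic segmentation performance compared with LiDAR-only methods?
RQ2: How effective are early and mid-level fusion strategies relative to each other for 3D semantic segmentation?
RQ3: Can the polar-grid representation effectively align RGB and LiDAR data to support joint feature learning?
RQ4: How do class imbalance and small instance sizes affect segmentation performance, and can fusion mitigate these problems?
RQ5: To what extent do the proposed fusion architectures preserve the real-time inference speed required for autonomous driving?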

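Key Findings

The proposed RGB-LiDAR fusion improves mIoU by 10% relative to the LiDAR-only baseline for both the SqueezeSeg and PointSeg architectures.
For SqueezeSeg, the XYZDI+DIRGB hybrid fusion reaches 37.4% mIoU, an absolute gain of 3.7 percentage points over the LiDAR-only baseline (33.7%).
For PointSeg, early fusion reaches 37.8% mIoU, an absolute gain of 3.0 percentage points over the LiDAR-only baseline (34.8%); mid-level fusion reaches 37.6% mIoU.
Segmentation of the pedestrian and cyclist classes improves markedly: in PointSeg, early fusion and mid-level fusion yield mIoU gains of 3.3% and 5.8%, respectively.
The method runs in real time at about 10 ms per frame; mid-level fusion adds only about 3 ms of latency over the no-fusion baseline.
Qualitative results show improved segmentation of cars, pedestrians and cyclists, especially in challenging cases such as occlusion and small objects.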
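Because the findings mix relative and absolute improvements, it helps to compute both from the reported numbers, together with the per-class IoU definition behind mIoU. The sketch below assumes the usual formula IoU_c = TP_c / (TP_c + FP_c + FN_c), averaged over classes; the summary does not state the exact KITTI evaluation protocol, so `mean_iou` is illustrative rather than the benchmark's official scorer.

```python
import numpy as np


def mean_iou(conf: np.ndarray) -> float:
    """mIoU from a (C, C) confusion matrix (rows = ground truth, cols = prediction).

    IoU_c = TP_c / (TP_c + FP_c + FN_c); classes absent from both ground truth
    and prediction are skipped.
    """
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    denom = tp + fp + fn
    iou = tp[denom > 0] / denom[denom > 0]
    return float(iou.mean())


def gains(baseline: float, fused: float) -> tuple[float, float]:
    """Return (absolute gain in mIoU points, relative gain in percent)."""
    absolute = fused - baseline
    return absolute, 100.0 * absolute / baseline


# mIoU values (%) as reported in the key findings.
for name, base, fused in [("SqueezeSeg, XYZDI+DIRGB", 33.7, 37.4),
                          ("PointSeg, early fusion", 34.8, 37.8),
                          ("PointSeg, mid-level fusion", 34.8, 37.6)]:
    abs_gain, rel_gain = gains(base, fused)
    print(f"{name}: +{abs_gain:.1f} points absolute, +{rel_gain:.1f}% relative")
```

With these numbers, the SqueezeSeg hybrid gain is about 11% relative, and the PointSeg early and mid-level gains are roughly 8.6% and 8.0% relative, so the 10% figure is best read as a rounded, approximate value.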

This review was generated by AI and reviewed by human editors.