QUICK REVIEW

[Paper Review] EA-LSS: Edge-aware Lift-splat-shot Framework for 3D BEV Object Detection

Haotian Hu, Fanyi Wang|arXiv (Cornell University)|Mar 31, 2023

Advanced Neural Network Applications | Cited by 13

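One-sentence summary

EA-LSS introduces edge-aware depth fusion and fine-grained depth supervision to improve depth estimation in LSS-based BEV 3D detection, achieving state-of-the-art results on nuScenes with negligible inference overhead.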

ABSTRACT

In recent years, great progress has been made in the Lift-Splat-Shot-based (LSS-based) 3D object detection method. However, inaccurate depth estimation remains an important constraint to the accuracy of camera-only and multi-model 3D object detection models, especially in regions where the depth changes significantly (i.e., the "depth jump" problem). In this paper, we proposed a novel Edge-aware Lift-splat-shot (EA-LSS) framework. Specifically, edge-aware depth fusion (EADF) module is proposed to alleviate the "depth jump" problem and fine-grained depth (FGD) module to further enforce refined supervision on depth. Our EA-LSS framework is compatible for any LSS-based 3D object detection models, and effectively boosts their performances with negligible increment of inference time. Experiments on nuScenes benchmarks demonstrate that EA-LSS is effective in either camera-only or multi-model models. It is worth mentioning that EA-LSS achieved the state-of-the-art performance on nuScenes test benchmarks with mAP and NDS of 76.5% and 77.6%, respectively.

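Motivation and objectives

Motivate the depth jump problem in LSS-based BEV object detection and show how it degrades depth estimation accuracy.
Propose edge-aware depth fusion (EADF) to strengthen depth guidance at object edges.
Introduce a fine-grained depth (FGD) module that provides detailed depth supervision during training.
Develop a plug-and-play EA-LSS framework compatible with existing LSS-based BEV methods.
Demonstrate improved 3D detection performance on nuScenes with a negligible increase in inference time.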

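Proposed method

EA-LSS is a plug-and-play framework that couples an edge-aware depth fusion (EADF) module with a fine-grained depth (FGD) module.
EADF computes dense multi-view depth maps and edge maps, then fuses the two to produce edge-aware supervision for depth estimation.
FGD adds an upsampling branch and uses a focal-like loss to supervise non-zero depth pixels, preserving the detail of the depth distribution.
The FGD loss is computed only on pixels whose ground-truth depth is non-zero, so that zeros do not dominate the supervision.
EA-LSS adds the EADF and FGD losses to the standard detection losses (classification and box regression) to form the total training objective.
The framework is compatible with a range of LSS-based BEV detectors and is evaluated on nuScenes in both camera-only and multi-modal settings.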
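
The EADF idea of building a dense depth map and an edge map and then fusing them can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration and is not taken from the paper's code: the function names `densify`, `edge_map` and `fuse`, the use of a k x k neighbourhood maximum to densify a sparse depth map, the 4-neighbour depth difference as the edge measure, and the choice to fuse by stacking the two values per pixel.

```python
def densify(sparse, k=3):
    """Fill each pixel with the max depth found in its k x k neighbourhood.

    Assumption: 0.0 marks a pixel without a depth measurement, so taking
    the max spreads valid depths into nearby empty pixels.
    """
    h, w = len(sparse), len(sparse[0])
    r = k // 2
    return [
        [
            max(
                sparse[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def edge_map(dense):
    """Largest absolute depth difference to the 4 neighbours, scaled to [0, 1]."""
    h, w = len(dense), len(dense[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    diff = abs(dense[y][x] - dense[yy][xx])
                    grad[y][x] = max(grad[y][x], diff)
    peak = max(max(row) for row in grad)
    if peak == 0:
        return grad  # flat depth map: no edges to normalise
    return [[g / peak for g in row] for row in grad]


def fuse(dense, edges):
    """Stack depth and edge strength per pixel as a 2-channel map."""
    return [list(zip(drow, erow)) for drow, erow in zip(dense, edges)]


# Toy sparse depth map in metres: a near surface (5 m) on the left,
# a far background (40 m) on the right, 0.0 where no depth is available.
sparse = [
    [5.0, 0.0, 5.0, 40.0, 0.0, 40.0],
    [0.0, 5.0, 0.0, 0.0, 40.0, 0.0],
]
dense = densify(sparse, k=3)
edges = edge_map(dense)
fused = fuse(dense, edges)
print(dense[0])
print(edges[0])
```

In this toy case the edge map is 1.0 only at the two columns that straddle the 5 m / 40 m boundary, which is the kind of region the depth jump problem refers to. A real implementation would use tensor operations on full-resolution multi-view images rather than nested lists.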

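The focal-like FGD loss restricted to non-zero depth pixels can be sketched as follows. This is a hedged illustration, not the paper's implementation: the uniform depth binning in `depth_to_bin`, the depth range of 0 to 60 m, the focusing parameter `gamma=2.0`, and the function name `masked_focal_depth_loss` are all hypothetical choices. The sketch assumes the network outputs a probability distribution over discrete depth bins for each pixel, as LSS-style detectors do.

```python
import math


def depth_to_bin(depth, d_min, d_max, num_bins):
    """Map a metric depth to a uniform bin index (hypothetical discretisation)."""
    step = (d_max - d_min) / num_bins
    idx = int((depth - d_min) / step)
    return min(max(idx, 0), num_bins - 1)


def masked_focal_depth_loss(pred_probs, gt_depth, d_min=0.0, d_max=60.0, gamma=2.0):
    """Focal-style loss averaged over pixels with non-zero ground-truth depth.

    pred_probs[i] is the predicted probability over depth bins for pixel i;
    gt_depth[i] is the ground-truth depth in metres, with 0.0 meaning "no label".
    """
    total, count = 0.0, 0
    for probs, depth in zip(pred_probs, gt_depth):
        if depth <= 0:
            continue  # unlabelled pixel: contributes nothing to the loss
        p_t = probs[depth_to_bin(depth, d_min, d_max, len(probs))]
        p_t = max(p_t, 1e-12)  # guard against log(0)
        # Focal weighting: confident correct pixels are down-weighted.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
        count += 1
    return total / count if count else 0.0


# Three pixels, three depth bins of 20 m each; the middle pixel has no label.
pred_probs = [
    [0.90, 0.05, 0.05],
    [0.30, 0.30, 0.40],
    [0.10, 0.10, 0.80],
]
gt_depth = [5.0, 0.0, 45.0]
loss = masked_focal_depth_loss(pred_probs, gt_depth)
print(loss)
```

Because the middle pixel is skipped, changing its predicted distribution leaves the loss unchanged, which is the sense in which zero-valued ground truth cannot dominate the supervision.
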
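Experimental results

Research questions

RQ1: How can depth estimation in LSS-based BEV detectors be improved in regions where depth changes rapidly (depth jumps)?
RQ2: Can edge-aware depth cues and fine-grained depth supervision reduce depth misalignment and improve BEV features?
RQ3: How does integrating EADF and FGD affect camera-only and multi-modal BEV 3D detectors on nuScenes?
RQ4: Does EA-LSS keep the added inference latency small while improving detection accuracy?

Key findings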

Method	Modality	mAP	NDS	mATE	mASE	mAOE	mAVE	mAAE
BEVDet	C	42.2	48.2	0.529	0.236	0.396	0.979	0.152
BEVFormer	C	44.5	53.5	0.582	0.256	0.375	0.378	0.126
CenterPoint	L	60.3	67.3	0.262	0.239	0.361	0.288	0.136
TransFusion	C+L	68.9	71.6	0.259	0.243	0.359	0.288	0.127
CMT	C+L	70.4	73.0	0.299	0.241	0.323	0.240	0.112
DeepInteraction	C+L	70.8	73.4	0.257	0.240	0.325	0.245	0.128
BEVFusion	C+L	71.3	73.3	0.250	0.240	0.359	0.254	0.132
+EA-LSS	C+L	72.2	74.4	0.247	0.237	0.304	0.250	0.133
EA-LSS*	C+L	76.5	77.6	0.233	0.228	0.281	0.196	0.123
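
For readers unfamiliar with the columns: C and L denote camera and LiDAR input, and mATE, mASE, mAOE, mAVE and mAAE are the nuScenes mean translation, scale, orientation, velocity and attribute errors (lower is better). The nuScenes detection score combines them with mAP as NDS = (1/10) [5 mAP + Σ (1 − min(1, mTP))]. The short check below recomputes NDS from a few rows of the table. The helper name `nds` and the `rows` dictionary are introduced here for illustration only.

```python
def nds(map_pct, tp_errors):
    """nuScenes detection score in percent.

    map_pct is mAP in percent; tp_errors are the five mean true-positive
    errors (mATE, mASE, mAOE, mAVE, mAAE), each clipped at 1 before scoring.
    """
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return 10.0 * (5.0 * map_pct / 100.0 + sum(tp_scores))


# (mAP, [mATE, mASE, mAOE, mAVE, mAAE], reported NDS), copied from the table.
rows = {
    "BEVDet": (42.2, [0.529, 0.236, 0.396, 0.979, 0.152], 48.2),
    "BEVFusion": (71.3, [0.250, 0.240, 0.359, 0.254, 0.132], 73.3),
    "+EA-LSS": (72.2, [0.247, 0.237, 0.304, 0.250, 0.133], 74.4),
    "EA-LSS*": (76.5, [0.233, 0.228, 0.281, 0.196, 0.123], 77.6),
}

for name, (map_pct, errors, reported) in rows.items():
    print(name, round(nds(map_pct, errors), 1), reported)
```

For these four rows the recomputed value matches the reported NDS after rounding to one decimal, which shows how a lower orientation or velocity error raises NDS even when mAP changes little.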

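EA-LSS improves both camera-only and multi-modal baselines; for example, Tig-bev gains 2.1% mAP and 3.2% NDS, and BEVFusion gains 1.6% mAP and 1.0% NDS.
On the nuScenes test set, EA-LSS with test-time augmentation and model ensembling reaches a state-of-the-art 76.5% mAP and 77.6% NDS.
Ablations show that FGD and EADF each contribute to performance: FGD alone yields a smaller gain, while combining it with EADF brings a larger improvement.
The framework adds almost no inference latency, preserving practical efficiency.
EA-LSS refines the depth distribution and provides edge-focused depth guidance, alleviating the depth jump problem in BEV prediction.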

This review was generated by AI and reviewed by human editors.