QUICK REVIEW

[Paper Review] RC-GeoCP: Geometric Consensus for Radar-Camera Collaborative Perception

Xiaokai Bai, Lianqing Zheng|arXiv (Cornell University)|Feb 28, 2026

Adversarial Robustness in Machine Learning|Cited by 0

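One-Sentence Summary

RC-GeoCP achieves radar-camera collaborative perception through radar-anchored geometric consensus, combining Geometric Structure Rectification, Uncertainty-Aware Communication, and a Consensus-Driven Assembler to reach state-of-the-art results on the V2X-Radar and V2X-R benchmarks with reduced communication volume.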

ABSTRACT

Collaborative perception (CP) enhances scene understanding through multi-agent information sharing. While LiDAR-centric systems offer precise geometry, high costs and performance degradation in adverse weather necessitate multi-modal alternatives. Despite dense visual semantics and robust spatial measurements, the synergy between cameras and 4D radar remains underexplored in collaborative settings. This work introduces RC-GeoCP, the first framework to explore the fusion of 4D radar and images in CP. To resolve misalignment caused by depth ambiguity and spatial dispersion across agents, RC-GeoCP establishes a radar-anchored geometric consensus. Specifically, Geometric Structure Rectification (GSR) aligns visual semantics with geometry derived from radar to generate spatially grounded, geometry-consistent representations. Uncertainty-Aware Communication (UAC) formulates selective transmission as a conditional entropy reduction process to prioritize informative features based on inter-agent disagreement. Finally, the Consensus-Driven Assembler (CDA) aggregates multi-agent information via shared geometric anchors to form a globally coherent representation. We establish the first unified radar-camera CP benchmark on V2X-Radar and V2X-R, demonstrating state-of-the-art performance with significantly reduced communication overhead. Code will be released soon.

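Research Motivation and Goals

Advance robust multi-modal collaborative perception beyond LiDAR-centric systems.
Use radar-derived geometry to anchor camera semantics and reduce alignment errors caused by depth ambiguity.
Develop selective, uncertainty-based communication that maximizes information gain under bandwidth constraints.
Propose a consensus-driven aggregation mechanism that uses shared radar anchors to achieve globally consistent fusion.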

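Proposed Method

Geometric Structure Rectification (GSR) aligns camera BEV features to radar geometry using deformable cross-attention guided by sparse radar cues.
Uncertainty-Aware Communication (UAC) computes an ego-centric demand map and selects the top-K most informative tokens to reduce bandwidth usage.
Learnable agent-wise tokens apply cross-attention over the unselected features to retain residual context and mitigate information loss.
Consensus-Driven Assembler (CDA) injects radar-derived geometric consensus into the attention logits to enforce physically plausible fusion across agents.
Multi-scale fusion aggregates the rectified features with the tokens transmitted at different scales to produce a coherent collaborative BEV representation.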
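
To make two of the steps above concrete, here is a minimal, dependency-free sketch of top-K token selection driven by a demand score (in the spirit of UAC) and of attention whose logits are shifted by an additive geometric bias (in the spirit of CDA). This summary does not give the paper's actual formulas, so everything specific here is an assumption for illustration: the demand score is taken to be the binary entropy of the ego agent's per-cell confidence, the geometric consensus is reduced to a hand-set bias vector, and the function names `binary_entropy`, `select_top_k`, and `attend_with_bias` are hypothetical.

```python
import math


def binary_entropy(p, eps=1e-6):
    """Bernoulli entropy (nats) of a confidence p; highest when p is near 0.5."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_top_k(tokens, demand, k):
    """Keep the k tokens with the highest demand score.

    tokens: list of feature vectors held by the sending agent.
    demand: one score per token (assumed here: the ego agent's uncertainty).
    Returns the kept indices (ascending) and the corresponding tokens.
    """
    order = sorted(range(len(tokens)), key=lambda i: demand[i], reverse=True)
    keep = sorted(order[:k])
    return keep, [tokens[i] for i in keep]


def attend_with_bias(query, keys, values, bias):
    """Single-head attention with an additive per-key bias on the logits.

    A negative bias down-weights a key regardless of feature similarity,
    which is how a geometric prior could veto an implausible match.
    """
    scale = math.sqrt(len(query))
    logits = [
        sum(q * x for q, x in zip(query, key)) / scale + b
        for key, b in zip(keys, bias)
    ]
    top = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    fused = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]
    return fused, weights


# Ego confidence per BEV cell: cells 0 and 2 are the least certain.
ego_confidence = [0.5, 0.99, 0.6, 0.01]
demand = [binary_entropy(p) for p in ego_confidence]
sender_tokens = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
kept, sent = select_top_k(sender_tokens, demand, k=2)  # kept == [0, 2]

# Two keys with identical features; the second is penalized by the bias.
fused, weights = attend_with_bias(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
    bias=[0.0, -2.0],
)
```

In the toy run, only the two cells where the ego agent is least certain are transmitted, and the two keys that are indistinguishable by features alone receive different attention weights once the bias is applied. A real system would operate on dense BEV tensors and learn the demand map and the bias rather than setting them by hand.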

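Experimental Results

Research Questions

RQ1: Can radar-derived geometric cues serve as stable anchors that mitigate depth ambiguity and cross-view misalignment in radar-camera CP?
RQ2: Does uncertainty-based, demand-driven communication improve performance under realistic bandwidth constraints?
RQ3: Does radar-based geometric consensus improve multi-agent token aggregation for globally consistent perception?
RQ4: How does RC-GeoCP perform on the unified radar-camera CP benchmark compared with existing radar-only, camera-only, and radar-camera baselines?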

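Key Findings

RC-GeoCP achieves state-of-the-art performance on the V2X-Radar validation set: AP@0.5 = 44.55, AP@0.7 = 25.92.
On the V2X-Radar test set: AP@0.5 = 42.61, AP@0.7 = 18.77.
On the V2X-R validation set: AP@0.5 = 81.90, AP@0.7 = 65.09.
The method remains competitive in accuracy at a communication volume of 2.39 units, indicating a significant efficiency gain.
RC-GeoCP consistently outperforms comparable radar-camera fusion methods across multiple backbones, with notable gains at mid-range (30–50 m).
The framework remains robust under temporal misalignment (asynchronous settings) and delivers substantial performance gains.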

This review was generated by AI and checked by a human editor.