QUICK REVIEW

[Paper Review] RC-GeoCP: Geometric Consensus for Radar-Camera Collaborative Perception

Xiaokai Bai, Lianqing Zheng|arXiv (Cornell University)|Feb 28, 2026

Adversarial Robustness in Machine Learning | Citations: 0

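ONE-SENTENCE SUMMARY

RC-GeoCP proposes radar-camera collaborative perception built on radar-anchored geometric consistency. Using Geometric Structure Rectification, Uncertainty-Aware Communication, and a Consensus-Driven Assembler, it achieves state-of-the-art results on the V2X-Radar and V2X-R benchmarks while reducing communication volume.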

ABSTRACT

Collaborative perception (CP) enhances scene understanding through multi-agent information sharing. While LiDAR-centric systems offer precise geometry, high costs and performance degradation in adverse weather necessitate multi-modal alternatives. Despite dense visual semantics and robust spatial measurements, the synergy between cameras and 4D radar remains underexplored in collaborative settings. This work introduces RC-GeoCP, the first framework to explore the fusion of 4D radar and images in CP. To resolve misalignment caused by depth ambiguity and spatial dispersion across agents, RC-GeoCP establishes a radar-anchored geometric consensus. Specifically, Geometric Structure Rectification (GSR) aligns visual semantics with geometry derived from radar to generate spatially grounded, geometry-consistent representations. Uncertainty-Aware Communication (UAC) formulates selective transmission as a conditional entropy reduction process to prioritize informative features based on inter-agent disagreement. Finally, the Consensus-Driven Assembler (CDA) aggregates multi-agent information via shared geometric anchors to form a globally coherent representation. We establish the first unified radar-camera CP benchmark on V2X-Radar and V2X-R, demonstrating state-of-the-art performance with significantly reduced communication overhead. Code will be released soon.

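Motivation and Objectives

Motivate robust multi-modal collaborative perception beyond LiDAR-centric setups.
Use radar-derived geometry to ground camera semantics and reduce depth-induced errors.
Develop selective, uncertainty-aware communication that maximizes information gain under bandwidth constraints.
Propose a consensus-driven assembler that enforces globally consistent fusion through shared radar anchors.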

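Proposed Method

Geometric Structure Rectification (GSR) aligns camera BEV features with radar geometry through deformable cross-attention guided by sparsely sampled radar cues.
Uncertainty-Aware Communication (UAC) computes an ego-centric demand map and selects the top-K most informative tokens to reduce bandwidth.
Learnable agent-wise tokens cross-attend to the unselected features to retain residual context and mitigate information loss.
Consensus-Driven Assembler (CDA) injects radar-derived geometric consensus into the attention logits, enforcing physically grounded fusion across agents.
Multi-scale fusion aggregates the rectified features with the tokens transmitted at different scales to produce a consistent collaborative BEV representation.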
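
The two core mechanics named above, top-K token selection from a demand map (UAC) and an additive consensus term on the attention logits (CDA), can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the scalar demand scores, and the `consensus_bias` values are hypothetical, and the learnable agent-wise tokens, deformable attention, and multi-scale fusion are left out.

```python
import math


def select_top_k(demand, k):
    """Return indices of the k highest-demand tokens, highest first."""
    if k <= 0:
        return []
    order = sorted(range(len(demand)), key=lambda i: demand[i], reverse=True)
    return order[:k]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_consensus(query, keys, values, consensus_bias):
    """Single-query scaled dot-product attention with a bias added to the logits.

    consensus_bias[j] is a hypothetical radar-derived agreement score for
    token j; a larger value pulls attention toward that token.
    """
    scale = 1.0 / math.sqrt(len(query))
    logits = [
        scale * sum(q * k for q, k in zip(query, key)) + bias
        for key, bias in zip(keys, consensus_bias)
    ]
    weights = softmax(logits)
    dim = len(values[0])
    fused = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return fused, weights


# Toy example: a collaborator holds 5 candidate tokens and may send only 2.
demand = [0.1, 0.9, 0.3, 0.7, 0.2]  # assumed ego-centric demand per token
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
sent = select_top_k(demand, k=2)  # indices of the transmitted tokens
received = [tokens[i] for i in sent]

# Ego side: both received tokens score equally on content alone, so the
# assumed consensus bias decides which one dominates the fused feature.
fused, weights = fuse_with_consensus(
    query=[1.0, 1.0],
    keys=received,
    values=received,
    consensus_bias=[0.0, 2.0],
)
```

Here the bandwidth budget is expressed as `k`, so only the selected tokens cross the link. The consensus term acts before the softmax, which means it reweights tokens relative to each other rather than scaling the fused output directly.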

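Experimental Results

Research Questions

RQ1: Can radar-derived geometric cues serve as stable anchors that mitigate depth ambiguity and cross-view misalignment?
RQ2: Under realistic bandwidth constraints, does uncertainty-aware, demand-driven communication improve performance?
RQ3: Does radar-based geometric consensus improve multi-agent token aggregation and yield globally consistent perception?
RQ4: How does RC-GeoCP perform against existing radar-only, camera-only, and radar-camera baselines on the unified radar-camera CP benchmark?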

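Key Findings

RC-GeoCP achieves state-of-the-art performance on the V2X-Radar validation set: AP@0.5 = 44.55, AP@0.7 = 25.92.
On the V2X-Radar test set: AP@0.5 = 42.61, AP@0.7 = 18.77.
On the V2X-R validation set: AP@0.5 = 81.90, AP@0.7 = 65.09.
It remains competitive in accuracy at a communication volume of 2.39 units, suggesting a strong efficiency gain.
RC-GeoCP consistently outperforms comparable radar-camera fusion methods across backbones, with notable gains at mid range (30–50 m).
It remains robust under temporal misalignment (asynchronous settings) while still achieving substantial performance gains.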
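
The gap between the AP@0.5 and AP@0.7 figures above comes from the IoU threshold a detection must reach to count as a match. The sketch below shows one box that passes at 0.5 but fails at 0.7. It assumes axis-aligned bird's-eye-view boxes for simplicity, whereas detection benchmarks typically evaluate rotated boxes; the function name and the box coordinates are illustrative only.

```python
def bev_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    overlap_x = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    overlap_y = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = overlap_x * overlap_y
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0.0, 0.0, 4.0, 2.0)  # a 4 m x 2 m vehicle footprint
prediction = (1.0, 0.0, 5.0, 2.0)    # same size, shifted 1 m along x

iou = bev_iou(ground_truth, prediction)  # 6 / 10 = 0.6
matches_at_05 = iou >= 0.5  # counts as a true positive for AP@0.5
matches_at_07 = iou >= 0.7  # counts as a miss for AP@0.7
```

A 1 m localization error on a 4 m vehicle is enough to drop a detection out of AP@0.7, which is why the stricter metric is more sensitive to depth and alignment errors.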


This review was generated by AI and checked by a human editor.