QUICK REVIEW

[Paper Review] When the City Teaches the Car: Label-Free 3D Perception from Infrastructure

Zhen Xu, Jinsu Yoo|arXiv (Cornell University)|Mar 17, 2026

Advanced Neural Network Applications | Cited by 0

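One-Sentence Summary

This paper proposes infrastructure-taught, label-free 3D perception, in which static roadside units (RSUs) learn from unlabeled data and supply pseudo-labels for training an ego-vehicle detector offline; on the CARLA-based CIVET dataset it reaches 82.3% AP for the Vehicle class, below the supervised upper bound of 94.4%.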

ABSTRACT

Building robust 3D perception for self-driving still relies heavily on large-scale data collection and manual annotation, yet this paradigm becomes impractical as deployment expands across diverse cities and regions. Meanwhile, modern cities are increasingly instrumented with roadside units (RSUs), static sensors deployed along roads and at intersections to monitor traffic. This raises a natural question: can the city itself help train the vehicle? We propose infrastructure-taught, label-free 3D perception, a paradigm in which RSUs act as stationary, unsupervised teachers for ego vehicles. Leveraging their fixed viewpoints and repeated observations, RSUs learn local 3D detectors from unlabeled data and broadcast predictions to passing vehicles, which are aggregated as pseudo-label supervision for training a standalone ego detector. The resulting model requires no infrastructure or communication at test time. We instantiate this idea as a fully label-free three-stage pipeline and conduct a concept-and-feasibility study in a CARLA-based multi-agent environment. With CenterPoint, our pipeline achieves 82.3% AP for detecting vehicles, compared to a fully supervised ego upper bound of 94.4%. We further systematically analyze each stage, evaluate its scalability, and demonstrate complementarity with existing ego-centric label-free methods. Together, these results suggest that city infrastructure itself can potentially provide a scalable supervisory signal for autonomous vehicles, positioning infrastructure-taught learning as a promising orthogonal paradigm for reducing annotation cost in 3D perception.

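Motivation and Goals

Reduce the cost of annotating 3D perception data across cities by using stationary RSUs as unsupervised teachers.
Develop a fully label-free three-stage pipeline in which RSUs learn from unlabeled data and broadcast their predictions as pseudo-labels, and the ego detector is then trained offline.
Systematically study feasibility, scalability, and complementarity with ego-centric label-free methods in a simulated multi-town environment.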

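Proposed Method

Stage 1: unsupervised RSU training. Each RSU learns a location-specific detector using temporal consistency and persistence-based pseudo-labels.
Stage 2: RSUs broadcast predictions to passing ego vehicles; the ego vehicle aggregates these predictions into pseudo-labels using distance-weighted NMS and simple class matching.
Stage 3: the ego detector is trained offline on the aggregated infrastructure-derived pseudo-labels, yielding a standalone ego model at test time.
Evaluation uses the CenterPoint and PointPillars detectors with the BEV AP metric, and analyzes the effects of communication noise and pseudo-label refinement.
The CIVET dataset is built on CARLA and V2XVerse and covers 4 towns with 12 RSUs per town, to study geography-specific supervision and scalability.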
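The Stage 2 aggregation step can be illustrated with a small sketch. This is one possible reading of "distance-weighted NMS with simple class matching", not the authors' implementation: the `Box` fields, the `weight` formula, the `scale` and `iou_thr` values, and the use of axis-aligned BEV boxes (no heading angle) are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Hypothetical BEV box broadcast by an RSU, in a shared world frame (meters)."""
    x: float                      # box center x
    y: float                      # box center y
    w: float                      # extent along x
    l: float                      # extent along y
    score: float                  # detector confidence
    label: str                    # class name, e.g. "vehicle"
    rsu_xy: Tuple[float, float]   # position of the RSU that produced the box


def bev_iou(a: Box, b: Box) -> float:
    """Axis-aligned BEV IoU (simplification: heading is ignored)."""
    ix = max(0.0, min(a.x + a.w / 2, b.x + b.w / 2) - max(a.x - a.w / 2, b.x - b.w / 2))
    iy = max(0.0, min(a.y + a.l / 2, b.y + b.l / 2) - max(a.y - a.l / 2, b.y - b.l / 2))
    inter = ix * iy
    union = a.w * a.l + b.w * b.l - inter
    return inter / union if union > 0 else 0.0


def weight(box: Box, scale: float = 20.0) -> float:
    """Assumed ranking weight: confidence discounted by RSU-to-object distance."""
    d = math.hypot(box.x - box.rsu_xy[0], box.y - box.rsu_xy[1])
    return box.score / (1.0 + d / scale)


def aggregate(boxes: List[Box], iou_thr: float = 0.3) -> List[Box]:
    """Greedy NMS ranked by distance-weighted score; only same-class boxes suppress each other."""
    remaining = sorted(boxes, key=weight, reverse=True)
    kept: List[Box] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [
            b for b in remaining
            if b.label != best.label or bev_iou(best, b) < iou_thr
        ]
    return kept


if __name__ == "__main__":
    boxes = [
        Box(10.0, 0.0, 2.0, 4.5, 0.8, "vehicle", (0.0, 0.0)),     # nearby RSU
        Box(10.3, 0.2, 2.0, 4.5, 0.9, "vehicle", (60.0, 0.0)),    # distant RSU, same car
        Box(10.2, 0.1, 0.6, 0.6, 0.7, "pedestrian", (0.0, 0.0)),  # different class
    ]
    for b in aggregate(boxes):
        print(b.label, b.x, b.y)
```

In this sketch the box from the nearby RSU wins over the higher-confidence box from the distant RSU, reflecting the intuition that closer sensors localize objects more reliably. The overlapping pedestrian box survives because suppression is restricted to matching classes.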

Figure 1 : Can city infrastructure teach vehicles to perceive? We explore a new paradigm where roadside infrastructure acts as distributed teachers, providing supervision to train ego perception models without manual annotations.

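Experimental Results

Research Questions

RQ1: Can static RSUs learn reliable label-free detectors from unlabeled observations?
RQ2: Can RSU-generated pseudo-labels train a competitive ego detector that does not depend on infrastructure at test time?
RQ3: How do factors such as the number of RSUs, their placement, and communication noise affect downstream ego performance?
RQ4: Are infrastructure-generated pseudo-labels complementary to ego-centric label-free methods, and do they generalize across towns?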

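Key Findings

The fully label-free pipeline reaches 82.3% AP for vehicle detection within a single town, against a supervised ego upper bound of 94.4%.
Across four towns, training with aggregated RSU supervision reaches 82.7% AP, against an upper bound of 91.0%.
Tracking and unsupervised RSU training improve pseudo-label quality and ego performance; communication noise degrades localization, especially for pedestrians.
Auxiliary refinement (box refinement) improves pseudo-label quality and ego AP under noisy conditions.
Combining infrastructure pseudo-labels with ego-centric methods (such as Oyster) yields additional gains.
RSU detectors are location-specific and do not transfer directly to other RSU viewpoints, which motivates a distributed ensemble of teachers.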
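To put the reported AP numbers in perspective, the short calculation below works out the absolute gap to the supervised upper bound and the fraction of that bound recovered without labels. Only the four AP values come from the text; the function name `gap_summary` and the setting names are illustrative.

```python
def gap_summary(label_free_ap: float, supervised_ap: float):
    """Return (absolute gap in AP points, percent of the supervised bound recovered)."""
    gap = supervised_ap - label_free_ap
    recovered = 100.0 * label_free_ap / supervised_ap
    return round(gap, 1), round(recovered, 1)


# AP values as reported in the review: (label-free, supervised upper bound)
reported = {
    "single town": (82.3, 94.4),
    "four towns": (82.7, 91.0),
}

for setting, (label_free, supervised) in reported.items():
    gap, recovered = gap_summary(label_free, supervised)
    print(f"{setting}: gap {gap} AP points, {recovered}% of the supervised bound")
```

The gap is 12.1 AP points in the single-town setting (87.2% of the bound) and 8.3 points across four towns (90.9% of the bound), so the label-free pipeline is competitive but still measurably behind full supervision.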

Figure 2 : Overview of infrastructure-taught, label-free 3D perception. Stage 1: each RSU learns a location-specialized detector in an unsupervised manner by exploiting temporal consistency from its stationary viewpoint. Stage 2: trained RSUs broadcast their predicted 3D bounding boxes to nearby ego


This review was generated by AI and reviewed by a human editor.