QUICK REVIEW

[Paper Review] Automated Quality Check of Sensor Data Annotations

Niklas Freund, Zekiye Ilknur-Öz|arXiv (Cornell University)|Feb 19, 2026

IoT and GPS-based Vehicle Safety Systems | Cited by 0

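One-Sentence Summary

This paper proposes an automated, rule-based quality-check framework for multi-sensor railway data annotations with nine error detectors, releases it as the open-source RailLabel-providerkit, and evaluates it on OSDaR23, where the detectors achieve high precision.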

ABSTRACT

The monitoring of the route and track environment plays an important role in automated driving. For example, it can be used as an assistance system for route monitoring at automation level Grade of Automation (GoA) 2, where the train driver is still on board. In fully automated, driverless operation at automation level GoA4, these systems take over environment monitoring completely independently. With the help of artificial intelligence (AI), they react automatically to risks and dangerous events on the route. Training such AI algorithms requires large amounts of training data, which must meet high quality standards due to their safety relevance. In this publication we present an automatic method for assuring the quality of training data, significantly reducing the manual workload and accelerating the development of these systems. We propose an open-source tool designed to detect nine common errors found in multi-sensor datasets for railway vehicles. To evaluate the performance of the framework, all detected errors were manually validated. Six issue detection methods achieved 100% precision, while three additional methods reached precision rates of 96% and 97%.

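Motivation and Objectives

Address the need for high-quality annotated sensor data in safety-critical railway perception systems.
Introduce an automated quality-check framework to reduce the manual workload of verifying annotations.
Define and implement nine detectors for common annotation errors in multi-sensor railway datasets.
Evaluate the detectors on the OSDaR23 dataset and validate the results through manual review.
Provide an open-source tool that helps researchers and industry improve data quality.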

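Proposed Method

Develop nine detectors for annotation errors: five railway-specific and four domain-agnostic.
Implement the detection rules as Python algorithms.
Assess quality by processing the annotation files (JSON) independently of the raw sensor data.
Measure precision by manually reviewing the automatically detected errors.
Test the framework on the OSDaR23 dataset and manually validate the results.
Release RailLabel-providerkit as an open-source Python library for easy adoption.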
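The detection approach described above (rules written in Python and applied to JSON annotation files without loading the raw sensor data) can be sketched roughly as follows. This is not RailLabel-providerkit's API: the JSON schema, the two example rules (`check_bbox_size`, `check_unknown_sensor`) and `run_checks` are hypothetical, chosen only to illustrate the rule-based pattern. The paper's nine detectors and the real OSDaR23 annotation format are richer.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Issue:
    detector: str
    object_id: str
    message: str


# Hypothetical schema: {"sensors": [...], "objects": [{"id", "sensor", "bbox": [x, y, w, h]}]}
def check_bbox_size(doc: dict) -> list[Issue]:
    """Hypothetical rule: flag 2D boxes with non-positive width or height."""
    issues = []
    for obj in doc.get("objects", []):
        bbox = obj.get("bbox")
        if bbox is None:
            continue
        _, _, width, height = bbox
        if width <= 0 or height <= 0:
            issues.append(Issue("bbox_size", obj["id"], f"degenerate box {bbox}"))
    return issues


def check_unknown_sensor(doc: dict) -> list[Issue]:
    """Hypothetical rule: flag objects that reference a sensor not declared in the file."""
    known = set(doc.get("sensors", []))
    return [
        Issue("unknown_sensor", obj["id"], f"sensor {obj.get('sensor')!r} not declared")
        for obj in doc.get("objects", [])
        if obj.get("sensor") not in known
    ]


Detector = Callable[[dict], list[Issue]]
DETECTORS: tuple[Detector, ...] = (check_bbox_size, check_unknown_sensor)


def run_checks(json_text: str, detectors: tuple[Detector, ...] = DETECTORS) -> list[Issue]:
    """Parse the annotation JSON and apply every rule; raw sensor data is never loaded."""
    doc = json.loads(json_text)
    issues: list[Issue] = []
    for detector in detectors:
        issues.extend(detector(doc))
    return issues


sample = json.dumps({
    "sensors": ["rgb_center", "lidar"],
    "objects": [
        {"id": "o1", "sensor": "rgb_center", "bbox": [10, 20, 30, 40]},
        {"id": "o2", "sensor": "rgb_center", "bbox": [5, 5, 0, 12]},
        {"id": "o3", "sensor": "radar", "bbox": [1, 1, 4, 4]},
    ],
})

for issue in run_checks(sample):
    print(issue)
```

Keeping each rule as a small, independent function makes it easy to add detectors and to report which rule raised each issue, which in turn lets manual reviewers compute per-detector precision.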

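Experimental Results

Research Questions

RQ1: Can automated detectors identify common annotation errors in multi-sensor railway datasets with high precision?
RQ2: What precision does each detector achieve when compared against ground truth from manual review?
RQ3: How prevalent are annotation errors in the OSDaR23 dataset according to the automated checks?
RQ4: Is the proposed open-source tool suitable for researchers and industry to adopt in their data pipelines?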

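Key Findings

Six detectors achieve 100% precision on the errors they flag.
Three detectors achieve 96% to 97% precision, meaning 3 to 4% of their flagged issues were false positives, which were excluded after manual review.
The automated checks flagged 2.18% of all annotation elements in OSDaR23 as erroneous.
The framework found errors in the final dataset that the manual quality check had missed.
The software release emphasizes reproducibility and accessibility for the research community and industry.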
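The figures above rest on two simple ratios: a detector's precision is the share of its flagged issues that manual review confirms as real errors, TP / (TP + FP), and error prevalence is the number of erroneous annotation elements divided by all elements. A minimal sketch with hypothetical function names and made-up toy counts that do not come from the paper:

```python
from collections import Counter


def precision_per_detector(verdicts):
    """verdicts: iterable of (detector_name, confirmed_by_reviewer) pairs.

    Returns TP / (TP + FP) for each detector that flagged at least one issue.
    """
    flagged = Counter()
    confirmed = Counter()
    for detector, is_true_error in verdicts:
        flagged[detector] += 1
        if is_true_error:
            confirmed[detector] += 1
    return {d: confirmed[d] / flagged[d] for d in flagged}


def error_prevalence(num_erroneous: int, num_elements: int) -> float:
    """Fraction of annotation elements with at least one confirmed error."""
    return num_erroneous / num_elements


# Toy review verdicts (hypothetical, not from OSDaR23).
verdicts = [("toy_a", True)] * 3 + [("toy_b", True)] * 3 + [("toy_b", False)]
print(precision_per_detector(verdicts))
print(f"{error_prevalence(12, 400):.2%}")
```

Because only flagged issues are reviewed, this evaluation measures precision alone; errors a detector never flags (its recall) stay invisible.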

This review was generated by AI and reviewed by human editors.