QUICK REVIEW

[Paper Review] End-to-End Tracking and Semantic Segmentation Using Recurrent Neural Networks

Peter Ondrúška, Julie Dequaire|arXiv (Cornell University)|Apr 18, 2016

Anomaly Detection Techniques and Applications|References: 24|Cited by: 46

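ONE-SENTENCE SUMMARY

This paper presents an end-to-end recurrent neural network framework that jointly tracks and semantically segments objects in real time from raw laser data, learning its representation through unsupervised Deep Tracking and then using inductive transfer to classify objects from only a small amount of labelled data; on real-world road-junction data it compares favourably with a state-of-the-art model-free tracker and outperforms a conventional one-shot training scheme for semantic classification.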

ABSTRACT

In this work we present a novel end-to-end framework for tracking and classifying a robot's surroundings in complex, dynamic and only partially observable real-world environments. The approach deploys a recurrent neural network to filter an input stream of raw laser measurements in order to directly infer object locations, along with their identity in both visible and occluded areas. To achieve this we first train the network using unsupervised Deep Tracking, a recently proposed theoretical framework for end-to-end space occupancy prediction. We show that by learning to track on a large amount of unsupervised data, the network creates a rich internal representation of its environment which we in turn exploit through the principle of inductive transfer of knowledge to perform the task of its semantic classification. As a result, we show that only a small amount of labelled data suffices to steer the network towards mastering this additional task. Furthermore we propose a novel recurrent neural network architecture specifically tailored to tracking and semantic classification in real-world robotics applications. We demonstrate the tracking and classification performance of the method on real-world data collected at a busy road junction. Our evaluation shows that the proposed end-to-end framework compares favourably to a state-of-the-art, model-free tracking solution and that it outperforms a conventional one-shot training scheme for semantic classification.

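MOTIVATION AND OBJECTIVES

To maintain accurate situational awareness in complex, dynamic and only partially observable real-world environments where sensor occlusion limits perception.
To reduce reliance on hand-engineered components in multi-stage perception pipelines by learning end-to-end from raw sensor input.
To perform object tracking and semantic classification simultaneously within a single recurrent neural network architecture.
To minimise dependence on large labelled datasets through inductive transfer from unsupervised tracking pre-training.
To achieve real-time, robust tracking and classification of objects through complete occlusion in real-world scenes.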

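PROPOSED METHOD

The framework uses a custom recurrent neural network architecture with multi-scale convolutional layers to handle objects of different sizes in the laser data.
A dynamic memory retains temporal information for long-term tracking, while a static memory stores location-specific knowledge of the environment.
The network is first pre-trained with unsupervised Deep Tracking to predict future occupancy grids from raw laser sequences.
Inductive transfer is applied by feeding the learned hidden representation $ h_t $ to a classification head, which reduces the amount of labelled data required.
Semantic classification is obtained by training a classifier on the hidden state $ h_t $, which captures rich spatial and contextual features from the tracking process.
The system processes laser data at 8 Hz, and inference takes 15 ms on a GPU, which allows real-time operation.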
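
The transfer step described above can be illustrated with a minimal sketch, assuming a classification head that is a single linear layer plus softmax, shared across all grid cells and applied to a frozen hidden state. This review does not specify the head, so the function names, the weight values, the toy grid and the "other" class are all hypothetical; only the pedestrian and bicycle classes are mentioned in this review.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_cells(hidden, weights, biases):
    """Apply one shared linear layer + softmax to every grid cell.

    hidden  : H x W grid; each cell is a list of C features (stand-in for h_t)
    weights : K x C matrix of hypothetical head parameters
    biases  : K hypothetical bias values
    Returns an H x W grid where each cell holds K class probabilities.
    """
    out = []
    for row in hidden:
        out_row = []
        for features in row:
            logits = [
                sum(w * f for w, f in zip(w_k, features)) + b_k
                for w_k, b_k in zip(weights, biases)
            ]
            out_row.append(softmax(logits))
        out.append(out_row)
    return out


# Toy example (assumed values): a 1 x 2 grid, C = 2 features, K = 3 classes.
CLASSES = ["pedestrian", "bicycle", "other"]  # "other" is a placeholder class
hidden = [[[1.0, 0.0], [0.0, 1.0]]]
weights = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]

probs = classify_cells(hidden, weights, biases)
labels = [[CLASSES[p.index(max(p))] for p in row] for row in probs]
print(labels)
```

Because this head only reads $ h_t $, the tracking network could stay fixed while the small labelled set is used to fit `weights` and `biases`. That is one plausible reading of why little labelled data is needed, not a statement of the paper's exact training procedure.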

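EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a single end-to-end deep learning framework jointly track and semantically segment objects in dynamic, occluded environments from raw laser input in real time?
RQ2: To what extent does unsupervised pre-training on the tracking task improve downstream semantic classification when only a small amount of labelled data is available?
RQ3: How effective is the proposed recurrent architecture, which combines dynamic and static memory, at maintaining accurate object state through long occlusions?
RQ4: Does using the hidden representation $ h_t $ as a semantic descriptor outperform classifying directly from the raw sensor input?
RQ5: How does the end-to-end framework compare with traditional multi-stage and state-of-the-art model-free tracking pipelines in accuracy and robustness?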

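Key Findings

The proposed method predicts future occupancy grids better than both the original Deep Tracking architecture and a state-of-the-art multi-stage pipeline, with higher F1 scores across a 10-frame time horizon.
When semantic classification uses the hidden representation $ h_t $, the network reaches a negative log-likelihood of 49.129, compared with 101.967 when classifying directly from the raw input $ x_t $, which demonstrates the benefit of inductive transfer.
Even with no input, the network produces plausible occupancy predictions from its static memory, confirming that it learns and retains location-specific priors about the environment.
The system maintains accurate tracking and classification of objects during complete occlusion, including prediction of short-term future object motion.
A forward pass takes 15 ms per frame on an Nvidia Titan GPU, which allows real-time operation on real laser data streams processed at 8 Hz.
The confusion matrix shows high classification accuracy; the main source of error is mutual misclassification of pedestrians and bicycles, whose shapes look similar in 2D laser data.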
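
The two metrics cited above, F1 for occupancy prediction and negative log-likelihood for classification, can be computed as in the following sketch. The review does not say how the paper thresholds predictions, which cells are scored, or whether the likelihood is summed or averaged, so the 0.5 threshold, the summation and the toy values are assumptions for illustration only.

```python
import math


def f1_score(predicted, target, threshold=0.5):
    """F1 between thresholded occupancy probabilities and a binary grid.

    predicted : H x W grid of occupancy probabilities in [0, 1]
    target    : H x W grid of ground-truth occupancy (0 or 1)
    threshold : assumed decision threshold, not taken from the paper
    """
    tp = fp = fn = 0
    for p_row, t_row in zip(predicted, target):
        for p, t in zip(p_row, t_row):
            occupied = p >= threshold
            if occupied and t == 1:
                tp += 1
            elif occupied and t == 0:
                fp += 1
            elif not occupied and t == 1:
                fn += 1
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def negative_log_likelihood(class_probs, labels):
    """Sum of -log p(true class) over labelled cells (summation is assumed)."""
    return -sum(math.log(probs[label]) for probs, label in zip(class_probs, labels))


# Toy occupancy grid: one true positive, one false positive, one false negative.
predicted = [[0.9, 0.2], [0.6, 0.1]]
target = [[1, 0], [0, 1]]
f1 = f1_score(predicted, target)

# Toy classification: two labelled cells, both with true class index 0.
class_probs = [[0.5, 0.5], [0.25, 0.75]]
nll = negative_log_likelihood(class_probs, [0, 0])

print(f1, nll)
```

A lower negative log-likelihood means the classifier assigns more probability to the true classes, which is the sense in which 49.129 for $ h_t $ improves on 101.967 for $ x_t $.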

This review was generated by AI and checked by human editors.