QUICK REVIEW

[Paper Review] Joint Detection and Multi-Object Tracking with Graph Neural Networks

Yongxin Wang, Xinshuo Weng|arXiv (Cornell University)|Jun 23, 2020

Video Surveillance and Tracking Methods | References: 91 | Cited by: 37

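One-Sentence Summary

This paper proposes a joint multi-object tracking (MOT) framework based on graph neural networks (GNNs) that optimizes object detection and data association simultaneously by modeling spatio-temporal interactions between objects. By integrating appearance and motion features in a GNN-based feature learning framework, the method reports state-of-the-art performance on the MOT challenge dataset and, through end-to-end differentiable training, outperforms cascaded approaches.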

ABSTRACT

Object detection and data association are critical components in multi-object tracking (MOT) systems. Despite the fact that these two components are highly dependent on each other, one popular trend in MOT is to perform detection and data association as separate modules, processed in a cascaded order. Due to this cascaded process, the resulting MOT system can only perform forward inference and cannot back-propagate error through the entire pipeline and correct them. This leads to sub-optimal performance over the total pipeline. To address this issue, recent work jointly optimizes detection and data association and forms an integrated MOT approach, which has been shown to improve performance in both detection and tracking. In this work, we propose a new approach for joint MOT based on Graph Neural Networks (GNNs). The key idea of our approach is that GNNs can explicitly model complex interactions between multiple objects in both the spatial and temporal domains, which is essential for learning discriminative features for detection and data association. We also leverage the fact that motion features are useful for MOT when used together with appearance features. So our proposed joint MOT approach also incorporates appearance and motion features within our graph-based feature learning framework, leading to better feature learning for MOT. Through extensive experiments on the MOT challenge dataset, we show that our proposed method achieves state-of-the-art performance on both object detection and MOT.

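Research Motivation and Goals

Address the sub-optimal performance of cascaded detection-and-tracking pipelines, which cannot back-propagate errors across modules.
Enable end-to-end training for multi-object tracking by jointly optimizing detection and data association.
Use graph neural networks to model complex interactions among multiple objects in the spatial and temporal domains.
Improve feature learning by integrating appearance and motion features in a unified graph-based framework.
Reach state-of-the-art performance on the MOT challenge dataset through joint optimization.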

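Proposed Method

The method uses graph neural networks (GNNs) to explicitly model interactions between objects in the spatial and temporal domains.
Objects are represented as nodes in a graph, and edges encode spatial proximity and temporal consistency between detections.
Appearance and motion features are embedded in the graph nodes and updated through the GNN's message-passing mechanism.
Because errors can back-propagate through the entire pipeline, the GNN framework allows detection and data association to be optimized jointly.
The model is trained end to end with a differentiable loss that combines the detection and association objectives.
The framework supports joint feature learning, in which appearance and motion cues are refined dynamically through graph convolutions.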
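The steps listed above can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names (`build_adjacency`, `message_passing`, `joint_loss`), the center-distance threshold `max_dist`, the mean-aggregation update, and the binary cross-entropy losses are all assumptions chosen for illustration. The summary only states that nodes are detections, edges encode spatial proximity and temporal consistency, node features are updated by message passing, and the loss combines detection and association terms.

```python
import numpy as np

def build_adjacency(boxes, frames, max_dist=50.0):
    """Connect detections that are spatially close and at most one frame apart.

    boxes: (N, 4) array of [x1, y1, x2, y2]; frames: (N,) integer frame indices.
    The threshold rule is a hypothetical stand-in for the paper's edge definition.
    """
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    frame_gap = np.abs(frames[:, None] - frames[None, :])
    adj = ((dist < max_dist) & (frame_gap <= 1)).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

def message_passing(node_feats, adj, w_self, w_neigh):
    """One round of mean-aggregation message passing followed by ReLU."""
    degree = adj.sum(axis=1, keepdims=True)
    # Isolated nodes have degree 0; dividing by 1 keeps their message at zero.
    mean_neigh = adj @ node_feats / np.maximum(degree, 1.0)
    return np.maximum(node_feats @ w_self + mean_neigh @ w_neigh, 0.0)

def joint_loss(det_logits, det_labels, assoc_logits, assoc_labels, assoc_weight=1.0):
    """Detection loss plus weighted association loss (both binary cross-entropy)."""
    def bce(logits, labels):
        probs = 1.0 / (1.0 + np.exp(-logits))
        eps = 1e-7  # avoids log(0)
        return -np.mean(labels * np.log(probs + eps)
                        + (1.0 - labels) * np.log(1.0 - probs + eps))
    return bce(det_logits, det_labels) + assoc_weight * bce(assoc_logits, assoc_labels)

# Toy example: two detections in frame 0 and two in frame 1.
rng = np.random.default_rng(0)
demo_boxes = np.array([[0, 0, 10, 10], [100, 100, 110, 110],
                       [2, 1, 12, 11], [300, 300, 310, 310]], dtype=float)
demo_frames = np.array([0, 0, 1, 1])
appearance = rng.normal(size=(4, 8))   # placeholder appearance embeddings
motion = demo_boxes / 310.0            # placeholder motion cue: normalized box
demo_feats = np.concatenate([appearance, motion], axis=1)
demo_adj = build_adjacency(demo_boxes, demo_frames)
updated = message_passing(demo_feats, demo_adj,
                          rng.normal(size=(12, 16)), rng.normal(size=(12, 16)))
```

In a real system the two weight matrices would be learned, and because both loss terms are differentiable in the node features, gradients from the association term would also reach the features used for detection. That shared gradient path is the property the summary contrasts with cascaded pipelines.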

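Experimental Results

Research Questions

RQ1: Can a GNN-based framework jointly optimize detection and data association in multi-object tracking and thereby improve overall performance?
RQ2: How do spatio-temporal interactions between objects affect feature learning in joint MOT?
RQ3: What effect does integrating appearance and motion features in a graph-based learning framework have on tracking accuracy?
RQ4: Does end-to-end training of the detection and association modules, with error back-propagation, outperform cascaded pipelines?
RQ5: Does the proposed GNN-based method achieve state-of-the-art results on standard MOT benchmarks?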

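Key Findings

The proposed GNN-based joint MOT framework reports state-of-the-art performance on the MOT challenge dataset.
Integrating appearance and motion features within the graph framework leads to more discriminative feature learning.
End-to-end training of the detection and association modules, with error back-propagation, significantly improves optimization of the pipeline compared with cascaded methods.
Modeling spatio-temporal interactions with a GNN improves both data association accuracy and detection quality.
The method outperforms existing state-of-the-art methods on both detection and multi-object tracking metrics.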

This review was generated by AI and checked by a human editor.