QUICK REVIEW

[Paper Review] Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking

Jinlong Peng, Changan Wang|arXiv (Cornell University)|Jul 29, 2020

Video Surveillance and Tracking Methods | References: 40 | Cited by: 32

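One-Sentence Summary

This paper proposes Chained-Tracker (CTracker), an online end-to-end MOT model that regresses paired bounding boxes for adjacent frames and chains them into trajectories, achieving state-of-the-art results on MOT16/MOT17 without extra training data.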

ABSTRACT

Existing Multiple-Object Tracking (MOT) methods either follow the tracking-by-detection paradigm to conduct object detection, feature extraction and data association separately, or have two of the three subtasks integrated to form a partially end-to-end solution. Going beyond these sub-optimal frameworks, we propose a simple online model named Chained-Tracker (CTracker), which naturally integrates all the three subtasks into an end-to-end solution (the first as far as we know). It chains paired bounding boxes regression results estimated from overlapping nodes, of which each node covers two adjacent frames. The paired regression is made attentive by object-attention (brought by a detection module) and identity-attention (ensured by an ID verification module). The two major novelties: chained structure and paired attentive regression, make CTracker simple, fast and effective, setting new MOTA records on MOT16 and MOT17 challenge datasets (67.6 and 66.6, respectively), without relying on any extra training data. The source code of CTracker can be found at: github.com/pjl1995/CTracker.

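Motivation and Objectives

Address the sub-optimality of tracking-by-detection and partially end-to-end MOT methods.
Propose a fully end-to-end model that jointly performs detection, feature extraction, and data association.
Introduce paired attentive regression and a chained structure that turn cross-frame association into a pair-wise detection problem.
Demonstrate state-of-the-art MOT performance on MOT16 and MOT17 without extra training data.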

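Proposed Method

Take a pair of adjacent frames (a chain node) as input and regress paired bounding boxes that represent the same target in both frames.
Guide the paired bounding box regression with a joint attention module that combines object attention and identity attention.
Introduce Chained-Anchors to predict the two bounding boxes of adjacent frames in a single regression.
Chain adjacent nodes into longer trajectories through IoU-based matching solved with the Kuhn-Munkres algorithm.
Introduce a Memory Sharing Mechanism (MSM) that reuses frame features across nodes to speed up inference.
Train with a multi-task loss comprising regression, classification, and ID verification terms, using focal loss.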
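The node-chaining step above can be sketched in a few lines. Two adjacent nodes share one frame, so the second boxes of the earlier node and the first boxes of the later node describe the same image and can be matched by IoU. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names (`iou`, `min_cost_assignment`, `chain_nodes`), the `(x1, y1, x2, y2)` box layout, and the `min_iou` threshold of 0.5 are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def min_cost_assignment(cost: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Kuhn-Munkres (Hungarian) assignment; requires len(rows) <= len(cols)."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j]: 1-based row assigned to column j (0 = free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:  # flip assignments along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0]


def chain_nodes(prev_second: List[Box], next_first: List[Box],
                min_iou: float = 0.5) -> List[Tuple[int, int]]:
    """Match boxes of the shared frame; returns (prev_index, next_index) pairs."""
    if not prev_second or not next_first:
        return []
    swap = len(prev_second) > len(next_first)  # solver needs rows <= cols
    rows, cols = (next_first, prev_second) if swap else (prev_second, next_first)
    cost = [[1.0 - iou(r, c) for c in cols] for r in rows]
    matches = []
    for r, c in min_cost_assignment(cost):
        if iou(rows[r], cols[c]) >= min_iou:  # hypothetical gating threshold
            matches.append((c, r) if swap else (r, c))
    return sorted(matches)


# Hypothetical boxes on the frame shared by two adjacent nodes.
prev_second = [(0, 0, 10, 10), (20, 20, 30, 30)]
next_first = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
links = chain_nodes(prev_second, next_first)
```

Each matched pair extends an existing trajectory with the later node's boxes, and an unmatched box in the later node (the third one here) would be a candidate for a new trajectory. How unmatched or temporarily lost targets are handled is not described in this review, so the sketch leaves that out.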

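Experimental Results

Research Questions

RQ1: Can an end-to-end MOT model that jointly optimizes detection, feature extraction, and data association outperform traditional tracking-by-detection and partially end-to-end methods?
RQ2: Does paired attention (object attention and identity attention) improve regression accuracy and data association in the online MOT setting?
RQ3: Can a chained, adjacent-frame regression approach turn cross-frame association into a robust pair-wise detection problem?
RQ4: What is the efficiency versus accuracy trade-off of memory sharing in online MOT inference?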

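Key Findings

CTracker achieves state-of-the-art MOTA on MOT16 (67.6) and MOT17 (66.6) without extra training data.
Ablation experiments show that object attention and joint attention (including ID verification) significantly improve MOTA and IDF1.
The full joint-attention version (CTracker) significantly improves IDF1, reflecting better data association, while MOTP drops slightly.
The Memory Sharing Mechanism reduces computation, allowing tracking to reach about 34.4 FPS on 1080p input.
Compared with online MOT baselines on MOT17, CTracker reaches 66.6 MOTA and 57.4 IDF1, with competitive MOTP.
The chaining strategy effectively forms long trajectories through IoU-based matching and KM assignment across adjacent frame pairs.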
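To see why sharing frame features across nodes saves computation, consider that consecutive nodes (t, t+1) and (t+1, t+2) overlap in frame t+1. The sketch below is a rough illustration only, not the paper's MSM: the class name `SharedFeatureCache`, the helper `count_backbone_calls`, and the stand-in backbone are hypothetical. It caches the feature of the most recent frame so that each frame passes through the backbone once.

```python
from typing import Any, Callable, Optional, Tuple


class SharedFeatureCache:
    """Keeps the latest frame's feature so the next node can reuse it."""

    def __init__(self, backbone: Callable[[Any], Any]) -> None:
        self.backbone = backbone
        self.calls = 0                       # number of real backbone passes
        self._last_index: Optional[int] = None
        self._last_feature: Any = None

    def _extract(self, index: int, frame: Any) -> Any:
        if index == self._last_index:        # shared frame: reuse the feature
            return self._last_feature
        self.calls += 1
        feature = self.backbone(frame)
        self._last_index, self._last_feature = index, feature
        return feature

    def node_features(self, t: int, frame_t: Any, frame_t1: Any) -> Tuple[Any, Any]:
        """Features for the node covering frames t and t + 1."""
        return self._extract(t, frame_t), self._extract(t + 1, frame_t1)


def count_backbone_calls(num_frames: int) -> int:
    """Run all nodes of a sequence and report how often the backbone ran."""
    # Stand-in backbone and integer "frames", purely for illustration.
    cache = SharedFeatureCache(backbone=lambda frame: ("feat", frame))
    frames = list(range(num_frames))
    for t in range(num_frames - 1):
        cache.node_features(t, frames[t], frames[t + 1])
    return cache.calls


calls_with_sharing = count_backbone_calls(5)
calls_without_sharing = 2 * (5 - 1)  # every node would extract both frames
```

For a sequence of N frames there are N - 1 nodes, so extracting both frames per node costs 2(N - 1) backbone passes, while the cache needs only N. The speed figure quoted above comes from the paper and is not something this toy count reproduces.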

This review was generated by AI and reviewed by a human editor.