QUICK REVIEW

[Paper Review] Target-Aware Deep Tracking

Xin Li, Chao Ma|arXiv (Cornell University)|Apr 3, 2019

Video Surveillance and Tracking Methods|References: 50|Cited by: 31

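One-Sentence Summary

The paper proposes target-aware deep features for visual tracking: a gradient-based filter selection scheme, guided by a regression loss and a ranking loss, picks the target-active and scale-sensitive filters of a pre-trained convolutional neural network. The resulting tracker is reported to perform favorably against state-of-the-art methods in accuracy and speed on OTB-2015, VOT-2015, and Temple Color-128.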

ABSTRACT

Existing deep trackers mainly use convolutional neural networks pre-trained for generic object recognition task for representations. Despite demonstrated successes for numerous vision tasks, the contributions of using pre-trained deep features for visual tracking are not as significant as that for object recognition. The key issue is that in visual tracking the targets of interest can be arbitrary object class with arbitrary forms. As such, pre-trained deep features are less effective in modeling these targets of arbitrary forms for distinguishing them from the background. In this paper, we propose a novel scheme to learn target-aware features, which can better recognize the targets undergoing significant appearance variations than pre-trained deep features. To this end, we develop a regression loss and a ranking loss to guide the generation of target-active and scale-sensitive features. We identify the importance of each convolutional filter according to the back-propagated gradients and select the target-aware features based on activations for representing the targets. The target-aware features are integrated with a Siamese matching network for visual tracking. Extensive experimental results show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of accuracy and speed.

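Motivation and Objectives

Address the limitation that pre-trained deep features are not target-aware, even though tracking targets can belong to arbitrary classes and take arbitrary forms.
Overcome the weak discriminative power of generic features when separating the target from cluttered backgrounds and distractors.
Reduce computational cost by keeping only the most relevant convolutional filters, enabling efficient tracking.
Improve robustness to appearance and scale changes through targeted feature learning.
Develop a lightweight, real-time tracker that aims to outperform existing state-of-the-art methods in both accuracy and inference speed.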

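Proposed Method

Use the gradients back-propagated from the training losses to measure how important each convolutional filter is for the target object.
Apply a ridge regression loss that regresses the pre-trained features to soft labels derived from a Gaussian function, encouraging target-active features.
Introduce a pairwise ranking loss that compares target samples with negative samples to learn scale-sensitive features.
Score each filter by the gradient magnitude of the two losses and keep the highest-scoring filters, yielding a compact, target-aware feature representation.
Integrate the selected target-aware features into a Siamese tracking framework for end-to-end tracking inference.
Use t-SNE visualization to verify that target-aware features give clearer separation between classes and tighter clustering within each class than pre-trained features.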
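The selection step in the list above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes a per-channel (1x1) ridge regressor instead of a spatial convolution kernel, a toy 5x5 feature map with three hand-made channels, plain gradient descent, and ranking by the absolute value of the pooled gradient. All function names (`gaussian_label`, `fit_ridge`, `filter_importance`, `select_filters`) and hyperparameters are hypothetical.

```python
import math


def gaussian_label(h, w, sigma=1.0):
    """Flattened h*w soft label map that peaks at the patch center."""
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return [math.exp(-((i - cy) ** 2 + (j - cx) ** 2) / (2.0 * sigma ** 2))
            for i in range(h) for j in range(w)]


def residuals(features, weights, label):
    """label - prediction at every position for a per-channel regressor."""
    return [y - sum(wc * ch[n] for wc, ch in zip(weights, features))
            for n, y in enumerate(label)]


def fit_ridge(features, label, lam=0.1, lr=0.05, steps=50):
    """Minimise sum(res^2) + lam * sum(w^2) with plain gradient descent."""
    weights = [0.0] * len(features)
    for _ in range(steps):
        res = residuals(features, weights, label)
        grads = [-2.0 * sum(r * x for r, x in zip(res, ch)) + 2.0 * lam * wc
                 for wc, ch in zip(weights, features)]
        weights = [wc - lr * g for wc, g in zip(weights, grads)]
    return weights


def filter_importance(features, weights, label):
    """Global average pooling of d(loss)/d(feature) for each channel."""
    res = residuals(features, weights, label)
    n = len(label)
    # d(loss)/d(x_c[n]) = -2 * res[n] * w_c; the mean over positions is the pooling.
    return [sum(-2.0 * r * wc for r in res) / n for wc in weights]


def select_filters(importance, k):
    """Indices of the k channels with the largest |pooled gradient| (assumed rule)."""
    order = sorted(range(len(importance)),
                   key=lambda c: abs(importance[c]), reverse=True)
    return order[:k]


label = gaussian_label(5, 5)
features = [
    label[:],            # channel 0: responds on the target
    [0.3] * len(label),  # channel 1: flat background response
    [0.0] * len(label),  # channel 2: silent filter
]
weights = fit_ridge(features, label)
importance = filter_importance(features, weights, label)
selected = select_filters(importance, k=2)
print(selected)
```

With a 1x1 regressor the pooled gradient of a channel is proportional to that channel's regression weight, so the toy example mainly shows the mechanics: fit to Gaussian labels, back-propagate to the features, pool, and keep the top-scoring channels. A ranking-loss score could be pooled the same way and combined with this one.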

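Experimental Results

Research Questions

RQ1: Can gradient-based selection of pre-trained CNN filters improve the discriminative power of features for visual tracking?
RQ2: Does combining the regression loss and the ranking loss produce better target-aware features than using either loss alone?
RQ3: Can target-aware features reduce computational cost while maintaining or improving tracking accuracy?
RQ4: How effective are target-aware features under appearance and scale changes in real-world tracking scenarios?
RQ5: How much do target-aware features improve performance over standard pre-trained features across diverse benchmark datasets?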

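Key Findings

The proposed tracker reaches an AUC of 0.660 on OTB-2015 and is reported to compare favorably with state-of-the-art trackers in both accuracy and speed.
On Temple Color-128 the tracker reaches an AUC of 0.562, the best result among real-time trackers that do not use online adaptation.
The ablation study shows that the regression loss alone improves AUC by +4.3% on Conv4-1 and +4.9% on Conv4-3 over random filter selection.
Combining the regression and ranking losses yields AUC gains of +1.8% on OTB-2013 and +1.6% on OTB-2015, showing that the two losses are complementary.
t-SNE visualization confirms that target-aware features give clearer separation between classes and tighter clustering within each class.
The tracker runs at 33.7 FPS, balancing accuracy and real-time speed across several benchmarks.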
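For readers unfamiliar with the AUC figures quoted above, the following sketch shows how an OTB-style success-plot AUC is commonly computed from per-frame bounding boxes. It is a minimal illustration, not the benchmark toolkit: the `(x, y, w, h)` box format, the 21 evenly spaced overlap thresholds, the strict `>` comparison, and the names `iou` and `success_auc` are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [t / (num_thresholds - 1) for t in range(num_thresholds)]
    # Success rate at a threshold: fraction of frames whose overlap exceeds it.
    success = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(success) / len(success)


# Hypothetical two-frame sequence: one exact hit and one partial overlap.
ground_truth = [(0, 0, 10, 10), (0, 0, 10, 10)]
predictions = [(0, 0, 10, 10), (5, 0, 10, 10)]
print(success_auc(predictions, ground_truth))
```

Because the score averages over all overlap thresholds, it rewards tight boxes on every frame rather than a single pass/fail cutoff, which is why it is often quoted alongside frames per second when comparing trackers.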


This review was generated by AI and reviewed by a human editor.