QUICK REVIEW

[Paper Review] Learning regression and verification networks for long-term visual tracking

Yunhua Zhang, Dong Wang|arXiv (Cornell University)|Sep 12, 2018

Video Surveillance and Tracking Methods | References: 5 | Cited by: 74

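One-sentence summary

The paper introduces a long-term tracking framework that combines an offline-trained regression network with an online-updated verification network to perform local search, presence/absence decisions, and image-wide re-detection; it reports state-of-the-art results on the VOT2018 LTB35 and OxUvA long-term benchmarks.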

ABSTRACT

Compared with short-term tracking, the long-term tracking task requires determining the tracked object is present or absent, and then estimating the accurate bounding box if present or conducting image-wide re-detection if absent. Until now, few attempts have been done although this task is much closer to designing practical tracking systems. In this work, we propose a novel long-term tracking framework based on deep regression and verification networks. The offline-trained regression model is designed using the object-aware feature fusion and region proposal networks to generate a series of candidates and estimate their similarity scores effectively. The verification network evaluates these candidates to output the optimal one as the tracked object with its classification score, which is online updated to adapt to the appearance variations based on newly reliable observations. The similarity and classification scores are combined to obtain a final confidence value, based on which our tracker can determine the absence of the target accurately and conduct image-wide re-detection to capture the target successfully when it reappears. Extensive experiments show that our tracker achieves the best performance on the VOT2018 long-term challenge and state-of-the-art results on the OxUvA long-term dataset.

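Motivation and objectives

Address the under-explored problem of long-term tracking, in which the target appears, disappears, and reappears over long sequences.
Develop an offline-trained regression network that generates candidate bounding boxes with similarity scores.
Add an online-updated verification network that distinguishes the true target among the candidates.
Enable confidence-based switching between local search and image-wide re-detection.
Demonstrate strong performance on the VOT2018 LTB35 and OxUvA long-term datasets.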

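Proposed method

Use an offline-trained regression network (R) that combines object-aware feature fusion with a region proposal network (RPN) to generate and score candidate bounding boxes.
Fuse the search-region features with the template features to produce the RPN input used for bounding-box regression and similarity scoring.
Introduce an online-updated verification network (V) that classifies candidates as foreground or background and refines the final tracking result.
Combine the regression (similarity) score and the verification (classification) score into a per-frame final confidence that decides whether the target is present or absent and triggers re-detection when needed.
Switch dynamically between local search and image-wide re-detection based on the confidence score.
Train R offline with an SSD-like loss that combines a matching (cross-entropy) term and a localization (smooth L1) term; train V online with MDNet-style fine-tuning.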
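The confidence-based switching described above can be sketched as a small per-frame state machine. This is only an illustrative sketch, not the authors' implementation: the summary does not say how the two scores are combined, so the product rule in `combine`, the `threshold` value of 0.5, and the names `Candidate`, `propose_local`, `propose_global`, and `step` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Candidate:
    box: Box
    similarity: float      # score from the regression network R
    classification: float  # score from the verification network V


def combine(similarity: float, classification: float) -> float:
    # Hypothetical combination rule (a simple product); the summary only
    # states that the two scores are combined into one confidence value.
    return similarity * classification


def step(
    propose_local: Callable[[Box], List[Candidate]],
    propose_global: Callable[[], List[Candidate]],
    last_box: Box,
    target_present: bool,
    threshold: float = 0.5,  # hypothetical presence threshold
) -> Tuple[Box, float, bool]:
    """Run one frame and return (box, confidence, target_present)."""
    # Search locally while the target is believed present; otherwise
    # fall back to image-wide re-detection.
    if target_present:
        candidates = propose_local(last_box)
    else:
        candidates = propose_global()
    if not candidates:
        return last_box, 0.0, False

    best = max(candidates, key=lambda c: combine(c.similarity, c.classification))
    confidence = combine(best.similarity, best.classification)
    if confidence >= threshold:
        return best.box, confidence, True
    # Low confidence: keep the previous box and mark the target as absent,
    # so the next frame uses image-wide re-detection.
    return last_box, confidence, False
```

In this sketch a single low-confidence frame is enough to switch to global search; a real tracker might require several consecutive low-confidence frames before declaring the target absent.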

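Experimental results

Research questions

RQ1 How can the regression and verification networks be integrated to handle presence/absence decisions in long-term tracking?
RQ2 Can the offline regression model robustly produce candidates while the online verification model adapts to appearance changes?
RQ3 Does confidence-based switching between local search and global re-detection improve long-term tracking performance?
RQ4 How does object-aware feature fusion affect candidate proposals and regression accuracy?
RQ5 How does the tracker perform on the standard long-term benchmarks (VOT2018 LTB35, OxUvA)?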

Key findings

Among the evaluated trackers, it achieves the best F-score, precision, and recall on VOT-2018 LTB35 (F-score 0.610, Pr 0.634, Re 0.588).
On VOT-2018 LTB35, the reported table lists a 100% re-detection success rate, with re-detection taking 1 frame.
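On the OxUvA long-term dataset (open challenge), it achieves the highest MaxGM score of 0.544, with TPR 0.609 and TNR 0.485.
Ablation studies show that adding verification substantially improves long-term performance compared with regression alone, and that both concatenation and multiplication in the feature fusion are beneficial.
A Siamese configuration of the feature extractor degrades performance relative to separate online/offline branches, indicating that the inputs need separate processing.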
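As a quick consistency check on the numbers above, the reported scores can be recomputed from their components. The F-score is the standard harmonic mean of precision and recall. The MaxGM formula is not stated in this review; the sketch assumes the definition from the OxUvA benchmark, the maximum over p in [0, 1] of sqrt(((1 - p) * TPR) * ((1 - p) * TNR + p)), and should be checked against the benchmark paper. The function names and the grid resolution `steps` are hypothetical.

```python
import math


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def max_gm(tpr: float, tnr: float, steps: int = 1000) -> float:
    """Assumed OxUvA MaxGM: maximize the geometric mean over p in [0, 1].

    The value is approximated with a simple grid search over p.
    """
    best = 0.0
    for i in range(steps + 1):
        p = i / steps
        gm = math.sqrt(((1 - p) * tpr) * ((1 - p) * tnr + p))
        best = max(best, gm)
    return best


print(f"{f_score(0.634, 0.588):.3f}")  # 0.610, matches the reported F-score
print(f"{max_gm(0.609, 0.485):.3f}")   # 0.544, matches the reported MaxGM
```

Because the inputs are themselves rounded to three decimals, agreement at the third decimal is the most this check can show.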


This review was generated by AI and reviewed by human editors.