QUICK REVIEW

[Paper Review] SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines

Yinda Xu, Zeyu Wang|arXiv (Cornell University)|Nov 14, 2019

Video Surveillance and Tracking Methods|References: 29|Cited by: 57

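One-Sentence Summary

SiamFC++ introduces practical target state estimation guidelines for visual tracking and, following them, builds an anchor-free, per-pixel Siamese tracker with a quality branch that achieves state-of-the-art performance on five benchmarks at high speed.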

ABSTRACT

Visual tracking problem demands to efficiently perform robust classification and accurate target state estimation over a given target at the same time. Former methods have proposed various ways of target state estimation, yet few of them took the particularity of the visual tracking problem itself into consideration. After a careful analysis, we propose a set of practical guidelines of target state estimation for high-performance generic object tracker design. Following these guidelines, we design our Fully Convolutional Siamese tracker++ (SiamFC++) by introducing both classification and target state estimation branch(G1), classification score without ambiguity(G2), tracking without prior knowledge(G3), and estimation quality score(G4). Extensive analysis and ablation studies demonstrate the effectiveness of our proposed guidelines. Without bells and whistles, our SiamFC++ tracker achieves state-of-the-art performance on five challenging benchmarks(OTB2015, VOT2018, LaSOT, GOT-10k, TrackingNet), which proves both the tracking and generalization ability of the tracker. Particularly, on the large-scale TrackingNet dataset, SiamFC++ achieves a previously unseen AUC score of 75.4 while running at over 90 FPS, which is far above the real-time requirement. Code and models are available at: https://github.com/MegviiDetection/video_analyst .

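Research Motivation and Objectives

Identify practical guidelines for target state estimation in high-performance trackers.
Design an anchor-free Siamese tracker that combines classification with accurate target state estimation.
Incorporate an estimation quality score to improve localization.
Demonstrate state-of-the-art performance and generalization across diverse benchmarks.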

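Proposed Method

Develop a fully convolutional Siamese tracker with a classification head and a regression head placed after the cross-correlation.
Remove anchor-based matching by predicting directly at each pixel location, which yields unambiguous classification scores and requires no prior knowledge of the target size distribution.
Introduce an estimation quality branch that outputs a Prior Spatial Score (PSS), used at inference to weight each location's classification score by the estimated bounding-box quality.
Define a training objective that combines classification, quality, and regression losses, with one weight lambda shared by the last two terms (L = Lcls + lambda*Lquality + lambda*Lreg).
Use a penalized final score, the classification score multiplied by the quality score, for robust bounding-box selection.
Evaluate two backbones (AlexNet and GoogLeNet) and run extensive ablation studies to justify the design choices.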
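
The per-pixel quality weighting described in the list above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes PSS takes the centerness form used by FCOS-style detectors (the square root of the product of the min/max ratios of the left/right and top/bottom offsets), uses a toy score map with one location per pixel, and substitutes the PSS target for the output of a trained quality branch. The helper names `ltrb_offsets`, `prior_spatial_score`, and `select_location` are hypothetical.

```python
import numpy as np


def ltrb_offsets(xs, ys, box):
    """Distances from each location (x, y) to the left, top, right and
    bottom sides of box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return np.stack([xs - x0, ys - y0, x1 - xs, y1 - ys], axis=-1)


def prior_spatial_score(ltrb):
    """Assumed centerness-style PSS: 1 at the box center, decaying to 0
    at the box border, and 0 for locations outside the box."""
    l, t, r, b = np.moveaxis(ltrb.astype(float), -1, 0)
    inside = ltrb.min(axis=-1) > 0
    ratio = (np.minimum(l, r) / np.maximum(l, r)) * (
        np.minimum(t, b) / np.maximum(t, b)
    )
    return np.where(inside, np.sqrt(np.clip(ratio, 0.0, None)), 0.0)


def select_location(cls_score, quality_score):
    """Multiply the classification and quality maps, then return the
    index and value of the highest final score."""
    final = cls_score * quality_score
    idx = np.unravel_index(np.argmax(final), final.shape)
    return tuple(int(i) for i in idx), float(final[idx])


# Toy 5x5 score map with one location per pixel (stride 1 is an assumption).
xs, ys = np.meshgrid(np.arange(5), np.arange(5))
pss = prior_spatial_score(ltrb_offsets(xs, ys, box=(0, 0, 4, 4)))

cls = np.full((5, 5), 0.1)
cls[0, 0] = 0.9  # confident classification score on the box corner
cls[2, 2] = 0.8  # slightly lower score at the box center
best, score = select_location(cls, pss)
print(best, score)
```

In this toy case the corner location has the highest classification score, but its quality score is zero, so the product favors the box center. This is the behavior the quality branch is meant to encourage: locations far from the target center tend to regress poorer boxes and are down-weighted.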

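Experimental Results

Research Questions

RQ1: Does decomposing tracking into classification and target state estimation improve robustness and accuracy?
RQ2: Is anchor-free, per-pixel prediction better than anchor-based methods for scoring and estimation?
RQ3: Does an estimation quality score (PSS or IoU-based) improve localization accuracy?
RQ4: Do the proposed guidelines yield state-of-the-art results across diverse tracking benchmarks while maintaining real-time speed?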

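Key Findings

SiamFC++ achieves state-of-the-art results on five benchmarks: OTB2015, VOT2018, LaSOT, GOT-10k, and TrackingNet.
On TrackingNet, SiamFC++-GoogLeNet reaches an AUC of 75.4 while running at over 90 FPS.
Removing anchors in favor of per-pixel prediction reduces matching ambiguity and improves robustness and generalization relative to anchor-based trackers such as SiamRPN++.
Adding the estimation quality score (PSS) improves localization accuracy and robustness; PSS is chosen for its consistency across datasets.
Both backbones (AlexNet and GoogLeNet) offer a strong performance-speed trade-off, with a competitive EAO on VOT2018 (0.400) and better robustness.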
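
The review contrasts PSS with an IoU-based quality score. As a rough illustration of that alternative, the sketch below computes the intersection-over-union between a predicted box and a ground-truth box, which is the quantity an IoU-based quality branch would be trained to predict. The corner format `(x0, y0, x1, y1)` and the function name `box_iou` are assumptions made for this example, not details taken from the paper.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    # Corners of the intersection rectangle.
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero for non-overlapping boxes.
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (1.0, 1.0, 3.0, 3.0)
ground_truth = (0.0, 0.0, 2.0, 2.0)
print(box_iou(predicted, ground_truth))
```

Unlike PSS, which depends only on where a location sits inside the ground-truth box, this target depends on the box the network actually predicts at that location.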


This review was generated by AI and reviewed by human editors.