QUICK REVIEW

[Paper Review] VITAL: VIsual Tracking via Adversarial Learning

Yibing Song, Chao Ma|arXiv (Cornell University)|Apr 12, 2018

Video Surveillance and Tracking Methods|References: 47|Cited by: 58

ONE-SENTENCE SUMMARY

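VITAL augments positive samples with adversarially learned masks in feature space and counters class imbalance with a high-order cost-sensitive loss, so that a tracking-by-detection tracker performs favorably against state-of-the-art methods.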

ABSTRACT

The tracking-by-detection framework consists of two stages, i.e., drawing samples around the target object in the first stage and classifying each sample as the target object or as background in the second stage. The performance of existing trackers using deep classification networks is limited by two aspects. First, the positive samples in each frame are highly spatially overlapped, and they fail to capture rich appearance variations. Second, there exists extreme class imbalance between positive and negative samples. This paper presents the VITAL algorithm to address these two problems via adversarial learning. To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively drop out input features to capture a variety of appearance changes. With the use of adversarial learning, our network identifies the mask that maintains the most robust features of the target objects over a long temporal span. In addition, to handle the issue of class imbalance, we propose a high-order cost-sensitive loss to decrease the effect of easy negative samples to facilitate training the classification network. Extensive experiments on benchmark datasets demonstrate that the proposed tracker performs favorably against state-of-the-art approaches.
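The two-stage pipeline the abstract describes can be sketched as below. This is a minimal illustration, not VITAL's code: the Gaussian jitter parameters (`trans_sigma`, `scale_sigma`), the IoU thresholds (`pos_thr=0.7`, `neg_thr=0.5`), and all function names are hypothetical choices, and `score_fn` is a stand-in for the deep classification network.

```python
import random


def iou(a, b):
    """Intersection-over-union of two boxes given as (cx, cy, w, h)."""
    ax0, ay0 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax1, ay1 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx0, by0 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx1, by1 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def draw_samples(box, n, rng, trans_sigma=0.3, scale_sigma=0.5):
    """Stage 1: Gaussian perturbations of the box's position and scale."""
    cx, cy, w, h = box
    samples = []
    for _ in range(n):
        s = 1.05 ** rng.gauss(0.0, scale_sigma)  # hypothetical scale jitter
        samples.append((cx + rng.gauss(0.0, trans_sigma) * w,
                        cy + rng.gauss(0.0, trans_sigma) * h,
                        w * s, h * s))
    return samples


def label_samples(target, samples, pos_thr=0.7, neg_thr=0.5):
    """Training labels by IoU: 1 = positive, 0 = negative, None = ignored."""
    labels = []
    for s in samples:
        o = iou(target, s)
        labels.append(1 if o >= pos_thr else (0 if o < neg_thr else None))
    return labels


def track_frame(prev_box, score_fn, n=256, rng=None):
    """Stage 2: score every candidate and keep the most target-like one.

    score_fn stands in for the deep classification network.
    """
    rng = rng if rng is not None else random.Random(0)
    candidates = draw_samples(prev_box, n, rng)
    return max(candidates, key=score_fn)
```

Every positive produced this way has IoU of at least `pos_thr` with the target, so the positives overlap heavily with one another, which is the first limitation the abstract names.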

MOTIVATION AND GOALS

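Address overfitting to frame-specific discriminative features in tracking-by-detection
Augment positive samples in feature space to capture appearance variations over time
Mitigate class imbalance by introducing a high-order cost-sensitive loss
Use adversarial learning to identify features that remain robust over a long temporal span
Demonstrate favorable performance against state-of-the-art trackers on standard benchmarks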

PROPOSED METHOD

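Insert a generative network G between the last convolutional layer and the first fully connected layer; G produces weight masks that drop out features
Train D (the classifier) on the masked features so that it learns temporally robust representations
Train adversarially: G iteratively generates masks that maximize D's loss, steering D away from frame-specific discriminative features
Apply a high-order cost-sensitive loss with focal-loss-like modulation that down-weights easy negatives and emphasizes hard negatives
Train G and D alternately during offline pre-training and online updates; G is removed at test time
Pre-train on labeled samples, then fine-tune online with diversified positive samples and mined hard negatives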
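The adversarial masking step can be illustrated with a toy classifier. This is a sketch under stated assumptions, not VITAL's implementation: in the paper G is a learned network, whereas here a brute-force search over a hypothetical candidate set (each mask zeroes one spatial cell) stands in for it, and D is reduced to logistic regression over the masked features. All names (`classify`, `candidate_masks`, `hardest_mask`) are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, weights, bias, mask):
    """Toy stand-in for D: logistic regression over masked spatial cells.

    features[i][c]: channel c of spatial cell i (e.g. a flattened 3x3 grid)
    mask[i]: multiplicative weight applied to every channel of cell i
    """
    z = bias
    for i, cell in enumerate(features):
        for c, value in enumerate(cell):
            z += weights[i][c] * mask[i] * value
    return sigmoid(z)


def bce(p, label, eps=1e-12):
    """Binary cross-entropy for a single sample."""
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


def candidate_masks(n_cells):
    """Hypothetical candidate set: each mask drops exactly one spatial cell."""
    return [[0.0 if j == i else 1.0 for j in range(n_cells)] for i in range(n_cells)]


def hardest_mask(features, weights, bias, label=1):
    """Return the candidate mask that maximizes D's loss, and that loss.

    In an adversarial setup this mask would be the training target for G,
    and D would then be updated on the features it masks.
    """
    masks = candidate_masks(len(features))
    losses = [bce(classify(features, weights, bias, m), label) for m in masks]
    best = max(range(len(masks)), key=lambda k: losses[k])
    return masks[best], losses[best]


# Cell 1 carries most of D's evidence, so masking it hurts D the most.
mask, loss = hardest_mask([[1.0], [1.0], [1.0]], [[0.5], [3.0], [0.2]], 0.0)
```

The selected mask removes the cell D leans on most, so training D on that masked input pushes it to find evidence elsewhere; this is the intuition behind steering D away from frame-specific cues.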

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

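RQ1: Do adversarially generated feature masks yield more temporally robust representations for tracking-by-detection?
RQ2: Does the high-order cost-sensitive loss improve discriminative ability under the extreme class imbalance of tracking data?
RQ3: How does VITAL compare with state-of-the-art trackers on precision and overlap metrics on standard benchmarks?
RQ4: Does VITAL achieve temporal robustness by focusing on features that persist over time rather than on frame-specific discriminative cues?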

KEY FINDINGS

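VITAL achieves favorable results on standard benchmarks such as OTB-2013, OTB-2015, and VOT-2016
Adversarially learned masks reduce reliance on frame-specific discriminative features and promote temporally robust representations
The proposed cost-sensitive loss helps mine hard negatives and curbs the dominance of easy negatives during training
Ablation studies show that random masks degrade performance, whereas adversarially learned masks improve robustness and accuracy
Compared with several baselines, VITAL performs better under challenging conditions such as occlusion, deformation, and viewpoint change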
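The claim that the loss curbs easy negatives can be checked numerically. The sketch below assumes a focal-style form in which cross-entropy is scaled by (1 - p)^γ for positives and p^γ for negatives; the exact high-order form used by VITAL may differ, and `cost_sensitive_loss`, `hard_to_easy_ratio`, and the value of `gamma` are hypothetical.

```python
import math


def cost_sensitive_loss(p, label, gamma=1.0, eps=1e-12):
    """Cross-entropy scaled by a focal-style modulating factor.

    p: predicted probability that the sample is the target
    label: 1 for a positive sample, 0 for a negative sample
    gamma: exponent of the modulating factor (hypothetical; 0 gives plain cross-entropy)
    """
    p = min(max(p, eps), 1.0 - eps)
    if label == 1:
        # Confident positives (p close to 1) are down-weighted by (1 - p)^gamma.
        return -((1.0 - p) ** gamma) * math.log(p)
    # Easy negatives (p close to 0) are down-weighted by p^gamma.
    return -(p ** gamma) * math.log(1.0 - p)


def hard_to_easy_ratio(gamma, easy_p=0.05, hard_p=0.8):
    """Loss of one hard negative divided by the loss of one easy negative."""
    return cost_sensitive_loss(hard_p, 0, gamma) / cost_sensitive_loss(easy_p, 0, gamma)


for g in (0.0, 1.0, 2.0):
    print(f"gamma={g}: hard/easy negative loss ratio = {hard_to_easy_ratio(g):.1f}")
```

Under this assumed form the ratio grows by a factor of (0.8 / 0.05)^γ = 16^γ relative to plain cross-entropy, so the many easy negatives contribute far less to the total loss than the few hard ones.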

This review was generated by AI and reviewed by human editors.