QUICK REVIEW

[Paper Review] Visual Tracking by Reinforced Decision Making

Janghoon Choi, Junseok Kwon | arXiv (Cornell University) | Feb 21, 2017

Video Surveillance and Tracking Methods | References: 35 | Cited by: 26

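ONE-SENTENCE SUMMARY

This paper proposes a real-time visual tracking algorithm based on deep reinforcement learning that selects the most appropriate template for each frame, mitigating the drift caused by inaccurate appearance model updates; the policy network is trained with policy gradient on synthetic episodes generated from a benchmark dataset, and the tracker runs in real time at 43 fps while improving tracking accuracy.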

ABSTRACT

One of the major challenges of model-free visual tracking problem has been the difficulty originating from the unpredictable and drastic changes in the appearance of objects we target to track. Existing methods tackle this problem by updating the appearance model on-line in order to adapt to the changes in the appearance. Despite the success of these methods however, inaccurate and erroneous updates of the appearance model result in a tracker drift. In this paper, we introduce a novel real-time visual tracking algorithm based on a template selection strategy constructed by deep reinforcement learning methods. The tracking algorithm utilizes this strategy to choose the appropriate template for tracking a given frame. The template selection strategy is self-learned by utilizing a simple policy gradient method on numerous training episodes randomly generated from a tracking benchmark dataset. Our proposed reinforcement learning framework is generally applicable to other confidence map based tracking algorithms. The experiment shows that our tracking algorithm runs in real-time speed of 43 fps and the proposed policy network effectively decides the appropriate template for successful visual tracking.

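RESEARCH MOTIVATION AND OBJECTIVES

Address the challenge of appearance changes in visual tracking, in particular the drift caused by erroneous online updates of the appearance model.
Develop a real-time tracking framework that dynamically selects the most reliable template for each frame.
Use reinforcement learning to train an adaptive template selection policy that does not depend on hand-crafted heuristics.
Build a general framework that applies to confidence-map-based tracking algorithms.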

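PROPOSED METHOD

Train the policy network with a simple policy gradient method on training episodes randomly generated from a tracking benchmark dataset.
Formulate template selection as a sequential decision problem in which the agent picks the best template from a set of candidates at each frame.
Use a deep neural network to encode visual features and output a probability distribution over the candidate templates.
Define the reinforcement learning reward in terms of tracking accuracy, so that the policy is encouraged to select templates that minimize localization error.
Integrate the trained policy into a real-time tracking pipeline so that the template can be switched dynamically at inference time.
Decouple the template selection logic from the feature extraction and matching components, which keeps the method compatible with existing confidence-map-based trackers.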
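The training loop described above can be illustrated with a minimal REINFORCE sketch. This is a toy example under stated assumptions, not the paper's implementation: the deep policy network is replaced by a single scalar weight `w`, each candidate template is summarized by one hypothetical feature (think of it as the peak value of its confidence map), and the reward is a synthetic stand-in for tracking accuracy. The names `make_episode_step`, `train`, and `select_template` are illustrative only.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution over templates."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Sample an action (template index) from the policy distribution."""
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1


def make_episode_step(rng, num_templates=4):
    """Hypothetical synthetic step (assumption, not the paper's generator).

    Each template gets one feature and a noisy reward correlated with it,
    standing in for an accuracy-based reward.
    """
    feats = [rng.random() for _ in range(num_templates)]
    rewards = [f + rng.gauss(0.0, 0.05) for f in feats]
    return feats, rewards


def train(num_steps=2000, lr=0.5, seed=0):
    """REINFORCE with a running-average baseline on a one-weight policy."""
    rng = random.Random(seed)
    w = 0.0         # policy parameter: logit_i = w * feature_i
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for _ in range(num_steps):
        feats, rewards = make_episode_step(rng)
        probs = softmax([w * f for f in feats])
        action = sample_index(probs, rng)
        reward = rewards[action]
        # d/dw log pi(action) = f_action - sum_i pi_i * f_i
        grad_log_pi = feats[action] - sum(p * f for p, f in zip(probs, feats))
        w += lr * (reward - baseline) * grad_log_pi
        baseline += 0.05 * (reward - baseline)
    return w


def select_template(w, feats):
    """Greedy selection at inference time: pick the highest-scoring template."""
    return max(range(len(feats)), key=lambda i: w * feats[i])


if __name__ == "__main__":
    w = train()
    print("learned weight:", w)
    print("selected template:", select_template(w, [0.1, 0.9, 0.3]))
```

Because the toy reward grows with the feature, the weight drifts positive during training and the greedy policy ends up preferring the template with the strongest feature. In a real tracker the scalar weight would be a neural network over image features and the reward would come from ground-truth annotations in the benchmark sequences.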

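EXPERIMENTAL RESULTS

Research Questions

RQ1: Does the reinforcement-learning-based template selection policy reduce tracking drift under appearance changes?
RQ2: How well does the policy network trained with policy gradient generalize to unseen tracking sequences in a real-time setting?
RQ3: How much does the proposed method improve tracking accuracy over baseline online appearance model update strategies?
RQ4: Can the reinforcement learning framework be applied effectively to other confidence-map-based tracking algorithms?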

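Key Findings

The proposed tracker runs in real time at 43 frames per second, fast enough for practical deployment.
The policy network learns to select an appropriate template, reducing the tracking drift caused by erroneous model updates.
The reinforcement learning framework is general and can be integrated into other confidence-map-based tracking algorithms.
Policy gradient training on synthetic episodes yields a robust, adaptive template selection policy.
Thanks to its more reliable template selection, the method achieves higher tracking accuracy than conventional online appearance modeling approaches.
By dynamically choosing the most discriminative template at each frame, the framework copes with drastic appearance changes.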


This review was generated by AI and checked by a human editor.