QUICK REVIEW

[Paper Review] Learning Dynamic Memory Networks for Object Tracking

Tianyu Yang, Antoni B. Chan|arXiv (Cornell University)|Mar 20, 2018

Video Surveillance and Tracking Methods|Cited by 28

One-Sentence Summary

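This paper proposes MemTrack, a real-time visual object tracker built on a dynamic memory network whose external memory block is controlled by an LSTM, so that the matching template adapts to changes in the target's appearance. Using attention-based memory access and channel-wise gated residual learning, the method achieves state-of-the-art accuracy on the OTB and VOT benchmarks while running at 50 fps, performing favorably against both real-time and non-real-time trackers.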

ABSTRACT

Template-matching methods for visual tracking have gained popularity recently due to their comparable performance and fast speed. However, they lack effective ways to adapt to changes in the target object's appearance, making their tracking accuracy still far from state-of-the-art. In this paper, we propose a dynamic memory network to adapt the template to the target's appearance variations during tracking. An LSTM is used as a memory controller, where the input is the search feature map and the outputs are the control signals for the reading and writing process of the memory block. As the location of the target is at first unknown in the search feature map, an attention mechanism is applied to concentrate the LSTM input on the potential target. To prevent aggressive model adaptivity, we apply gated residual template learning to control the amount of retrieved memory that is used to combine with the initial template. Unlike tracking-by-detection methods where the object's information is maintained by the weight parameters of neural networks, which requires expensive online fine-tuning to be adaptable, our tracker runs completely feed-forward and adapts to the target's appearance changes by updating the external memory. Moreover, unlike other tracking methods where the model capacity is fixed after offline training, the capacity of our tracker can be easily enlarged as the memory requirements of a task increase, which is favorable for memorizing long-term object information. Extensive experiments on OTB and VOT demonstrate that our tracker MemTrack performs favorably against state-of-the-art tracking methods while retaining real-time speed of 50 fps.
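The abstract says attention concentrates the LSTM input on the likely target before the target's location is known. Below is a minimal sketch of soft spatial attention pooling, assuming an additive (Bahdanau-style) score between each location's feature and the previous LSTM hidden state. The matrices `w_f`, `w_h`, `v`, all function names, and the toy values are hypothetical; a real tracker works on learned tensors, not Python lists.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [dot(row, x) for row in m]


def softmax(xs):
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_scores(features, h, w_f, w_h, v):
    """Score each spatial location i as v . tanh(W_f f_i + W_h h) (assumed form)."""
    hidden_term = matvec(w_h, h)
    scores = []
    for f in features:
        pre = [a + b for a, b in zip(matvec(w_f, f), hidden_term)]
        scores.append(dot(v, [math.tanh(p) for p in pre]))
    return scores


def attention_pool(features, scores):
    """Return softmax weights and the attention-weighted sum of the features."""
    weights = softmax(scores)
    channels = len(features[0])
    pooled = [sum(w * f[c] for w, f in zip(weights, features)) for c in range(channels)]
    return weights, pooled


# Toy search "feature map": 3 locations with 2 channels each (hypothetical values).
features = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
h_prev = [1.0]                   # previous LSTM hidden state (1-D for brevity)
w_f = [[1.0, 0.0], [0.0, 1.0]]   # projects location features
w_h = [[0.0], [0.0]]             # projects the hidden state
v = [1.0, 0.0]                   # scoring vector that favors channel 0
scores = additive_scores(features, h_prev, w_f, w_h, v)
weights, lstm_input = attention_pool(features, scores)
```

Because the weights sum to one, the LSTM receives a single C-dimensional vector whatever the size of the search map; with uniform scores the pooling reduces to plain average pooling.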

Research Motivation and Goals

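Address the difficulty template-matching trackers have in adapting to changes in the target's appearance during tracking.
Eliminate the need for online fine-tuning by storing appearance information externally in a dynamic memory block.
Make model capacity scalable by enlarging the memory, supporting long-term appearance modeling.
Improve the tracking accuracy of template-based baselines (e.g., SiamFC) without sacrificing real-time performance.
Develop a differentiable, end-to-end trainable framework that combines the reliability of the initial template with adaptive memory retrieval.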
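Storing appearance externally means capacity is set by the number of memory slots rather than by network weights. The sketch below illustrates this, assuming a generic Neural Turing Machine-style erase-and-add write as a stand-in; the paper's exact write and slot-allocation rules may differ. `ExternalMemory`, `write`, and `erase` are hypothetical names, and in the real model the write weights would be produced by the LSTM controller.

```python
class ExternalMemory:
    """Hypothetical external memory: num_slots slots, each a flat vector of slot_size floats."""

    def __init__(self, num_slots, slot_size):
        self.slots = [[0.0] * slot_size for _ in range(num_slots)]

    def write(self, weights, new_item, erase):
        """NTM-style write: M(j) <- M(j) * (1 - w_j * erase) + w_j * new_item.

        weights:  one write weight per slot (assumed to come from the controller)
        new_item: the vector to store, e.g. a flattened template
        erase:    scalar in [0, 1] controlling how strongly old content is erased
        """
        if len(weights) != len(self.slots):
            raise ValueError("need exactly one write weight per slot")
        for j, w in enumerate(weights):
            self.slots[j] = [
                m * (1.0 - w * erase) + w * x for m, x in zip(self.slots[j], new_item)
            ]


# In this sketch capacity is only a constructor argument; the write rule is
# unchanged when num_slots grows.
small = ExternalMemory(num_slots=4, slot_size=3)
small.write([1.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0], erase=1.0)
small.write([0.5, 0.5, 0.0, 0.0], [3.0, 3.0, 3.0], erase=1.0)

large = ExternalMemory(num_slots=16, slot_size=3)
large.write([1.0] + [0.0] * 15, [1.0, 2.0, 3.0], erase=1.0)
```

The second write blends the new item into slots 0 and 1 in proportion to their weights, while untouched slots keep their previous contents.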

Proposed Method

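An external addressable memory block stores the target's past appearance features, providing long-term memory of appearance changes.
An LSTM controller takes the search feature map as input; an attention mechanism focuses this input on the potential target region before the memory is accessed.
The LSTM produces the control signals for reading from and writing to the memory block, so the template adapts dynamically.
Gated residual template learning combines the initial template with the residual template retrieved from memory; channel-wise gates control how much memory information is added.
The final matching template is the element-wise sum of the initial template and the gated residual, which retains the conservative appearance information of the initial template.
The whole network is differentiable and trained end-to-end with SGD, so inference runs in real time without any online parameter updates.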
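The read step and the gated residual combination described above can be sketched as follows. This is a minimal illustration, assuming content-based addressing (cosine similarity between an LSTM-emitted read key and each memory slot, sharpened by a read strength `beta` and normalized with softmax) and one sigmoid gate per channel. Templates are lists of channels, each a flat list of spatial values. All function names and toy values are hypothetical, not taken from the authors' code.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def flatten(template):
    return [value for channel in template for value in channel]


def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm > 0 else 0.0


def read_memory(memory, key, beta):
    """Weights = softmax(beta * cos(key, slot)); result = weighted sum of slots."""
    key_flat = flatten(key)
    weights = softmax([beta * cosine(key_flat, flatten(slot)) for slot in memory])
    retrieved = [
        [sum(w * slot[c][s] for w, slot in zip(weights, memory)) for s in range(len(key[c]))]
        for c in range(len(key))
    ]
    return weights, retrieved


def gated_residual_template(initial, retrieved, gate_logits):
    """T_final = T_0 + r * T_retr, with one sigmoid gate r per channel."""
    gates = [sigmoid(g) for g in gate_logits]
    return [
        [t0 + r * tr for t0, tr in zip(ch0, ch_ret)]
        for ch0, ch_ret, r in zip(initial, retrieved, gates)
    ]


# Toy memory with 2 slots; each template has 2 channels x 2 spatial positions.
memory = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
key = [[1.0, 0.0], [0.0, 0.0]]      # read key (would come from the LSTM)
weights, retrieved = read_memory(memory, key, beta=10.0)

initial = [[1.0, 1.0], [1.0, 1.0]]  # template from the first frame
final = gated_residual_template(initial, retrieved, gate_logits=[0.0, -50.0])
```

Driving a channel's gate logit strongly negative makes that channel fall back to the initial template, illustrating how the gate can keep first-frame appearance from being overwritten by noisy recent frames.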

Experimental Results

Research Questions

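RQ1: Can a dynamic memory network adapt the tracking template to appearance changes without online fine-tuning?
RQ2: When the true target location is unknown, how does attention-based memory access improve template matching?
RQ3: To what extent does gated residual learning prevent overfitting to recent frames while still allowing the template to adapt?
RQ4: Can memory capacity be scaled up to improve long-term tracking without increasing model complexity?
RQ5: How does the method compare with state-of-the-art real-time and non-real-time trackers in accuracy and speed?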

Key Findings

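On the OTB-2015 benchmark, MemTrack improves precision by 6.4% and success rate by 7.6% over SiamFC.
On VOT-2016, MemTrack reaches an EAO of 0.2729, exceeding the VOT-2016 state-of-the-art bound (EAO 0.251) and ranking first among real-time trackers.
In AUC, MemTrack surpasses state-of-the-art non-real-time trackers (e.g., CREST, MCPF, and SRDCFdecon) while running at 50 fps, compared with roughly 1 fps for those trackers.
On challenging attributes such as low resolution, occlusion, and scale variation, MemTrack achieves the highest AUC, improving on SiamFC by 10.7% on low-resolution sequences.
Across the 8 OTB-2015 attributes reported (including illumination variation, motion blur, and in-plane/out-of-plane rotation), the tracker maintains robust performance.
Ablation studies show that the full model, combining attention, gated residual learning, and memory control, clearly outperforms its ablated variants.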

This review was generated by AI and checked by human editors.