QUICK REVIEW

[Paper Review] Reading Relevant Feature from Global Representation Memory for Visual Object Tracking

Xinyu Zhou, Pinxue Guo | arXiv (Cornell University) | Feb 22, 2024

Advanced Image and Video Retrieval Techniques | Cited by 5

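One-Sentence Summary

This paper proposes RFGM, a tracking framework that uses a global representation memory and relevance attention to read only the historical features most relevant to the current search region, improving both adaptability and speed. It achieves competitive results on five benchmarks while running at about 71 FPS.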

ABSTRACT

Reference features from a template or historical frames are crucial for visual object tracking. Prior works utilize all features from a fixed template or memory for visual object tracking. However, due to the dynamic nature of videos, the required reference historical information for different search regions at different time steps is also inconsistent. Therefore, using all features in the template and memory can lead to redundancy and impair tracking performance. To alleviate this issue, we propose a novel tracking paradigm, consisting of a relevance attention mechanism and a global representation memory, which can adaptively assist the search region in selecting the most relevant historical information from reference features. Specifically, the proposed relevance attention mechanism in this work differs from previous approaches in that it can dynamically choose and build the optimal global representation memory for the current frame by accessing cross-frame information globally. Moreover, it can flexibly read the relevant historical information from the constructed memory to reduce redundancy and counteract the negative effects of harmful information. Extensive experiments validate the effectiveness of the proposed method, achieving competitive performance on five challenging datasets with 71 FPS.

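Motivation and Objectives

Make tracking robust to appearance and background changes by avoiding the redundancy that comes from using every feature in memory.
Propose a global representation memory (GR memory) that stores representative target features at the token level across the video.
Develop a relevance attention mechanism that reads the historical tokens most relevant to the current frame and updates the GR memory accordingly.
Show that selective reading and token-level memory updates improve tracking accuracy and speed across benchmarks.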

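Proposed Method

Introduce a relevance attention mechanism that reads global information across frames and selects the optimal tokens for the current frame.
Build the global representation memory (GR memory) by selectively merging tokens from a new template with the existing memory according to token relevance.
Update the memory at the token level using adaptive, rank-guided Top-k token selection, made differentiable with Gumbel-Softmax.
Use a ViT-based encoder with relevance attention layers at selected depths, followed by a three-branch decoder that predicts score, offset, and size.
Train with a focal loss for the score, L1 and GIoU losses for localization, and a ratio loss that regulates how many memory tokens are retained.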
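The selective read and the token-level memory update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scoring rule (mean scaled dot product between search tokens and memory tokens), the hard Top-k used here in place of the differentiable Gumbel-Softmax selection, and all function names (`relevance_scores`, `read_relevant`, `update_memory`) are hypothetical choices made for the example.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def relevance_scores(search_tokens, memory_tokens):
    """Score each memory token by its mean scaled dot product with the search tokens.

    Assumption: the real model learns its relevance scores; this rule is a stand-in.
    """
    scale = 1.0 / math.sqrt(len(memory_tokens[0]))
    return [
        sum(dot(q, m) for q in search_tokens) * scale / len(search_tokens)
        for m in memory_tokens
    ]


def top_k_indices(scores, k):
    """Indices of the k highest scores, returned in their original order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def read_relevant(search_tokens, memory_tokens, k):
    """Read only the k memory tokens most relevant to the current search region."""
    keep = top_k_indices(relevance_scores(search_tokens, memory_tokens), k)
    return [memory_tokens[i] for i in keep], keep


def update_memory(memory_tokens, new_template_tokens, search_tokens, capacity):
    """Merge new template tokens into memory, keeping at most `capacity` tokens."""
    candidates = memory_tokens + new_template_tokens
    kept, _ = read_relevant(search_tokens, candidates, min(capacity, len(candidates)))
    return kept


# Toy 2-D tokens (hypothetical values, chosen only to make the ranking obvious).
search = [[1.0, 0.0]]
memory = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
tokens, kept_idx = read_relevant(search, memory, k=2)  # kept_idx == [0, 2]
memory = update_memory(memory, [[2.0, 0.0]], search, capacity=3)
```

A hard Top-k like this is not differentiable, which is why the summary mentions Gumbel-Softmax for training; the sketch only shows the selection logic at inference time.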

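Experimental Results

Research Questions

RQ1: In visual tracking, how can the historical features most relevant to a given search region be identified and read?
RQ2: Can a token-level global memory maintain a long-term representation of target appearance while avoiding memory clutter and error accumulation?
RQ3: Compared with fixed-template memory strategies, does relevance-based memory updating improve tracking robustness and speed?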

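Key Findings

RFGM achieves competitive results on the TrackingNet, GOT-10k, LaSOT, OTB, and UAV123 benchmarks.
The model runs at 71 FPS, demonstrating efficiency suitable for real-time tracking.
The GR memory stores representative target tokens from across the video and reduces error accumulation compared with fixed template updates.
Relevance attention outperforms standard attention with almost no additional parameters, by reading selectively from memory and reducing the number of memory tokens used.
Ablations show that the GR memory with adaptive token ranking gives the best overall performance, with a memory size of about 192 tokens working best.
Compared with vanilla attention, relevance attention lowers MACs while maintaining performance.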
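To see why reading fewer memory tokens lowers MACs, a back-of-the-envelope count helps. The sketch below counts only the two matrix products inside one attention layer (Q·Kᵀ and A·V) and ignores projections. Only the 192-token memory size comes from the findings above; the 256 search tokens, the 768-dimensional embedding, and the choice to read 96 tokens are illustrative assumptions, not figures reported for RFGM.

```python
def attention_macs(num_queries, num_keys, dim):
    """MACs for Q @ K^T plus A @ V in one attention layer (projections excluded)."""
    return 2 * num_queries * num_keys * dim


SEARCH_TOKENS = 256  # assumed number of search-region tokens
EMBED_DIM = 768      # assumed embedding width
MEMORY_TOKENS = 192  # memory size reported as best in the ablation
READ_TOKENS = 96     # assumed number of tokens kept by relevance selection

full = attention_macs(SEARCH_TOKENS, MEMORY_TOKENS, EMBED_DIM)
reduced = attention_macs(SEARCH_TOKENS, READ_TOKENS, EMBED_DIM)
saving = 1 - reduced / full  # 0.5 under these assumptions
```

Because this cost is linear in the number of keys, halving the tokens that are read halves this part of the computation; the actual savings in the paper depend on how many tokens the learned selection keeps.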

This review was generated by AI and reviewed by a human editor.