QUICK REVIEW

[Paper Review] Learning regression and verification networks for long-term visual tracking

Yunhua Zhang, Dong Wang|arXiv (Cornell University)|2018. 09. 12.
Video Surveillance and Tracking Methods | References: 5 | Citations: 74
One-line summary

Introduces a long-term tracking framework that combines an offline-trained regression network with an online-updated verification network to perform local search, target presence/absence decisions, and image-wide re-detection; achieves the best performance on the VOT2018 long-term challenge (LTB35) and state-of-the-art results on the OxUvA long-term benchmark.

ABSTRACT

Compared with short-term tracking, the long-term tracking task requires determining the tracked object is present or absent, and then estimating the accurate bounding box if present or conducting image-wide re-detection if absent. Until now, few attempts have been done although this task is much closer to designing practical tracking systems. In this work, we propose a novel long-term tracking framework based on deep regression and verification networks. The offline-trained regression model is designed using the object-aware feature fusion and region proposal networks to generate a series of candidates and estimate their similarity scores effectively. The verification network evaluates these candidates to output the optimal one as the tracked object with its classification score, which is online updated to adapt to the appearance variations based on newly reliable observations. The similarity and classification scores are combined to obtain a final confidence value, based on which our tracker can determine the absence of the target accurately and conduct image-wide re-detection to capture the target successfully when it reappears. Extensive experiments show that our tracker achieves the best performance on the VOT2018 long-term challenge and state-of-the-art results on the OxUvA long-term dataset.

Research motivation and goals

  • Address the gap in long-term tracking where targets appear, disappear, and reappear over long sequences.
  • Develop an offline-trained regression network to generate candidate bounding boxes with similarity scores.
  • Incorporate an online-updated verification network to discriminate the true target among candidates.
  • Enable a confidence-driven switch between local search and image-wide re-detection.
  • Demonstrate superior performance on VOT2018 LTB35 and OxUvA long-term datasets.

Proposed method

  • Use an offline-trained regression network (R) with object-aware feature fusion and region proposal networks to generate and score candidate bounding boxes.
  • Fuse search-region features with a template feature to produce RPN inputs for bounding-box regression and similarity scoring.
  • Incorporate an online-updated verification network (V) to classify candidates as foreground/background and refine the final track.
  • Compute a final frame-wise confidence by combining regression and verification scores to decide presence/absence and trigger re-detection when needed.
  • Dynamically switch between local search and image-wide re-detection based on the confidence score.
  • Train R offline with an SSD-like loss that combines a matching (cross-entropy) term and a localization (smooth L1) term; train V online with MDNet-style fine-tuning.
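
The confidence-driven switch between local search and image-wide re-detection can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` type, the `track_step` function, the product rule used to combine the two scores, and the 0.5 threshold are all hypothetical, since this review does not state the exact combination rule or threshold.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Candidate:
    box: Box
    similarity: float      # score from the regression network R, assumed in [0, 1]
    classification: float  # score from the verification network V, assumed in [0, 1]


def combined_confidence(c: Candidate) -> float:
    # ASSUMPTION: a simple product of the two scores. The review only says
    # the scores are "combined"; the actual rule may differ.
    return c.similarity * c.classification


def track_step(candidates: List[Candidate],
               threshold: float = 0.5) -> Tuple[Optional[Box], str]:
    """Return (box, next_mode) for one frame.

    next_mode is "local" when the target is judged present (search around
    the returned box in the next frame) and "global" when it is judged
    absent (run image-wide re-detection in the next frame).
    The threshold value is hypothetical.
    """
    if not candidates:
        return None, "global"
    best = max(candidates, key=combined_confidence)
    if combined_confidence(best) >= threshold:
        return best.box, "local"
    return None, "global"
```

In this sketch a candidate with a high similarity score but a low classification score is rejected, which mirrors the stated role of the verification network: filtering out distractors that the regression network alone would accept.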

Experimental results

Research questions

  • RQ1: How can regression and verification networks be integrated to handle presence/absence decisions in long-term tracking?
  • RQ2: Can an offline regression model robustly propose candidates while an online verification model adapts to appearance changes?
  • RQ3: Does a confidence-based switch between local search and global re-detection improve long-term tracking performance?
  • RQ4: What is the impact of object-aware feature fusion on candidate proposals and regression accuracy?
  • RQ5: How does the proposed approach perform on standard long-term benchmarks (VOT2018 LTB35, OxUvA)?

Key results

  • Achieves the best F-score, precision, and recall on VOT2018 LTB35 among the evaluated trackers (F-score 0.610, Pr 0.634, Re 0.588).
  • On VOT2018 LTB35, the provided re-detection table reports a 100% re-detection success rate across sequences, with the target recovered after 1 frame.
  • On the OxUvA long-term dataset (open challenge), achieves the top MaxGM score of 0.544, with TPR 0.609 and TNR 0.485.
  • Ablation shows that adding verification substantially improves long-term performance compared to using regression alone; both concatenation and multiplication in feature fusion are beneficial.
  • Siamese configuration for feature extractors degrades performance relative to separate online/offline branches, indicating the need for separate input handling.
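
As a sanity check, the reported numbers can be cross-checked against the metric definitions. The sketch below assumes the standard F-score (harmonic mean of precision and recall) and the OxUvA-style definition MaxGM = max over p in [0, 1] of sqrt(((1 - p) · TPR) · ((1 - p) · TNR + p)); neither formula is stated in this review, and the function names are chosen only for illustration. Because the inputs are already rounded to three decimals, the results should only be expected to land close to the reported 0.610 and 0.544.

```python
import math


def f_score(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def max_gm(tpr: float, tnr: float, steps: int = 1000) -> float:
    # ASSUMPTION: OxUvA-style MaxGM. A fraction p of "present" predictions
    # is randomly flipped to "absent", and the best geometric mean of the
    # resulting TPR and TNR over p in [0, 1] is taken (grid search here).
    best = 0.0
    for i in range(steps + 1):
        p = i / steps
        gm = math.sqrt(((1 - p) * tpr) * ((1 - p) * tnr + p))
        best = max(best, gm)
    return best


if __name__ == "__main__":
    print(f"F-score from Pr=0.634, Re=0.588: {f_score(0.634, 0.588):.4f}")
    print(f"MaxGM from TPR=0.609, TNR=0.485: {max_gm(0.609, 0.485):.4f}")
```

Note that the plain geometric mean sqrt(TPR · TNR) is the p = 0 case of this formula, so MaxGM is never lower than it.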

This review was generated by AI and reviewed by a human editor.