QUICK REVIEW

[Paper Review] OmniTracker: Unifying Object Tracking by Tracking-with-Detection

Junke Wang, Zuxuan Wu|arXiv (Cornell University)|2023. 03. 21.

Video Surveillance and Tracking Methods | Citations: 11

One-Line Summary

OmniTracker is a unified Deformable DETR-based model that jointly handles instance tracking (SOT/VOS) and category tracking (MOT/MOTS/VIS) through a tracking-with-detection paradigm built on Reference-guided Feature Enhancement and shared network weights.

ABSTRACT

Visual Object Tracking (VOT) aims to estimate the positions of target objects in a video sequence, which is an important vision task with various real-world applications. Depending on whether the initial states of target objects are specified by provided annotations in the first frame or the categories, VOT could be classified as instance tracking (e.g., SOT and VOS) and category tracking (e.g., MOT, MOTS, and VIS) tasks. Different definitions have led to divergent solutions for these two types of tasks, resulting in redundant training expenses and parameter overhead. In this paper, combining the advantages of the best practices developed in both communities, we propose a novel tracking-with-detection paradigm, where tracking supplements appearance priors for detection and detection provides tracking with candidate bounding boxes for the association. Equipped with such a design, a unified tracking model, OmniTracker, is further presented to resolve all the tracking tasks with a fully shared network architecture, model weights, and inference pipeline, eliminating the need for task-specific architectures and reducing redundancy in model parameters. We conduct extensive experimentation on seven prominent tracking datasets of different tracking tasks, including LaSOT, TrackingNet, DAVIS16-17, MOT17, MOTS20, and YTVIS19, and demonstrate that OmniTracker achieves on-par or even better results than both task-specific and unified tracking models.

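Research Motivation and Goals

Motivates a unified tracking framework that handles both instance tracking and category tracking tasks.
Proposes a tracking-with-detection paradigm in which tracking supplements appearance priors for detection and detection provides candidate bounding boxes for tracking association.
Develops OmniTracker, which handles multiple tracking tasks (SOT, VOS, MOT, MOTS, VIS) with a shared architecture, shared weights, and a shared inference pipeline.
Uses memory-based identity embeddings and a contrastive ReID loss to associate object identities reliably across frames.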
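
To make the last goal concrete, here is a minimal sketch of a contrastive ReID loss computed against a memory bank of identity embeddings. The review does not give the paper's exact formulation, so this assumes a generic InfoNCE-style loss over cosine similarities. The function names, the dictionary layout of the bank, the identity labels, and the temperature value are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def contrastive_reid_loss(query, memory_bank, positive_id, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's stated loss).

    Pulls `query` toward the stored embedding of its own identity and
    pushes it away from every other identity in the bank.
    memory_bank: dict mapping identity id -> embedding (hypothetical layout).
    """
    ids = list(memory_bank)
    logits = [cosine(query, memory_bank[i]) / temperature for i in ids]
    # Numerically stable log-sum-exp over all identities in the bank.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[ids.index(positive_id)]

# Toy bank with two identities; the query resembles "car_1".
bank = {"car_1": [1.0, 0.0], "person_7": [0.0, 1.0]}
loss_match = contrastive_reid_loss([0.9, 0.1], bank, "car_1")
loss_mismatch = contrastive_reid_loss([0.9, 0.1], bank, "person_7")
```

The loss is small when the query is labeled with the identity it actually resembles and large otherwise, which is the training signal that keeps embeddings of the same object close across frames.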

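Proposed Method

Introduces a Reference-guided Feature Enhancement (RFE) module that fuses appearance priors from previous frames with current-frame features via cross-attention.
Feeds the enhanced features into a Deformable DETR detector to predict bounding boxes and masks for every frame.
Uses a memory bank of identity embeddings, trained with a contrastive ReID loss, to learn stable object identities across frames.
Computes a per-frame detection loss that combines classification, box regression, and mask terms within a set prediction framework.
Adopts a unified online tracking pipeline with Kalman filter motion modeling and Hungarian data association across all tasks.
Trains jointly on multiple tracking datasets (SOT, VOS, MOT, MOTS, VIS) and COCO to enable task-unified optimization.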
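
The cross-attention fusion in the RFE step can be illustrated with a small sketch. It is not the paper's module: it assumes single-head scaled dot-product attention with a residual connection and no learned projections, and the function and variable names are hypothetical. Current-frame features act as queries, and reference-frame features act as keys and values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def enhance_with_reference(current, reference):
    """Residual single-head cross-attention (illustrative assumption).

    current:   feature vectors from the current frame (queries).
    reference: feature vectors from the previous frame (keys and values).
    Learned projections are omitted, so only the attention arithmetic remains.
    """
    dim = len(current[0])
    scale = 1.0 / math.sqrt(dim)
    enhanced = []
    for q in current:
        # Similarity of this query to every reference feature.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in reference]
        weights = softmax(scores)
        # Attention-weighted average of the reference features.
        attended = [sum(w * v[d] for w, v in zip(weights, reference))
                    for d in range(dim)]
        # Residual connection keeps the original current-frame signal.
        enhanced.append([x + y for x, y in zip(q, attended)])
    return enhanced

current_feats = [[1.0, 0.0], [0.0, 1.0]]
reference_feats = [[2.0, 0.0], [0.0, 2.0]]
enhanced_feats = enhance_with_reference(current_feats, reference_feats)
```

Each current-frame feature is shifted toward the reference features it resembles most, which is the sense in which the previous frame supplies an appearance prior to the detector.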

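Experimental Results

Research Questions

RQ1: Can a single shared network architecture and training scheme effectively solve both instance tracking and category tracking tasks?
RQ2: Does introducing Reference-guided Feature Enhancement (RFE) improve the appearance priors available to the detector and make cross-frame association more robust?
RQ3: How does joint training across tracking tasks affect performance and generalization compared with task-specific or hybrid training?
RQ4: What role do memory-based identity embeddings and the contrastive ReID loss play in maintaining object identities across frames?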

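Key Results

OmniTracker achieves state-of-the-art or competitive results on seven tracking benchmarks, including LaSOT, TrackingNet, DAVIS16-17, MOT17, MOTS20, and YTVIS19.
The RFE module supplies appearance priors to detection and improves P_norm on TrackingNet and MOTA on MOT17; both metrics drop when the module is ablated.
Joint training across tasks gives consistent gains over individual training and over the Unicorn baseline, with marked gains on several benchmarks.
OmniTracker runs at an FPS competitive with task-specific models while keeping a fully shared pipeline for SOT, VOS, MOT, MOTS, and VIS.
On VOS, OmniTracker outperforms multi-task baselines and some unified models, performing well on both frame-level and long-term association.
On VIS, OmniTracker-L achieves competitive mAP and related metrics compared with VIS-specific methods.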
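
For readers unfamiliar with the MOTA metric cited for MOT17, the sketch below computes it from the standard definition, MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames. The function name is hypothetical and the counts are made-up illustrative values, not results from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy from sequence-level error counts."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt

# Illustrative counts only (not taken from the paper).
score = mota(false_negatives=120, false_positives=60, id_switches=20, num_gt=1000)
```

Because all three error types share one denominator, MOTA can be negative when a tracker makes more errors than there are ground-truth objects.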


This review was generated by AI and reviewed by a human editor.