QUICK REVIEW

[Paper Digest] Unified Multi-Dataset Training for TBPS

Nilanjana Chatterjee, Sidharatha Garg|arXiv (Cornell University)|Jan 21, 2026
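Video Surveillance and Tracking Methods | Cited by 0
One-Sentence Summary

Scale-TBPS trains a single, unified text-based person search model across multiple TBPS datasets by combining noise-aware data curation with a scalable discriminative identity learning objective, and it outperforms both dataset-specific models and naive joint training.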

ABSTRACT

Text-Based Person Search (TBPS) has seen significant progress with vision-language models (VLMs), yet it remains constrained by limited training data and the fact that VLMs are not inherently pre-trained for pedestrian-centric recognition. Existing TBPS methods therefore rely on dataset-centric fine-tuning to handle distribution shift, resulting in multiple independently trained models for different datasets. While synthetic data can increase the scale needed to fine-tune VLMs, it does not eliminate dataset-specific adaptation. This motivates a fundamental question: can we train a single unified TBPS model across multiple datasets? We show that naive joint training over all datasets remains sub-optimal because current training paradigms do not scale to a large number of unique person identities and are vulnerable to noisy image-text pairs. To address these challenges, we propose Scale-TBPS with two contributions: (i) a noise-aware unified dataset curation strategy that cohesively merges diverse TBPS datasets; and (ii) a scalable discriminative identity learning framework that remains effective under a large number of unique identities. Extensive experiments on CUHK-PEDES, ICFG-PEDES, RSTPReid, IIITD-20K, and UFine6926 demonstrate that a single Scale-TBPS model outperforms dataset-centric optimized models and naive joint training.

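Motivation and Objectives

  • Move beyond dataset-centric TBPS by training a single unified model that handles multiple data distributions.
  • Mitigate cross-dataset noise and distribution shift when merging TBPS datasets.
  • Develop scalable identity learning that stays discriminative as the number of identities grows.
  • Show that unified training can outperform independently trained, dataset-centric models.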

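Proposed Method

  • Noise-Aware Unified Dataset Curation (NDC) uses an ensemble of pretrained TBPS models to filter out unreliable text–image pairs without relying on hard thresholds.
  • Discriminative Identity Learning (DIL) introduces a Multimodal Angular Identity loss that enforces angular margins for both the image and text modalities.
  • A shared multimodal classifier weight vector w is used to compute angular-margin-based logits for all identities.
  • Training combines the Multimodal Angular Identity loss with a ranking loss to optimize both cross-modal alignment and discriminability.
  • The method builds on CLIP-based encoders and extends them with a scalable angular-margin objective.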
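
To make the angular-margin idea concrete, here is a minimal sketch of an identity loss applied to both modalities with shared classifier weights. It assumes an ArcFace-style additive margin (the margin is added to the angle of the true identity before the softmax); this digest does not give the exact formulation, so the margin form, the `scale` and `margin` values, and all function names are illustrative assumptions rather than the authors' implementation. The ranking loss mentioned above is not included.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def angular_margin_ce(embedding, label, class_weights, scale=30.0, margin=0.3):
    """Cross-entropy over angular-margin logits for a single embedding.

    class_weights holds one weight vector per identity (assumed layout).
    """
    e = l2_normalize(embedding)
    logits = []
    for identity, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if identity == label:
            # Assumed additive margin: widen the angle to the true identity,
            # capped at pi so the cosine stays monotonic.
            theta = min(math.pi, math.acos(cos_theta) + margin)
            cos_theta = math.cos(theta)
        logits.append(scale * cos_theta)
    # Numerically stable log-sum-exp, then negative log-likelihood of the label.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_z - logits[label]

def multimodal_angular_identity_loss(image_emb, text_emb, label, class_weights,
                                     scale=30.0, margin=0.3):
    """Sum the image and text terms, both scored against the same weights."""
    image_term = angular_margin_ce(image_emb, label, class_weights, scale, margin)
    text_term = angular_margin_ce(text_emb, label, class_weights, scale, margin)
    return image_term + text_term
```

Because both modalities are scored against the same identity weights, an image and a caption of the same person are pulled toward one shared direction, which is one plausible reading of how the shared classifier supports cross-modal alignment.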
Figure 1: Illustration of Scale-TBPS. (a) illustrates the conventional dataset-centric training paradigm, where separate models are independently trained for different distributions, resulting in isolated models. (b) depicts naive joint training, where a single model is trained on merged datasets; h

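Experimental Results

Research Questions

  • RQ1: Can a single model be trained effectively on multiple TBPS datasets with different distributions?
  • RQ2: In large-scale TBPS, how can noisy cross-dataset text–image pairs be filtered out without discarding useful data?
  • RQ3: Does an angular-margin-based discriminative identity learning objective scale to a large number of cross-dataset identities?
  • RQ4: How does test-time similarity normalization affect the retrieval performance of a unified TBPS model?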

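Key Findings

  • A single Scale-TBPS model with NDC and DIL matches or exceeds dataset-specific and naive joint training methods on multiple TBPS benchmarks.
  • Scale-TBPS achieves better mean average precision (mAP) and ranking metrics than multiple CLIP-based and non-CLIP baselines.
  • Test-time similarity normalization (NNN) yields notable gains in retrieval performance, especially on some datasets.
  • The NDC module filters noisy pairs effectively in a one-time preprocessing step, enabling scalable merging of multiple TBPS datasets.
  • DIL visualizations show tighter intra-class clusters and clearer inter-class separation than naive joint training.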
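
The test-time normalization finding can be illustrated with a small sketch. It assumes that NNN stands for Nearest Neighbor Normalization, in which each gallery image receives a bias equal to a scaled mean of its top-k similarities to a set of reference queries, and that bias is subtracted from the retrieval score. This digest does not spell out the acronym or the settings, so the reading of NNN, the values of `k` and `alpha`, and the function names are all assumptions.

```python
def nnn_bias(reference_sims, k=2, alpha=0.5):
    """Per-gallery-item bias from a reference query set (assumed NNN recipe).

    reference_sims[r][j] is the similarity of reference query r to gallery item j.
    """
    n_gallery = len(reference_sims[0])
    biases = []
    for j in range(n_gallery):
        # Take the k reference queries that score this gallery item highest.
        column = sorted((row[j] for row in reference_sims), reverse=True)
        top_k = column[:k]
        biases.append(alpha * sum(top_k) / len(top_k))
    return biases

def nnn_rank(query_sims, biases):
    """Rank gallery indices by bias-corrected similarity, best first."""
    corrected = [s - b for s, b in zip(query_sims, biases)]
    return sorted(range(len(corrected)), key=lambda j: corrected[j], reverse=True)
```

The biases depend only on the gallery and the reference queries, so they can be computed once and reused for every test query. Gallery images that score highly against almost any caption ("hubs") get a larger bias and are pushed down the ranking.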
Figure 2: Overview of the proposed Scale-TBPS. (a) Noise-Aware Data Curation (NDC): Text–image pairs from the joint dataset ($\mathcal{D}$) are encoded using a set of pretrained and frozen models $\Phi$. Top-$K$ retrieved samples are computed independently for each model. A pair is retained as a
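
The curation step described in the caption can be sketched as follows. Each frozen model supplies a text-to-image similarity matrix, top-K retrieval is run per model, and a pair counts as supported by a model when its own image appears among the top-K results for its text. The caption is cut off before it states the retention criterion, so the voting rule here (`min_votes`) is purely a hypothetical placeholder, as are the function names and the input layout.

```python
def top_k_indices(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def curate_pairs(sims_per_model, k=2, min_votes=1):
    """Return the indices of text-image pairs that survive curation.

    sims_per_model[m][i][j] is the similarity of text i to image j under
    frozen model m; pair i is (text i, image i).
    min_votes is a hypothetical retention rule, not taken from the paper.
    """
    n_pairs = len(sims_per_model[0])
    kept = []
    for i in range(n_pairs):
        # A model votes for pair i if image i is retrieved in the top-k for text i.
        votes = sum(i in top_k_indices(sims[i], k) for sims in sims_per_model)
        if votes >= min_votes:
            kept.append(i)
    return kept
```

Ranking-based agreement of this kind avoids setting a cutoff on raw similarity values, which differ in scale from one model to another. It runs once over the merged dataset before training.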


This digest was generated by AI and reviewed by human editors.