QUICK REVIEW

[Paper Review] Learning Deep Context-aware Features over Body and Latent Parts for Person Re-identification

Dangwei Li, Xiaotang Chen|arXiv (Cornell University)|Oct 18, 2017

Video Surveillance and Tracking Methods | References: 34 | Citations: 90

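One-sentence summary

Introduces MSCAN, which exploits multi-scale context-aware features, learns latent pedestrian parts with Spatial Transformer Networks, and fuses them with full-body features to reach state-of-the-art person re-identification results on Market1501, CUHK03, and MARS.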

ABSTRACT

Person Re-identification (ReID) is to identify the same person across different cameras. It is a challenging task due to the large variations in person pose, occlusion, background clutter, etc. How to extract powerful features is a fundamental problem in ReID and is still an open problem today. In this paper, we design a Multi-Scale Context-Aware Network (MSCAN) to learn powerful features over full body and body parts, which can well capture the local context knowledge by stacking multi-scale convolutions in each layer. Moreover, instead of using predefined rigid parts, we propose to learn and localize deformable pedestrian parts using Spatial Transformer Networks (STN) with novel spatial constraints. The learned body parts can release some difficulties, e.g., pose variations and background clutters, in part-based representation. Finally, we integrate the representation learning processes of full body and body parts into a unified framework for person ReID through multi-class person identification tasks. Extensive evaluations on current challenging large-scale person ReID datasets, including the image-based Market1501, CUHK03 and sequence-based MARS datasets, show that the proposed method achieves the state-of-the-art results.

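Motivation and objectives

Learn robust full-body and body-part representations with a deep network.
Capture multi-scale context so that fine-grained cues (e.g., sunglasses, shoes) are preserved.
Localize informative latent pedestrian parts with Spatial Transformer Networks under new constraints.
Fuse global and local features in a unified IDE-based framework optimized with a classification loss.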

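Proposed method

Proposes the Multi-Scale Context-Aware Network (MSCAN), which learns multi-scale context within each layer by using dilated convolutions and concatenating the features from the resulting kernels.
Learns and localizes latent body parts with Spatial Transformer Networks (STN) under three constraints (center, value range, and inside-image focus) that keep the parts from collapsing and from drifting onto background clutter.
Extracts a global full-body feature with MSCAN and embeds it in 128-d, extracts part-based features for three latent parts with 64/128-d embeddings, and fuses them into a 256-d final representation.
Trains the network with a softmax (identity classification) loss and a localization loss (Lloc), combined as L = Lcls + λLloc.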
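
The three STN constraints and the combined loss L = Lcls + λLloc can be illustrated with a minimal sketch. The review does not give the exact penalty forms, so the hinge-style functions below, the parameter layout (sx, sy, tx, ty) for an axis-aligned affine crop in normalized [-1, 1] coordinates, and the thresholds alpha, beta, gamma and weight lam are all assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of hinge-style localization penalties for STN parts.
# All function names, penalty forms, and thresholds are illustrative assumptions.

def center_penalty(tx, ty, cx, cy, alpha):
    """Penalize a part whose center (tx, ty) drifts from a prior center (cx, cy)."""
    return 0.5 * max(0.0, (tx - cx) ** 2 + (ty - cy) ** 2 - alpha)


def value_range_penalty(sx, sy, beta):
    """Penalize scales below beta, which would shrink or flip the part."""
    return max(0.0, beta - sx) + max(0.0, beta - sy)


def inside_image_penalty(sx, sy, tx, ty, gamma):
    """Penalize crop edges (t - s, t + s) that leave the normalized image."""
    total = 0.0
    for s, t in ((sx, tx), (sy, ty)):
        for edge in (t - s, t + s):
            total += 0.5 * max(0.0, edge ** 2 - gamma)
    return total


def localization_loss(parts, priors, alpha, beta, gamma):
    """Sum the three penalties over all parts.

    parts:  list of (sx, sy, tx, ty) predicted per latent part
    priors: list of (cx, cy) prior centers, one per part
    """
    loss = 0.0
    for (sx, sy, tx, ty), (cx, cy) in zip(parts, priors):
        loss += center_penalty(tx, ty, cx, cy, alpha)
        loss += value_range_penalty(sx, sy, beta)
        loss += inside_image_penalty(sx, sy, tx, ty, gamma)
    return loss


def total_loss(l_cls, l_loc, lam):
    """Combine the classification and localization terms: L = Lcls + lam * Lloc."""
    return l_cls + lam * l_loc
```

With these assumed forms, a part that stays near its prior center, keeps a positive scale, and lies inside the image contributes zero to Lloc, so the localization term only acts on parts that start to collapse or wander off the pedestrian.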

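Experimental results

Research questions

RQ1: Do multi-scale context and learnable latent parts improve ReID discriminability beyond rigid-part and global methods?
RQ2: Do the learned latent parts provide information complementary to the full-body representation?
RQ3: How does the proposed method perform on large-scale datasets (Market1501, CUHK03, MARS) and in cross-dataset settings?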

Key findings

Model	Rank-1 (single query)	mAP (single query)	Rank-1 (multiple query)	mAP (multiple query)
Our-Fusion	80.31	57.53	86.79	66.70
Our-Body	75.45	52.41	83.43	62.03
Our-Part	76.25	53.33	84.12	62.90
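
For readers unfamiliar with the two metrics in the table, the sketch below shows how Rank-1 and mAP are typically computed from ranked gallery lists. It is a simplified illustration: the function names and input layout are assumptions, and the real Market1501 protocol additionally filters same-camera and junk gallery images, which is omitted here.

```python
# Simplified Rank-1 / mAP computation over ranked gallery identity lists.
# Names and data layout are illustrative; dataset-specific filtering is omitted.

def average_precision(ranked_ids, query_id):
    """Average of precision values at each rank where a correct match appears."""
    hits = 0
    precisions = []
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def rank1_and_map(rankings):
    """rankings: list of (query_id, gallery ids sorted by ascending distance)."""
    n = len(rankings)
    rank1 = sum(1 for q, ranked in rankings if ranked and ranked[0] == q) / n
    mean_ap = sum(average_precision(ranked, q) for q, ranked in rankings) / n
    return rank1, mean_ap


if __name__ == "__main__":
    toy = [
        ("a", ["a", "b", "a"]),  # correct match at ranks 1 and 3
        ("b", ["a", "b", "c"]),  # correct match at rank 2 only
    ]
    r1, m = rank1_and_map(toy)
    print(f"Rank-1 = {r1:.2%}, mAP = {m:.2%}")
```

Rank-1 only checks the top result, while mAP rewards placing every correct gallery image early, which is why the two columns can move by different amounts between models.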

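Fusing the full body with the learned latent body parts gives the best Rank-1 and mAP on Market1501 (Our-Fusion: 80.31 Rank-1 and 57.53 mAP with a single query; 86.79 Rank-1 and 66.70 mAP with multiple queries).
Latent parts learned by the STN outperform rigid parts (latent parts on Market1501: Rank-1 76.25 and mAP 53.33 with a single query; 84.12 and 62.90 with multiple queries).
Applying the localization constraints (Lloc) substantially improves part-based performance (Lcls vs. Lcls + Lloc: 67.22 → 76.25 Rank-1, Market1501 single query).
MSCAN with three dilation ratios (k=3) gives the best single-model performance; going beyond k=3 brings only small further gains.
The approach achieves state-of-the-art results on Market1501, CUHK03, and MARS compared with several baselines and earlier deep learning methods.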
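
The multi-scale idea behind the dilation-ratio finding above can be sketched in one dimension: the same small kernel is applied with several dilation ratios, and the outputs are concatenated per position. This is a toy illustration only; the helper names, the 1-D setting, and the ratios (1, 2, 3) are assumptions, not the network's actual layers.

```python
# Toy 1-D illustration of multi-scale context via dilated convolutions.
# Helper names and the choice of dilation ratios (1, 2, 3) are assumptions.

def effective_kernel_size(kernel_size, dilation):
    """Span covered by a kernel once gaps of (dilation - 1) are inserted."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D cross-correlation with an odd-sized kernel."""
    center = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def multi_scale_features(signal, kernel, dilations=(1, 2, 3)):
    """Concatenate one output channel per dilation ratio at every position."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [list(channels) for channels in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for d in (1, 2, 3):
        print(d, effective_kernel_size(3, d), dilated_conv1d(impulse, [1, 1, 1], d))
```

A 3-tap kernel covers 3, 5, and 7 positions at dilation ratios 1, 2, and 3, so stacking the branches lets one layer see both fine detail and wider context without adding weights per branch.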


This review was generated by AI and checked by a human editor.