QUICK REVIEW

[Paper Review] Multi-Context Attention for Human Pose Estimation

Xiao Chu, Wei Yang | arXiv (Cornell University) | Feb 24, 2017

Human Pose and Action Recognition | Citations: 103

One-Line Summary

A CNN-based framework improves human pose estimation on MPII and LSP by combining multi-context attention with Hourglass Residual Units (HRUs). Holistic and part-focused attention are modeled with CRFs and implemented inside nested hourglass networks.

ABSTRACT

In this paper, we propose to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation. We adopt stacked hourglass networks to generate attention maps from features at multiple resolutions with various semantics. The Conditional Random Field (CRF) is utilized to model the correlations among neighboring regions in the attention map. We further combine the holistic attention model, which focuses on the global consistency of the full human body, and the body part attention model, which focuses on the detailed description for different body parts. Hence our model has the ability to focus on different granularity from local salient regions to global semantic-consistent spaces. Additionally, we design novel Hourglass Residual Units (HRUs) to increase the receptive field of the network. These units are extensions of residual units with a side branch incorporating filters with larger receptive fields, hence features with various scales are learned and combined within the HRUs. The effectiveness of the proposed multi-context attention mechanism and the hourglass residual units is evaluated on two widely used human pose estimation benchmarks. Our approach outperforms all existing methods on both benchmarks over all the body parts.

Motivation and Objectives

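Pursue human pose estimation that stays robust under occlusion and background clutter by exploiting image-dependent multi-context representations.
Propose a multi-context attention mechanism (multi-resolution, multi-semantics, and hierarchical holistic-to-part) that guides feature learning.
Introduce Hourglass Residual Units (HRUs), which enlarge the receptive field while preserving fine detail, yielding a nested hourglass network.
Demonstrate an end-to-end trainable architecture that outperforms state-of-the-art methods on MPII and LSP.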

Proposed Method

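Within each hourglass stack, generate multi-resolution attention maps from features at different scales.
Enlarge the receptive field by replacing standard residual units with Hourglass Residual Units, which add an hourglass branch (HRU: x_{n+1} = x_n + F(x_n; W^F_n) + P(x_n; W^P_n), where F is the standard residual branch and P is the hourglass branch).
Model attention with differentiable CRFs, using a mean-field approximation, to capture spatial correlations between neighboring locations.
Implement multi-semantics attention across the hourglass stacks, so that earlier stacks capture local body configurations and later stacks capture global ones.
Apply hierarchical holistic-to-part attention in the later stacks to refine local part locations (p-th part attention).
Train end-to-end on MPII and LSP with the standard per-part heatmap regression loss (MSE).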
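The HRU equation in the method list above can be illustrated with a minimal single-channel sketch in pure Python. It assumes simplified stand-ins rather than the paper's exact layers: the residual branch F is two zero-padded 3x3 convolutions with a ReLU between them, and the hourglass branch P is 2x2 max pooling, the same conv-ReLU-conv pattern at half resolution, then nearest-neighbour 2x upsampling back to the input size. All function names and kernels are hypothetical; batch normalization, multiple channels, and learned weights are omitted.

```python
from typing import List

Grid = List[List[float]]  # single-channel feature map, indexed [row][col]


def conv3x3(x: Grid, k: Grid) -> Grid:
    """Zero-padded 3x3 cross-correlation; output has the same size as x."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = acc
    return out


def relu(x: Grid) -> Grid:
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2x2(x: Grid) -> Grid:
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(*maps: Grid) -> Grid:
    """Element-wise sum of equally sized maps."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]


def residual_branch(x: Grid, k1: Grid, k2: Grid) -> Grid:
    """F(x; W^F): conv -> ReLU -> conv at full resolution (simplified)."""
    return conv3x3(relu(conv3x3(x, k1)), k2)


def hourglass_branch(x: Grid, k1: Grid, k2: Grid) -> Grid:
    """P(x; W^P): pool -> conv -> ReLU -> conv -> upsample (simplified)."""
    return upsample2x(conv3x3(relu(conv3x3(max_pool2x2(x), k1)), k2))


def hru(x: Grid, kf1: Grid, kf2: Grid, kp1: Grid, kp2: Grid) -> Grid:
    """x_{n+1} = x_n + F(x_n; W^F_n) + P(x_n; W^P_n)."""
    return add_maps(x, residual_branch(x, kf1, kf2), hourglass_branch(x, kp1, kp2))


IDENTITY = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
ZERO = [[0.0] * 3 for _ in range(3)]
x = [[1.0, 2.0], [3.0, 4.0]]
print(hru(x, ZERO, ZERO, ZERO, ZERO))  # [[1.0, 2.0], [3.0, 4.0]]
print(hru(x, IDENTITY, IDENTITY, IDENTITY, IDENTITY))  # [[6.0, 8.0], [10.0, 12.0]]
```

With zero kernels both side branches vanish and the identity path passes x through unchanged. With identity kernels, every output location also receives the maximum of its 2x2 pooling window, which is how the pooled branch mixes in context from a wider region than a single full-resolution 3x3 convolution sees.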
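The CRF-based attention step in the method list above can be sketched as a few mean-field-style updates with shared weights: a unary score W^a * f is computed once from the features, and each iteration adds a message obtained by convolving the previous attention map with a pairwise kernel, then re-normalizes. This is a minimal sketch under stated assumptions, not the paper's implementation: the single-channel score map, sigmoid normalization, three iterations, and hand-picked kernels are illustrative choices, and `crf_attention` and `apply_attention` are hypothetical names. The resulting map is multiplied element-wise into every feature channel.

```python
import math
from typing import List

Grid = List[List[float]]  # single-channel map, indexed [row][col]


def conv3x3(x: Grid, k: Grid) -> Grid:
    """Zero-padded 3x3 cross-correlation; output has the same size as x."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = acc
    return out


def sigmoid_map(s: Grid) -> Grid:
    """Assumed normalization: squash each score independently into (0, 1)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in s]


def crf_attention(f: Grid, w_unary: Grid, w_pair: Grid, steps: int = 3) -> Grid:
    """Mean-field-style attention: phi_t = sigmoid(W^a * f + W^k * phi_{t-1})."""
    unary = conv3x3(f, w_unary)          # computed once from the features
    phi = sigmoid_map(unary)             # initial estimate: unary term only
    for _ in range(steps):               # the same w_pair is reused at every step
        message = conv3x3(phi, w_pair)   # evidence from neighbouring locations
        phi = sigmoid_map([[u + m for u, m in zip(ur, mr)]
                           for ur, mr in zip(unary, message)])
    return phi


def apply_attention(features: List[Grid], phi: Grid) -> List[Grid]:
    """Multiply every channel element-wise by the same spatial attention map."""
    return [[[v * a for v, a in zip(f_row, p_row)]
             for f_row, p_row in zip(channel, phi)]
            for channel in features]


IDENTITY = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
RING = [[0.5, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.5]]  # hypothetical pairwise kernel
ZERO = [[0.0] * 3 for _ in range(3)]

f = [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]  # one strong response
no_pair = crf_attention(f, IDENTITY, ZERO)
with_pair = crf_attention(f, IDENTITY, RING)
print(with_pair[0][1] > no_pair[0][1])  # True: neighbours of the peak gain attention
```

Because each location is normalized on its own rather than competing in a softmax over the whole map, several regions (for example, several joints) can receive high attention at once, while the pairwise term smooths attention across adjacent positions.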
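The per-part MSE heatmap loss in the last step of the method list can be sketched as follows, assuming, as is common for hourglass-based pose estimators, that each target heatmap is a 2D Gaussian centred on the annotated joint. `gaussian_heatmap`, `heatmap_mse`, and the default `sigma=1.0` are hypothetical choices for illustration, not settings taken from the paper.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # one heatmap, indexed [row][col]


def gaussian_heatmap(h: int, w: int, center: Tuple[float, float],
                     sigma: float = 1.0) -> Grid:
    """Target heatmap: an unnormalized 2D Gaussian peaking at 1.0 on the joint."""
    cy, cx = center
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
             for x in range(w)]
            for y in range(h)]


def heatmap_mse(pred: List[Grid], target: List[Grid]) -> float:
    """Mean squared error over all parts and pixels (one heatmap per body part)."""
    total, count = 0.0, 0
    for p_map, t_map in zip(pred, target):
        for p_row, t_row in zip(p_map, t_map):
            for p, t in zip(p_row, t_row):
                total += (p - t) ** 2
                count += 1
    return total / count


# Two hypothetical parts on an 8x8 grid; predicting all zeros is penalized.
targets = [gaussian_heatmap(8, 8, (2, 3)), gaussian_heatmap(8, 8, (5, 5))]
zeros = [[[0.0] * 8 for _ in range(8)] for _ in targets]
print(heatmap_mse(targets, targets))  # 0.0
print(heatmap_mse(zeros, targets) > 0.0)  # True
```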

Experimental Results

Research Questions

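RQ1: Does a CRF-based spatial attention model localize body parts better than conventional Softmax-based attention?
RQ2: Does multi-context attention (multi-resolution, multi-semantics, hierarchical holistic-to-part) make pose estimation more robust to occlusion and background clutter?
RQ3: Do Hourglass Residual Units enlarge the receptive field enough to improve part localization without losing fine detail?
RQ4: Does an end-to-end trainable nested hourglass architecture with multi-context attention outperform existing pose estimation methods on MPII and LSP?
RQ5: How do the holistic and part-focused attention components each contribute to per-part localization accuracy?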

Key Findings

Method	Head	Sho.	Elb.	Wri.	Hip	Knee	Ank.	Mean
Ours (MPII, PCKh@0.5)	98.5	96.3	91.9	88.1	90.6	88.0	85.0	91.5
Prior best (MPII, PCKh@0.5)	98.2	96.3	91.2	87.1	90.1	87.4	83.6	90.9
Ours (LSP, PCK@0.2)	98.1	93.7	89.3	86.9	93.4	94.0	92.5	92.6
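The per-part MPII margins can be recomputed directly from the two MPII rows of the table above. This short check uses only the numbers shown there; the Mean column is taken as reported, since it does not equal the unweighted average of the seven part columns.

```python
# PCKh@0.5 values copied from the MPII rows of the table above.
PARTS = ["Head", "Sho.", "Elb.", "Wri.", "Hip", "Knee", "Ank.", "Mean"]
OURS = [98.5, 96.3, 91.9, 88.1, 90.6, 88.0, 85.0, 91.5]
PRIOR_BEST = [98.2, 96.3, 91.2, 87.1, 90.1, 87.4, 83.6, 90.9]

# Round to one decimal to strip floating-point noise from the subtraction.
gains = {part: round(o - p, 1) for part, o, p in zip(PARTS, OURS, PRIOR_BEST)}

for part, gain in gains.items():
    print(f"{part:5s} {gain:+.1f}")
```

Ankles (+1.4) and wrists (+1.0) show the largest margins, shoulders are tied (+0.0), and no part is worse than the prior best.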

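Achieves state-of-the-art PCKh@0.5 on MPII, with a mean of 91.5% over body parts.
On MPII, improves the hardest joints, wrists and ankles, by 1.0% and 1.4% respectively over the closest competing method.
Achieves state-of-the-art PCK@0.2 on LSP, improving the mean by 1.9%.
CRF-based attention converges faster and reaches higher validation accuracy than Softmax attention.
In the component analysis, hierarchical part attention further raises mean PCKh to 89.4%, improves discrimination between left and right limbs, and reduces double counting.
Combined with multi-resolution and multi-semantics attention, HRUs add roughly 1% over the baseline.
Overall, the multi-context attention and HRU framework performs robustly under occlusion and background clutter.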
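The PCKh@0.5 and PCK@0.2 figures above follow the usual PCK definition: a predicted joint counts as correct when its distance to the ground truth is at most α times a per-person reference length (the head segment length for PCKh on MPII, a torso-based size for PCK on LSP). The sketch below assumes the reference lengths are already computed and passed in; the function name `pck` and its arguments are hypothetical, and it leaves out details of the official benchmark scripts, such as how the head size is derived from the annotated head box.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def pck(preds: Sequence[Point], gts: Sequence[Point], ref_lengths: Sequence[float],
        alpha: float, annotated: Optional[Sequence[bool]] = None) -> float:
    """Fraction of annotated joints within alpha * reference length of the ground truth.

    Use alpha=0.5 with head segment lengths for PCKh@0.5 (MPII), or
    alpha=0.2 with torso sizes for PCK@0.2 (LSP).
    """
    hits = total = 0
    for i, (p, g) in enumerate(zip(preds, gts)):
        if annotated is not None and not annotated[i]:
            continue  # unannotated joints are excluded from the denominator
        total += 1
        if math.dist(p, g) <= alpha * ref_lengths[i]:
            hits += 1
    return hits / total if total else 0.0


preds = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
gts = [(0.0, 0.0)] * 3
ref = [10.0] * 3  # hypothetical head segment length of 10 px for every joint
print(pck(preds, gts, ref, alpha=0.5))  # 2 of 3 joints lie within 5 px
print(pck(preds, gts, ref, alpha=0.5, annotated=[True, True, False]))  # 1.0
```

Normalizing by a per-person length makes the threshold scale-invariant: the same pixel error is judged more strictly for a small, distant person than for a large one.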

This review was generated by AI and checked by a human editor.