QUICK REVIEW

[Paper Review] Devil in the Details: Towards Accurate Single and Multiple Human Parsing

Tao Ruan, Ting Liu|arXiv (Cornell University)|Sep 17, 2018
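Multimodal Machine Learning Applications | References: 35 | Citations: 67
One-line summary

This paper analyzes the properties that matter most for human parsing (feature resolution, global context, and edge details) and proposes CE2P, a simple end-to-end framework that achieves state-of-the-art results on single and multiple human parsing benchmarks, including LIP, CIHP, and MHP v2.0.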

ABSTRACT

Human parsing has received considerable interest due to its wide application potentials. Nevertheless, it is still unclear how to develop an accurate human parsing system in an efficient and elegant way. In this paper, we identify several useful properties, including feature resolution, global context information and edge details, and perform rigorous analyses to reveal how to leverage them to benefit the human parsing task. The advantages of these useful properties finally result in a simple yet effective Context Embedding with Edge Perceiving (CE2P) framework for single human parsing. Our CE2P is end-to-end trainable and can be easily adopted for conducting multiple human parsing. Benefiting the superiority of CE2P, we achieved the 1st places on all three human parsing benchmarks. Without any bells and whistles, we achieved 56.50% (mIoU), 45.31% (mean AP^r) and 33.34% (AP^p_0.5) in LIP, CIHP and MHP v2.0, which outperform the state-of-the-arts more than 2.06%, 3.81% and 1.87%, respectively. We hope our CE2P will serve as a solid baseline and help ease future research in single/multiple human parsing. Code has been made available at https://github.com/liutinglt/CE2P.

Motivation and objectives

  • Identify factors affecting fine-grained human parsing performance (feature resolution, global context, edge details).
  • Design a simple, end-to-end framework that leverages these properties for single human parsing.
  • Extend the approach to multiple human parsing with an effective fusion strategy and instance-aware refinement.
  • Provide a strong, reusable baseline with open-source code for future research in single/multiple human parsing.

Proposed method

  • Analyze the impact of three properties (high-resolution embedding, global context embedding, edge perceiving) on mIoU in human parsing.
  • Introduce CE2P, comprising: (i) a high-resolution embedding module to recover detail, (ii) a global context embedding module using pyramid pooling, (iii) an edge perceiving module with multi-scale semantic edge branches.
  • Train CE2P end-to-end with ResNet-101 as backbone and cross-entropy losses for parsing and edge supervision, plus an edge-parsing joint loss.
  • Apply CE2P to multiple human parsing via M-CE2P, combining global parsing with local patches generated by Mask R-CNN and a refinement step to enhance instance-level labels.
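The global context embedding module in the list above relies on pyramid pooling. As a rough illustration only, the sketch below average-pools a single 2D feature map into several grid sizes and upsamples each result back to the input size. The bin sizes (1, 2, 3, 6), the nearest-neighbor upsampling, and the function names are assumptions made for this sketch; the review does not state the paper's settings, and a real module would also use learned convolutions and typically bilinear interpolation.

```python
def adaptive_avg_pool(feature, bins):
    """Average-pool a 2D map (list of rows) into a bins x bins grid."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        r0 = (i * h) // bins                    # floor of the bin start
        r1 = ((i + 1) * h + bins - 1) // bins   # ceil of the bin end
        row = []
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Nearest-neighbor upsampling of a square grid back to h x w (a simplification)."""
    bins = len(pooled)
    return [[pooled[y * bins // h][x * bins // w] for x in range(w)] for y in range(h)]


def pyramid_pool(feature, bin_sizes=(1, 2, 3, 6)):
    """Return the input map plus one context map per pyramid level.

    bin_sizes is an assumed default, not a value taken from the paper.
    """
    h, w = len(feature), len(feature[0])
    maps = [feature]
    for bins in bin_sizes:
        maps.append(upsample_nearest(adaptive_avg_pool(feature, bins), h, w))
    return maps


if __name__ == "__main__":
    feature = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    for level in pyramid_pool(feature, bin_sizes=(1, 2)):
        print(level)
```

Each returned map has the same spatial size as the input, so the maps could be stacked along the channel axis before a classifier, which is the general idea behind feeding global context to every pixel.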

Experimental results

Research questions

  • RQ1: Can high-resolution features, global context, and edge cues together improve fine-grained human parsing?
  • RQ2: How does CE2P perform on single human parsing compared to state-of-the-art methods on LIP?
  • RQ3: Can CE2P be extended to multi-person parsing with effective branch fusion and label refinement?
  • RQ4: What is the contribution of global vs. local branches in M-CE2P to CIHP and MHP v2.0?
  • RQ5: Does the proposed refinement step meaningfully improve instance-level parsing accuracy?

Key findings

  • CE2P outperforms prior state-of-the-art methods on LIP single-human parsing with an mIoU of 53.10%; test-time flipping is credited with a gain of about 0.6 points.
  • On the LIP benchmark (Track 1), CE2P achieves 56.50% mIoU, the figure given in the abstract. A 60.10% mIoU figure with flipping is also attributed to Table 3 but is not confirmed by the abstract.
  • On CIHP (Track 2), CE2P surpasses prior methods with a mean AP^r of 45.31%, as in the abstract; a second AP^r figure of 50.94 is also listed. On MHP v2.0 (Track 5), the abstract gives an AP^p_0.5 of 33.34%, and this review additionally lists a PCP_0.5 of 41.82 and a mean AP^p of 42.25.
  • For multiple parsing, M-CE2P with all three branches (B_g, B_l1, B_l2) plus refinement achieves the best metrics (APr0.5, PCP0.5, mean APp) compared with strongest baselines.
  • Edge and high-resolution modules provide tangible gains (~1–2% in mIoU; larger gains for small parts like socks, sunglasses).
  • Label refinement boosts instance-level parsing performance by addressing under-segmentation from Mask R-CNN masks.
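Most of the single-human results above are reported as mIoU (mean intersection over union). A minimal sketch of that metric over flattened label lists follows. The function name and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions of this sketch, not details taken from the paper's evaluation code.

```python
def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes for two equal-length lists of integer labels."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # assumption: ignore classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    gt = [0, 1, 1, 1, 2, 0]
    # Per-class IoU here is 1/3, 2/3 and 1/2, so the mean is about 0.5.
    print(mean_iou(pred, gt, num_classes=3))
```

Because every class contributes equally to the mean, small parts with few pixels weigh as much as large ones, which is consistent with the review's remark that gains are larger for small parts such as socks and sunglasses.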

This review was generated by AI and checked by a human editor.