QUICK REVIEW

[Paper Review] Devil in the Details: Towards Accurate Single and Multiple Human Parsing

Tao Ruan, Ting Liu | arXiv (Cornell University) | Sep 17, 2018

Multimodal Machine Learning Applications | References: 35 | Cited by: 67

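One-Sentence Summary

This paper analyzes key properties of human parsing (feature resolution, global context, and edge details) and proposes CE2P, a simple end-to-end framework that achieves state-of-the-art results on single and multiple human parsing benchmarks, including LIP, CIHP, and MHP v2.0.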

ABSTRACT

Human parsing has received considerable interest due to its wide application potential. Nevertheless, it is still unclear how to develop an accurate human parsing system in an efficient and elegant way. In this paper, we identify several useful properties, including feature resolution, global context information and edge details, and perform rigorous analyses to reveal how to leverage them to benefit the human parsing task. The advantages of these useful properties finally result in a simple yet effective Context Embedding with Edge Perceiving (CE2P) framework for single human parsing. Our CE2P is end-to-end trainable and can be easily adopted for conducting multiple human parsing. Benefiting from the superiority of CE2P, we achieved first place on all three human parsing benchmarks. Without any bells and whistles, we achieved 56.50% (mIoU), 45.31% (mean $AP^r$) and 33.34% ($AP^p_{0.5}$) on LIP, CIHP and MHP v2.0, outperforming the previous state of the art by more than 2.06%, 3.81% and 1.87%, respectively. We hope our CE2P will serve as a solid baseline and help ease future research in single/multiple human parsing. Code has been made available at https://github.com/liutinglt/CE2P.

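Motivation and Objectives

Identify the factors that affect fine-grained human parsing performance (feature resolution, global context, and edge details).
Design a simple end-to-end framework that exploits these properties for single human parsing.
Extend the method to multiple human parsing with an effective branch-fusion strategy and an instance-aware label refinement step.
Provide a strong, reusable baseline with open-source code for future research on single/multiple human parsing.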

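Proposed Method

Analyze how three properties (high-resolution embedding, global context embedding, and edge perceiving) affect mIoU in human parsing.
Introduce CE2P, which consists of (i) a high-resolution embedding module that recovers fine details, (ii) a global context embedding module based on pyramid pooling, and (iii) an edge perceiving module with a multi-scale semantic edge branch.
Train CE2P end-to-end with a ResNet-101 backbone, using cross-entropy losses for parsing and edge supervision together with a joint edge-parsing loss.
Extend CE2P to multiple human parsing as M-CE2P, which combines global parsing with local patches generated by Mask R-CNN and adds a refinement step that improves instance-level labels.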
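The training objective in the third item can be sketched as a sum of three per-pixel cross-entropy terms: one on the parsing prediction, one on the binary edge map, and one on the fused edge-aware parsing prediction. This is a minimal pure-Python sketch, not the CE2P code: it assumes equal weights for the three terms, assumes the joint edge-parsing term is a plain cross-entropy on the fused output, and uses a hypothetical class-balancing rule for the sparse edge map. The names `cross_entropy` and `ce2p_style_loss` are illustrative.

```python
import math


def cross_entropy(probs, labels, class_weights=None):
    """Mean per-pixel cross-entropy.

    probs: list of per-pixel probability vectors (each assumed to sum to 1).
    labels: list of integer class indices, one per pixel.
    class_weights: optional per-class weights; the weighted sum is
    normalized by the total weight of the pixels.
    """
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = 1.0 if class_weights is None else class_weights[y]
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        norm += w
    return total / norm


def ce2p_style_loss(parsing_probs, fused_probs, edge_probs,
                    parsing_labels, edge_labels):
    """Hypothetical three-term objective: parsing + edge + edge-parsing."""
    n = len(edge_labels)
    pos = sum(edge_labels)  # number of edge pixels (label 1)
    neg = n - pos           # number of non-edge pixels (label 0)
    if pos == 0 or neg == 0:
        edge_weights = [1.0, 1.0]  # degenerate map: fall back to plain CE
    else:
        # Assumed balancing: the rarer class gets the larger weight.
        edge_weights = [pos / n, neg / n]
    l_parsing = cross_entropy(parsing_probs, parsing_labels)
    l_edge = cross_entropy(edge_probs, edge_labels, edge_weights)
    l_edge_parsing = cross_entropy(fused_probs, parsing_labels)
    return l_parsing + l_edge + l_edge_parsing
```

Balancing the edge term matters because edge pixels are a small fraction of any image; without a weight, a model could score well on the edge loss by predicting "no edge" everywhere.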

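Experimental Results

Research Questions

RQ1: Can high-resolution features, global context, and edge cues jointly improve fine-grained human parsing?
RQ2: How does CE2P compare with state-of-the-art methods on single human parsing on LIP?
RQ3: Can CE2P be extended to multiple human parsing through effective branch fusion and label refinement?
RQ4: How much do the global branch and the local branches of M-CE2P each contribute on CIHP and MHP v2.0?
RQ5: Does the proposed refinement step significantly improve instance-level parsing accuracy?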

Key Findings

Method	Pixel acc. (%)	Mean acc. (%)	mIoU (%)
DeepLab (VGG-16)	82.66	51.64	41.64
Attention	83.43	54.39	42.92
DeepLab (ResNet-101)	84.09	55.62	44.80
JPPNet	86.39	62.32	51.37
CE2P	87.37	63.20	53.10
CE2P (Flipping)	87.67	65.29	53.10
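The three table columns are usually computed from a pixel-level confusion matrix. The sketch below assumes the standard semantic-segmentation definitions (the summary does not spell them out): pixel accuracy is the fraction of correctly labeled pixels, mean accuracy averages per-class recall, and mIoU averages per-class intersection over union. `parsing_metrics` is a hypothetical helper name.

```python
def parsing_metrics(conf):
    """Return (pixel accuracy, mean accuracy, mIoU) as fractions.

    conf[i][j] is the number of pixels whose ground truth is class i
    and whose prediction is class j.
    """
    k = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(k))
    pixel_acc = correct / total

    class_accs, ious = [], []
    for i in range(k):
        tp = conf[i][i]
        gt = sum(conf[i])                          # pixels labeled i
        pred = sum(conf[r][i] for r in range(k))   # pixels predicted as i
        if gt > 0:                                 # skip classes absent from ground truth
            class_accs.append(tp / gt)
        union = gt + pred - tp
        if union > 0:                              # skip classes absent everywhere
            ious.append(tp / union)

    mean_acc = sum(class_accs) / len(class_accs)
    miou = sum(ious) / len(ious)
    return pixel_acc, mean_acc, miou
```

For example, `parsing_metrics([[50, 10], [5, 35]])` gives a pixel accuracy of 0.85, a mean accuracy of about 0.854, and an mIoU of about 0.735. Because mIoU penalizes both missed and spurious pixels per class, it is the strictest of the three columns, which is why it is the lowest for every method in the table.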

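CE2P outperforms prior methods on LIP single human parsing, reaching 53.10% mIoU, 1.73 points above JPPNet (51.37%); flipping as test-time augmentation is reported to add a further 0.6%.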
On Track 1 (single human parsing), single-model CE2P reaches 56.50% mIoU, and Table 3 lists 60.10% mIoU for the row with flipping.
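On CIHP (Track 2), CE2P outperforms previous methods with a mean AP^r of 45.31 (matching the abstract) and an AP^r_0.5 of 50.94; on MHP v2.0 (Track 5), it reaches a PCP_0.5 of 41.82, a mean AP^p of 42.25, and, per the abstract, an AP^p_0.5 of 33.34.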
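For multiple human parsing, M-CE2P with all three branches (B_g, B_l1, B_l2) plus refinement outperforms the strongest baseline on the reported metrics.
The edge and high-resolution modules bring substantial gains (roughly 1–2% mIoU), with larger gains on small parts such as socks and sunglasses.
Label refinement improves instance-level parsing because it corrects the under-segmentation of Mask R-CNN masks.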
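The summary does not describe how the refinement step works. As a purely hypothetical illustration of how under-segmented detector masks could be repaired, the sketch below hands each human-part pixel that no instance mask covers to the nearest covered pixel, using a multi-source breadth-first search that travels only through foreground pixels of the global parsing map. The function name `assign_uncovered_parts` and this propagation rule are assumptions, not the M-CE2P procedure.

```python
from collections import deque


def assign_uncovered_parts(parsing, instances):
    """Hypothetical instance-label refinement by geodesic propagation.

    parsing[r][c]: part label from the global parsing map (0 = background).
    instances[r][c]: instance id from the detector masks (0 = no mask).
    Returns a new instance map in which every foreground pixel reachable
    from a masked foreground pixel (4-connectivity, foreground only) gets
    the id of the nearest such pixel; ties are broken by queue order.
    The input `instances` is not modified.
    """
    h, w = len(parsing), len(parsing[0])
    out = [row[:] for row in instances]
    # Seed the search with every masked pixel that is also foreground.
    queue = deque((r, c) for r in range(h) for c in range(w)
                  if out[r][c] != 0 and parsing[r][c] != 0)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w
                    and parsing[nr][nc] != 0 and out[nr][nc] == 0):
                out[nr][nc] = out[r][c]  # inherit the neighbour's instance id
                queue.append((nr, nc))
    return out
```

Restricting propagation to foreground pixels keeps one person's label from leaking across background into another person; a real system would also have to handle touching instances, which this toy rule resolves only by breadth-first order.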


This review was generated by AI and reviewed by human editors.