QUICK REVIEW

[Paper Review] Multiple-Human Parsing in the Wild

Jianshu Li, Jian Zhao|arXiv (Cornell University)|May 19, 2017

Multimodal Machine Learning Applications | References: 47 | Cited by: 61

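One-Sentence Summary

This paper introduces the Multi-Human Parsing (MHP) dataset and a novel MH-Parser model, which performs both global parsing and instance-aware parsing of multiple people in unconstrained real-world scenes through Graph-GAN-based affinity learning.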

ABSTRACT

Human parsing is attracting increasing research attention. In this work, we aim to push the frontier of human parsing by introducing the problem of multi-human parsing in the wild. Existing works on human parsing mainly tackle single-person scenarios, which deviates from real-world applications where multiple persons are present simultaneously with interaction and occlusion. To address the multi-human parsing problem, we introduce a new multi-human parsing (MHP) dataset and a novel multi-human parsing model named MH-Parser. The MHP dataset contains multiple persons captured in real-world scenes with pixel-level fine-grained semantic annotations in an instance-aware setting. The MH-Parser generates global parsing maps and person instance masks simultaneously in a bottom-up fashion with the help of a new Graph-GAN model. We envision that the MHP dataset will serve as a valuable data resource to develop new multi-human parsing models, and the MH-Parser offers a strong baseline to drive future research for multi-human parsing in the wild.

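Motivation and Objectives

Formulate multi-human parsing as a problem that reflects real-world scenes containing multiple interacting people.
Build a large-scale MHP dataset with pixel-level, instance-aware annotations covering 18 part labels.
Propose MH-Parser, which generates global parsing maps and instance masks without relying on an external detector.
Use a Graph-GAN to learn high-order relations and improve parsing of entangled groups of people.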

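Proposed Method

Use ResNet-101-based representation learning to generate a global, instance-agnostic parsing map G_seg.
Define a pairwise affinity graph over superpixels and predict the affinity map A with an affinity network.
Introduce a Graph-GAN with a GCN-based discriminator to refine the affinity map and capture high-order relations.
Compute a global consistency map M to distinguish instances, and obtain clusters by applying spectral clustering to the predicted A.
Refine the instance masks with a CRF whose unary and pairwise terms are informed by the affinity map.
Train with a combination of segmentation loss, L2 affinity loss, and GAN loss; at test time, obtain pixel-level, instance-aware parsing.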
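
The clustering step above can be illustrated with a minimal sketch. It assumes the predicted affinity map A is available as a symmetric, non-negative NumPy matrix over superpixels and that the number of people is known in advance. The function name spectral_clusters, the normalized-Laplacian variant, and the deterministic k-means initialization are choices made for this illustration, not details taken from the paper.

```python
import numpy as np

def spectral_clusters(affinity, n_clusters, n_iter=20):
    """Group graph nodes (e.g. superpixels) given a symmetric affinity matrix.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    A = np.asarray(affinity, dtype=float)
    A = 0.5 * (A + A.T)                      # enforce symmetry
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)            # eigenvalues in ascending order
    emb = vecs[:, :n_clusters]               # spectral embedding of each node
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)

    # Deterministic farthest-point initialization for k-means.
    centers = [emb[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((emb - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(emb[int(np.argmax(dist))])
    centers = np.array(centers)

    # Plain Lloyd iterations on the embedded points.
    for _ in range(n_iter):
        d = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = emb[labels == k].mean(axis=0)
    return labels
```

Called on a matrix with high affinity inside each person and low affinity across people, the function returns one integer label per superpixel. In a full pipeline these labels would still need to be mapped back to pixels and refined, for example with the CRF step listed above.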

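Experimental Results

Research Questions

RQ1: How can multi-human parsing be formulated for in-the-wild scenes with multiple interacting and occluded people?
RQ2: Can a bottom-up approach using graph-based affinity learning outperform detector-based methods at separating closely entangled human instances?
RQ3: Does a Graph-GAN trained on graph-structured affinities improve the modeling of high-order relations among body parts and clothing across instances?
RQ4: How effective is combining global parsing with instance clustering, followed by CRF refinement, on the MHP dataset?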

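Key Findings

On the MHP dataset, MH-Parser is competitive with Mask R-CNN and Discriminative Loss (DL) on the AP_p and PCP metrics.
On a challenging subset with high instance proximity, MH-Parser outperforms Mask R-CNN and DL by handling entangled people better.
On the Buffy dataset, MH-Parser achieves an average forward score of 71.11% and a backward score of 71.94% (better than previous methods).
Baseline ablations show gains from adding the GAN loss and the refinement step; variants that use ground-truth (GT) components score higher (for example, GT global segmentation reaches 91.75 AP_p_0.5).
The MHP dataset contains 4,980 images, 14,969 person instances, and 18 part labels, illustrating the substantial real-world complexity of multi-human parsing.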
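
This summary does not define PCP. As a rough illustration, the sketch below assumes it denotes the share of one person's body parts that are parsed correctly, with a part counted as correct when its IoU against the ground truth exceeds a threshold. The function names, the 0.5 threshold, and the use of label 0 as background are assumptions for this example; the paper's exact matching protocol may differ.

```python
import numpy as np

def part_iou(pred, gt, part_id):
    """IoU of one part label between predicted and ground-truth label maps."""
    p = pred == part_id
    g = gt == part_id
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else 0.0

def pcp(pred, gt, iou_threshold=0.5, background=0):
    """Fraction of ground-truth parts of one person with IoU above the threshold.

    Assumed reading of a PCP-style score; not the paper's reference code.
    """
    parts = [c for c in np.unique(gt) if c != background]
    if not parts:
        return 0.0
    correct = sum(part_iou(pred, gt, c) > iou_threshold for c in parts)
    return correct / len(parts)
```

Here pred and gt are integer label maps for a single matched person instance. Averaging the score over all matched persons would give a dataset-level figure under the same assumptions.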


This review was generated by AI and checked by a human editor.