QUICK REVIEW

[Paper Review] Holistic, Instance-Level Human Parsing

Qizhu Li, Anurag Arnab | arXiv (Cornell University) | Sep 11, 2017

Advanced Neural Network Applications | References: 31 | Cited by: 23

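One-Sentence Summary

This paper proposes a holistic, end-to-end deep learning framework for instance-level human parsing. It uses a differentiable Conditional Random Field (CRF) conditioned on human detections to jointly segment body parts and individual people at the instance level. The method achieves state-of-the-art results on instance-level part segmentation and human segmentation, along with competitive results on category-level part segmentation, all in a single forward pass.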

ABSTRACT

Object parsing -- the task of decomposing an object into its semantic parts -- has traditionally been formulated as a category-level segmentation problem. Consequently, when there are multiple objects in an image, current methods cannot count the number of objects in the scene, nor can they determine which part belongs to which object. We address this problem by segmenting the parts of objects at an instance-level, such that each pixel in the image is assigned a part label, as well as the identity of the object it belongs to. Moreover, we show how this approach benefits us in obtaining segmentations at coarser granularities as well. Our proposed network is trained end-to-end given detections, and begins with a category-level segmentation module. Thereafter, a differentiable Conditional Random Field, defined over a variable number of instances for every input image, reasons about the identity of each part by associating it with a human detection. In contrast to other approaches, our method can handle the varying number of people in each image and our holistic network produces state-of-the-art results in instance-level part and human segmentation, together with competitive results in category-level part segmentation, all achieved by a single forward-pass through our neural network.

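Motivation and Objectives

Address the limitation that existing human parsing methods operate only at the category level and therefore cannot tell apart body parts belonging to different people in multi-person scenes.
Segment both body parts and whole people at the instance level, with accurate part-to-person association.
Develop an end-to-end trainable neural network that handles a variable number of people per image and is robust to imperfect or partial object detections.
Show that instance-level part segmentation improves whole-person instance segmentation over prior methods.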

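Proposed Method

The framework begins with a category-level part segmentation module implemented as a fully convolutional network (FCN).
It then applies a differentiable, instance-aware Conditional Random Field (CRF) that handles a variable number of human instances per image and takes the bounding boxes from a human detector as input.
Through learnable, differentiable message passing, the CRF refines the part-to-person assignment so that each segmented part is associated with a specific human instance.
The whole network is trained end-to-end with a new loss function that adapts to the varying number of instances in each image.
The model outputs instance-level part segmentation and instance-level human segmentation (obtained as the union of each person's parts), with no post-processing.
Because the CRF reasons globally, the method is robust to false-positive detections and to bounding boxes that cover a person only partially.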
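
To make the data flow concrete, here is a minimal pure-Python sketch of the last two stages: attaching each part pixel to a detection, then forming a person mask as the union of that person's parts. It is not the paper's method. The learned, differentiable CRF is replaced by a hypothetical greedy rule (each non-background pixel goes to the highest-scoring detection box that contains it), and the names `assign_instances` and `person_mask`, the box format, and the toy inputs are all assumptions made for illustration.

```python
# Illustrative stand-in only: a greedy box rule replaces the paper's instance CRF.
BACKGROUND = 0  # assumed id for the background class in the part label map


def assign_instances(part_labels, detections):
    """Assign every non-background pixel to one detection.

    part_labels: 2D list of int part ids (0 = background).
    detections:  list of (x0, y0, x1, y1, score) boxes, inclusive pixel coords.
    Returns a 2D list of instance ids: 0 = unassigned, k = k-th detection (1-based).
    """
    height, width = len(part_labels), len(part_labels[0])
    instance_ids = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if part_labels[y][x] == BACKGROUND:
                continue
            best_score, best_id = 0.0, 0
            for k, (x0, y0, x1, y1, score) in enumerate(detections, start=1):
                inside = x0 <= x <= x1 and y0 <= y <= y1
                if inside and score > best_score:
                    best_score, best_id = score, k
            instance_ids[y][x] = best_id
    return instance_ids


def person_mask(part_labels, instance_ids, instance_id):
    """Binary mask of one person: the union of all parts assigned to that instance."""
    return [
        [int(inst == instance_id and part != BACKGROUND)
         for part, inst in zip(part_row, inst_row)]
        for part_row, inst_row in zip(part_labels, instance_ids)
    ]


# Toy 2x5 part map: part 1 on the top row, part 2 on the bottom row, two people.
part_labels = [
    [1, 1, 0, 1, 1],
    [2, 2, 0, 2, 2],
]
detections = [(0, 0, 1, 1, 0.9), (3, 0, 4, 1, 0.8)]

instance_ids = assign_instances(part_labels, detections)
mask_person_2 = person_mask(part_labels, instance_ids, 2)
```

In this toy case the two boxes do not overlap, so the greedy rule is enough. The cases the summary highlights (false-positive detections, boxes that cover only part of a person) are exactly where a per-pixel box test fails and where the paper relies on global CRF reasoning instead.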

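Experimental Results

Research Questions

RQ1: Can instance-level human parsing be achieved in a holistic, end-to-end differentiable manner that jointly segments body parts and individuals?
RQ2: Compared with standard instance segmentation methods, how does modeling part-to-instance association improve the accuracy of human instance segmentation?
RQ3: To what extent does learning part-level structure improve whole-person segmentation, especially in occluded or crowded scenes?
RQ4: Can a single forward pass deliver both category-level and instance-level segmentation without trade-offs in architectural or inference complexity?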

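Key Findings

The proposed method reaches a mean average precision (mAP) of 61.0% at an IoU threshold of 0.5 on human instance segmentation, well above the previous state of the art.
On instance-level part segmentation, it reaches an AP of 70.2% at an IoU threshold of 0.5, outperforming prior methods such as Arnab et al. [2] (57.4%) and R2-IOS [31] (60.4%).
Category-level part segmentation reaches a mean IoU of 66.3%, which is competitive with the state of the art and an improvement of 0.4% over the initial category-level module.
The model is robust to poor detections and incomplete bounding boxes: parts are still segmented correctly even when a box covers only part of a person.
The global reasoning of the instance CRF improves generalization in crowded scenes with overlapping people.
The gain in human instance segmentation is attributed to explicitly modeling part-level relations and part-to-instance association during training.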
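
The figures above are reported at an IoU threshold of 0.5. As a rough illustration of what that threshold means for instance masks, the sketch below computes mask IoU and counts predictions that match a ground-truth instance. It is a simplified assumption, not the paper's evaluation code: predictions are matched greedily in descending score order, a match requires IoU strictly greater than the threshold, and the function names and toy masks are hypothetical. A full AP computation would additionally build a precision-recall curve from these matches.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks given as 2D lists of 0/1."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return intersection / union if union else 0.0


def count_true_positives(predictions, ground_truths, threshold=0.5):
    """Count predictions that match a distinct ground-truth mask.

    predictions:   list of (score, mask) pairs.
    ground_truths: list of masks; each can be matched at most once.
    """
    matched = set()
    true_positives = 0
    # Higher-scoring predictions get the first chance to claim a ground truth.
    for _, pred_mask in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_mask in enumerate(ground_truths):
            if idx in matched:
                continue
            iou = mask_iou(pred_mask, gt_mask)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou > threshold:
            matched.add(best_idx)
            true_positives += 1
    return true_positives


# Toy 1x4 masks: one ground-truth person, one good and one poor prediction.
ground_truths = [[[1, 1, 0, 0]]]
predictions = [
    (0.9, [[1, 1, 1, 0]]),  # overlaps 2 of 3 union pixels
    (0.4, [[0, 0, 0, 1]]),  # no overlap with the ground truth
]

iou_good = mask_iou(predictions[0][1], ground_truths[0])
num_tp = count_true_positives(predictions, ground_truths)
```

Here the first prediction has IoU 2/3 and counts as a true positive, while the second has IoU 0 and does not.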

This review was generated by AI and reviewed by a human editor.