QUICK REVIEW

[Paper Review] Deep Learning for Semantic Part Segmentation with High-Level Guidance

Stavros Tsogkas, Iasonas Kokkinos|arXiv (Cornell University)|May 10, 2015

Advanced Neural Network Applications|References: 40|Cited by: 42

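One-Sentence Summary

This paper proposes a deep learning framework for semantic part segmentation that combines a fully convolutional network with Dense CRF post-processing and integrates high-level shape priors through a discriminatively trained Restricted Boltzmann Machine (RBM). The method achieves state-of-the-art performance on pedestrian and face parsing benchmarks, and its multi-scale inference scheme enables accurate segmentation in unconstrained settings without ground-truth bounding boxes.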

ABSTRACT

In this work we address the task of segmenting an object into its parts, or semantic part segmentation. We start by adapting a state-of-the-art semantic segmentation system to this task, and show that a combination of a fully-convolutional Deep CNN system coupled with Dense CRF labelling provides excellent results for a broad range of object categories. Still, this approach remains agnostic to high-level constraints between object parts. We introduce such prior information by means of the Restricted Boltzmann Machine, adapted to our task, and train our model in a discriminative fashion, as a hidden CRF, demonstrating that prior information can yield additional improvements. We also investigate the performance of our approach "in the wild", without information concerning the objects' bounding boxes, using an object detector to guide a multi-scale segmentation scheme. We evaluate the performance of our approach on the Penn-Fudan and LFW datasets for the tasks of pedestrian parsing and face labelling respectively. We show superior performance with respect to competitive methods that have been extensively engineered on these benchmarks, as well as realistic qualitative results on part segmentation, even for occluded or deformable objects. We also provide quantitative and extensive qualitative results on three classes from the PASCAL Parts dataset. Finally, we show that our multi-scale segmentation scheme can boost accuracy, recovering segmentations for finer parts.

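Research Motivation and Objectives

Improve semantic part segmentation by integrating high-level structural priors into a deep learning pipeline.
Handle the geometric variability of object parts under changes in pose and deformation by means of a flexible statistical shape model.
Achieve accurate part segmentation in real-world scenes without requiring precise object bounding boxes.
Show that a discriminatively trained shape prior can improve on the raw CNN predictions.
Develop a multi-scale inference strategy, guided by an object detector, for robust segmentation in unconstrained images.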

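Proposed Method

Adopts a state-of-the-art semantic segmentation system (Chen et al., 2014a): a fully convolutional network followed by Dense CRF post-processing.
Introduces a modified Restricted Boltzmann Machine (RBM) to model complex, multi-modal part configurations and shape variability.
Trains the RBM discriminatively as a hidden CRF, maximizing the posterior probability of the ground-truth part masks given the CNN scores.
Uses a multi-scale inference strategy that draws on feature maps from several image scales (original, 1.5×, 2×) to improve resolution and accuracy.
Uses a pretrained object detector (Ren et al., 2015) to generate region proposals, and selects the best scale for each region according to how close the region is to the network's nominal input size (321×321).
Combines the best-scale CNN scores at each image location; where several boxes overlap, the highest-scoring proposal is used.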
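
The scale selection and score fusion steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names pick_scale and fuse_scores are hypothetical, comparing the longer side of a box against 321 is one assumed reading of "close to the nominal input size", box coordinates are assumed to be integers, and falling back to the original-scale scores outside every box is also an assumption.

```python
import numpy as np

NOMINAL = 321             # nominal network input size quoted in the summary
SCALES = (1.0, 1.5, 2.0)  # original, 1.5x, 2x


def pick_scale(box, scales=SCALES, nominal=NOMINAL):
    """Index of the scale whose rescaled box is closest to the nominal size."""
    x0, y0, x1, y1 = box
    side = max(x1 - x0, y1 - y0)  # assumption: compare the longer side
    return int(np.argmin([abs(s * side - nominal) for s in scales]))


def fuse_scores(score_maps, boxes, det_scores):
    """Fuse per-scale part scores into one map.

    score_maps: array (S, C, H, W), one score map per scale, assumed to be
                already resampled to the original image resolution.
    boxes:      list of integer (x0, y0, x1, y1) detections.
    det_scores: detector confidence for each box.
    """
    fused = score_maps[0].copy()  # assumption: original scale outside boxes
    best = np.full(score_maps.shape[2:], -np.inf)  # best confidence per pixel
    for box, conf in zip(boxes, det_scores):
        x0, y0, x1, y1 = box
        k = pick_scale(box)
        region = best[y0:y1, x0:x1]  # view into `best`
        win = conf > region          # pixels where this box beats earlier ones
        fused[:, y0:y1, x0:x1] = np.where(
            win, score_maps[k][:, y0:y1, x0:x1], fused[:, y0:y1, x0:x1]
        )
        region[win] = conf
    return fused
```

Because the per-pixel winner is decided by detector confidence alone, the result does not depend on the order in which the boxes are visited.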

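Experimental Results

Research Questions

RQ1: Can a general-purpose semantic segmentation framework be adapted effectively to fine-grained semantic part segmentation across diverse object categories?
RQ2: How can high-level structural priors, such as part layout constraints, be integrated effectively into a deep learning pipeline to improve part segmentation accuracy?
RQ3: Can a discriminatively trained RBM capture complex, multi-modal shape variation in object parts more effectively than traditional shape models?
RQ4: How does multi-scale feature fusion guided by object detection improve the segmentation of fine-grained parts in unconstrained images?
RQ5: Can a pretrained CNN combined with CRF post-processing outperform specially designed, hand-engineered feature methods on benchmark part segmentation datasets?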

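Key Findings

On the Penn-Fudan dataset for pedestrian parsing, the proposed method outperforms a large number of carefully engineered competing methods.
For face labelling on the LFW dataset, the method produces realistic qualitative results even under occlusion or deformation.
On three classes from the PASCAL Parts dataset, the method shows strong quantitative performance, supporting its ability to generalize across object categories.
The multi-scale segmentation scheme improves accuracy by recovering finer part details, raising pixel accuracy on the PASCAL Parts validation set from 73.9% to 74.7% without retraining.
Discriminative training of the RBM on CNN features yields a measurable improvement over the raw CNN predictions, supporting the value of integrating high-level priors.
Relying only on an object detector for scale and location guidance, the system segments parts in real-world images without ground-truth bounding boxes.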
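
For readers unfamiliar with the pixel accuracy metric quoted above, the following is a minimal sketch of how such a number is typically computed. The function name pixel_accuracy is hypothetical, and the ignore label of 255 follows a common PASCAL annotation convention; whether the paper's evaluation excludes pixels in exactly this way is an assumption.

```python
import numpy as np


def pixel_accuracy(pred, gt, ignore_label=255):
    """Fraction of labelled pixels whose predicted part label matches gt."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    valid = gt != ignore_label  # assumption: 255 marks unlabelled pixels
    if not valid.any():
        raise ValueError("no labelled pixels to evaluate")
    return float((pred[valid] == gt[valid]).mean())
```

Under this definition, a change from 73.9% to 74.7% means that roughly 8 more pixels out of every 1,000 labelled pixels receive the correct part label.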

This review was generated by AI and reviewed by human editors.