QUICK REVIEW

[Paper Review] ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors

Weicheng Kuo, Anelia Angelova|arXiv (Cornell University)|Apr 5, 2019

Advanced Neural Network Applications | References: 42 | Cited by: 21

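One-Sentence Summary

ShapeMask is an instance segmentation framework that generalizes better to novel categories by refining shape priors and learning instance-specific embeddings. Starting from a bounding box, it progressively refines the object's shape using the learned priors and embeddings, outperforming the state of the art by 6.4 AP when learning across categories while remaining competitive in the fully supervised setting, with an inference time of 150 ms.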

ABSTRACT

Instance segmentation aims to detect and segment individual objects in a scene. Most existing methods rely on precise mask annotations of every category. However, it is difficult and costly to segment objects in novel categories because a large number of mask annotations is required. We introduce ShapeMask, which learns the intermediate concept of object shape to address the problem of generalization in instance segmentation to novel categories. ShapeMask starts with a bounding box detection and gradually refines it by first estimating the shape of the detected object through a collection of shape priors. Next, ShapeMask refines the coarse shape into an instance level mask by learning instance embeddings. The shape priors provide a strong cue for object-like prediction, and the instance embeddings model the instance specific appearance information. ShapeMask significantly outperforms the state-of-the-art by 6.4 and 3.8 AP when learning across categories, and obtains competitive performance in the fully supervised setting. It is also robust to inaccurate detections, decreased model capacity, and small training data. Moreover, it runs efficiently with 150ms inference time and trains within 11 hours on TPUs. With a larger backbone model, ShapeMask increases the gap with state-of-the-art to 9.4 and 6.2 AP across categories. Code will be released.

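Motivation and Goals

Generalize instance segmentation to novel object categories without requiring large amounts of category-specific mask annotations.
Improve zero-shot and few-shot generalization by introducing intermediate shape priors, an inductive bias stronger than bounding boxes alone.
Achieve efficient, robust, and accurate instance segmentation through class-agnostic training with minimal supervision.
Design a lightweight, high-performing mask branch that keeps accuracy high with far fewer parameters and FLOPs.
Run efficiently on both TPUs and GPUs while reaching competitive detection and segmentation performance.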

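Proposed Method

ShapeMask takes a class-agnostic bounding box detection as input and uses it to localize the target object.
It then estimates the object's shape by selecting the best-matching priors from a learned collection of shape priors, which provides a strong geometric cue.
A fully convolutional network decodes a coarse mask from the shape prior and refines it with learned instance embeddings to produce the final pixel-level segmentation.
The model uses simple cropping rather than ROIAlign and trains on jittered ground-truth boxes, which speeds up training and avoids NMS or sorting operations.
It builds on a one-stage detector (RetinaNet) for efficient training and is trained end to end with class-agnostic supervision.
The mask branch is lightweight: with only 16 channels it still reaches 35.8 AP while using 1/130 of the parameters of Mask R-CNN.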
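
The shape-estimation and refinement steps above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It assumes the priors are blended with softmax weights (the scores would come from the network and are hard-coded here), that the instance embedding is the mean feature under the coarse mask, and it substitutes a hypothetical distance threshold, `threshold_decoder`, for the learned convolutional decoder. All function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_coarse_shape(priors, scores):
    """Blend K shape priors (HxW grids in [0, 1]) using softmax weights.

    Assumption: `scores` stands in for per-prior logits predicted from
    the features inside the detected box.
    """
    weights = softmax(scores)
    height, width = len(priors[0]), len(priors[0][0])
    return [
        [sum(wk * prior[i][j] for wk, prior in zip(weights, priors))
         for j in range(width)]
        for i in range(height)
    ]


def centre_on_instance(coarse, features):
    """Subtract the coarse-mask-weighted mean feature from every pixel.

    `features[i][j]` is a list of C floats. The weighted mean plays the
    role of an instance embedding in this sketch.
    """
    height, width = len(coarse), len(coarse[0])
    channels = len(features[0][0])
    mass = max(sum(sum(row) for row in coarse), 1e-8)
    mean = [
        sum(coarse[i][j] * features[i][j][k]
            for i in range(height) for j in range(width)) / mass
        for k in range(channels)
    ]
    return [
        [[features[i][j][k] - mean[k] for k in range(channels)]
         for j in range(width)]
        for i in range(height)
    ]


def threshold_decoder(centred, radius=0.5):
    """Hypothetical stand-in for the learned decoder.

    Keeps pixels whose centred feature lies within `radius` of the
    instance mean; the real model uses learned convolutions instead.
    """
    return [
        [1 if math.sqrt(sum(v * v for v in pixel)) < radius else 0
         for pixel in row]
        for row in centred
    ]


# Toy 3x3 example: two priors, and an object that is a vertical bar.
vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
coarse = estimate_coarse_shape([vertical, horizontal], scores=[2.0, 0.0])
features = [[[float(v)] for v in row] for row in vertical]  # 1 channel
fine_mask = threshold_decoder(centre_on_instance(coarse, features))
print(fine_mask)
```

In this toy case the coarse shape is a blurry cross dominated by the vertical prior, and centring the features on the instance mean lets the threshold recover the vertical bar exactly. The point is only to show how a prior-based coarse mask and an instance-specific embedding can complement each other.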

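Experimental Results

Research Questions

RQ1: Does learning shape priors as an intermediate representation improve the generalization of instance segmentation to novel categories?
RQ2: In zero-shot and few-shot settings, how does combining shape priors with instance embeddings compare with conventional detection-based or grouping-based methods?
RQ3: To what extent can a lightweight mask branch preserve accuracy while substantially reducing model size and FLOPs?
RQ4: How robust is ShapeMask to inaccurate detections, limited training data, and reduced model capacity?
RQ5: In the fully supervised instance segmentation setting, can ShapeMask reach competitive performance while training faster than state-of-the-art methods?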

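Key Findings

On cross-category instance segmentation, ShapeMask improves on the previous state of the art by 6.4 AP, and by 9.4 AP with a larger backbone.
With only 1% of the labeled data, it already outperforms the state of the art, showing strong few-shot generalization.
With a 16-channel mask branch, ShapeMask reaches 35.8 AP, 0.4 AP higher than Mask R-CNN, while using 1/130 of the parameters and 23x fewer FLOPs.
Inference takes only 150 ms and training takes 11 hours on TPUs; thanks to architectural optimizations, training is 4x faster than state-of-the-art methods.
In the fully supervised setting, ShapeMask reaches 37.2 AP on COCO, better than Mask R-CNN and RetinaNet with the same ResNet-101-FPN backbone.
ShapeMask also serves as a strong object detector, reaching 42.0 AP with a ResNet-101-FPN backbone and 45.4 AP with the larger NAS-FPN backbone, better than RetinaNet and Mask R-CNN.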


This review was generated by AI and reviewed by a human editor.