QUICK REVIEW

[Paper Review] BoxInst: High-Performance Instance Segmentation with Box Annotations

Zhi Tian, Chunhua Shen|arXiv (Cornell University)|Dec 3, 2020

Advanced Neural Network Applications|References: 33|Cited by: 28

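One-Sentence Summary

BoxInst achieves high-performance instance segmentation using only bounding-box annotations by introducing a new mask loss with two components: projection consistency between the predicted mask and the ground-truth box, and pairwise label consistency between pixel pairs with similar colors. Without any mask annotations, it reaches 33.2% mask AP on the COCO test-dev split, significantly outperforming previous weakly supervised methods.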

ABSTRACT

We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training. While this setting has been studied in the literature, here we show significantly stronger performance with a simple design (e.g., dramatically improving previous best reported mask AP of 21.1% in Hsu et al. (2019) to 31.6% on the COCO dataset). Our core idea is to redesign the loss of learning masks in instance segmentation, with no modification to the segmentation network itself. The new loss functions can supervise the mask training without relying on mask annotations. This is made possible with two loss terms, namely, 1) a surrogate term that minimizes the discrepancy between the projections of the ground-truth box and the predicted mask; 2) a pairwise loss that can exploit the prior that proximal pixels with similar colors are very likely to have the same category label. Experiments demonstrate that the redesigned mask loss can yield surprisingly high-quality instance masks with only box annotations. For example, without using any mask annotations, with a ResNet-101 backbone and 3x training schedule, we achieve 33.2% mask AP on COCO test-dev split (vs. 39.1% of the fully supervised counterpart). Our excellent experiment results on COCO and Pascal VOC indicate that our method dramatically narrows the performance gap between weakly and fully supervised instance segmentation. Code is available at: https://git.io/AdelaiDet

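Motivation and Goals

Narrow the performance gap between fully and weakly supervised instance segmentation by training with bounding-box annotations only.
Eliminate the need for expensive pixel-level mask annotations in instance segmentation.
Develop a simple, efficient, single-stage model that outperforms previous weakly supervised methods on large-scale benchmarks such as COCO.
Generalize to unseen categories in a semi-supervised setting that combines mask annotations for some categories with box annotations for the rest.
Demonstrate the generality of the method on other tasks, such as character segmentation, using only box-level supervision.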

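Proposed Method

Replace the standard pixel-wise mask loss in CondInst with a new loss made of two terms: projection consistency and pairwise label consistency.
Use a projection loss that minimizes the discrepancy between the projections of the predicted mask and the ground-truth box onto the horizontal and vertical axes.
Apply a pairwise loss that encourages neighboring pixels (dilation rate 2) with similar colors to receive the same label, using only high-confidence pixel pairs to reduce noise.
Define a color-similarity threshold that selects the reliable pixel pairs used for supervision, so that only pairs likely to share a label contribute to the loss.
Train the model end-to-end with box annotations only, without iterative refinement or external tools such as GrabCut.
Exploit the fully convolutional nature of the framework for fast, GPU-parallel inference, in contrast to slower, non-differentiable methods such as GrabCut.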
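
The two loss terms described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (their code is linked in the abstract). The Dice form with squared terms in the denominator, the exponential color-similarity function, the defaults `theta=2.0` and `tau=0.3`, the helper name `shifted_pairs`, and the choice of color space are all assumptions made for this example. Only the dilation rate of 2 and the idea of thresholding color similarity come from the summary.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-5):
    # Dice-style loss between two 1-D profiles (assumed form, see lead-in).
    inter = np.sum(pred * target)
    denom = np.sum(pred ** 2) + np.sum(target ** 2) + eps
    return 1.0 - 2.0 * inter / denom

def projection_loss(mask_prob, box_mask):
    # Project onto the x-axis (max over rows) and the y-axis (max over columns),
    # then compare the predicted profiles with those of the ground-truth box.
    loss_x = dice_loss(mask_prob.max(axis=0), box_mask.max(axis=0))
    loss_y = dice_loss(mask_prob.max(axis=1), box_mask.max(axis=1))
    return loss_x + loss_y

def shifted_pairs(arr, dy, dx):
    # Hypothetical helper: returns aligned views (a, b) where b is arr shifted
    # by (dy, dx), so a[y, x] and b[y, x] form one pixel pair.
    h, w = arr.shape[:2]
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    return arr[y0:y1, x0:x1], arr[y0 + dy:y1 + dy, x0 + dx:x1 + dx]

def pairwise_loss(mask_prob, image, dilation=2, theta=2.0, tau=0.3, eps=1e-6):
    # Four offsets cover each undirected edge of the dilated 8-neighborhood once.
    offsets = [(0, dilation), (dilation, -dilation),
               (dilation, 0), (dilation, dilation)]
    total, count = 0.0, 0
    for dy, dx in offsets:
        m_a, m_b = shifted_pairs(mask_prob, dy, dx)
        c_a, c_b = shifted_pairs(image, dy, dx)
        # Color similarity in (0, 1]; only pairs at or above tau supervise.
        similarity = np.exp(-np.linalg.norm(c_a - c_b, axis=-1) / theta)
        confident = similarity >= tau
        # Probability that both pixels of a pair receive the same label.
        p_same = m_a * m_b + (1.0 - m_a) * (1.0 - m_b)
        total += np.sum(-np.log(np.clip(p_same, eps, 1.0)) * confident)
        count += int(confident.sum())
    return total / max(count, 1)

# Toy example: an 8x8 image whose object exactly fills its box.
box_mask = np.zeros((8, 8))
box_mask[2:6, 1:7] = 1.0
image = np.repeat(box_mask[..., None], 3, axis=2) * 100.0  # object vs. background color

proj_perfect = projection_loss(box_mask, box_mask)
proj_empty = projection_loss(np.zeros_like(box_mask), box_mask)
pair_consistent = pairwise_loss(box_mask, image)
pair_conflicting = pairwise_loss(box_mask, np.zeros((8, 8, 3)))
print(proj_perfect, proj_empty, pair_consistent, pair_conflicting)
```

In this toy setup, a mask that matches the box gives a near-zero projection loss, while an empty mask is penalized on both axes. The pairwise term is zero when label changes coincide with color changes, and it grows when the mask places a boundary between pixels of the same color.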

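Experimental Results

Research Questions

RQ1: Can an instance segmentation model be trained effectively with bounding-box annotations only, reaching performance close to fully supervised methods?
RQ2: Can a simple single-stage loss design outperform complex, iterative weakly supervised methods on large-scale benchmarks such as COCO?
RQ3: Do the proposed loss components, projection consistency and pairwise label consistency, work together to produce high-quality masks without mask annotations?
RQ4: How well does the model generalize to unseen categories when mask annotations are available for only some categories?
RQ5: Can the method be extended to other segmentation tasks, such as character segmentation, using only box-level supervision?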

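Key Findings

With only bounding-box annotations, a ResNet-101 backbone, and a 3x training schedule, BoxInst achieves 33.2% mask AP on the COCO test-dev split, well above the previous best of 21.1%.
With the projection loss alone, mask AP reaches 31.8%; with both the projection and pairwise losses, it reaches 32.5%, which supports the two-term loss design.
In the semi-supervised setting, with mask annotations on 20 categories and box annotations on the rest, BoxInst achieves 30.9% mask AP on the 60 unseen COCO categories, significantly outperforming the baselines.
With mask annotations on 60 categories and box annotations on the 20 unseen categories, BoxInst achieves 35.7% mask AP on the unseen categories, showing strong generalization.
Qualitative results on the ICDAR 2019 ReCTS dataset show that BoxInst produces high-quality character masks from character-box annotations alone, demonstrating the generality of the method.
BoxInst is significantly faster than GrabCut-based methods and also more accurate (e.g., 36.5% mask AP vs. 19.0% for GrabCut); it is fully differentiable, which supports efficient end-to-end training on modern GPUs.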
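
For perspective, the headline COCO numbers quoted in the abstract work out as follows. This is simple arithmetic on the reported figures, not a new result.

```latex
\begin{align*}
\text{gap to the fully supervised counterpart} &= 39.1 - 33.2 = 5.9 \text{ AP points},\\
\text{fraction of fully supervised mask AP} &= 33.2 / 39.1 \approx 0.85,\\
\text{gain over Hsu et al. (2019), as stated in the abstract} &= 31.6 - 21.1 = 10.5 \text{ AP points}.
\end{align*}
```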

This review was generated by AI and reviewed by human editors.