QUICK REVIEW

[Paper Review] Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Bowen Cheng, Maxwell D. Collins | arXiv (Cornell University) | Nov 22, 2019

Advanced Neural Network Applications | References: 81 | Cited by: 42

One-Sentence Summary

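Panoptic-DeepLab is a simple, bottom-up, single-shot panoptic segmentation system with a class-agnostic instance-center head and a semantic segmentation head. It achieves state-of-the-art results on Cityscapes and Mapillary Vistas, performs on par with several top-down methods on COCO, and runs in near real time.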

ABSTRACT

In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve performance comparable to two-stage methods while yielding fast inference speed. In particular, Panoptic-DeepLab adopts dual-ASPP and dual-decoder structures specific to semantic and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. As a result, our single Panoptic-DeepLab model simultaneously ranks first on all three Cityscapes benchmarks, setting the new state of the art of 84.2% mIoU, 39.0% AP, and 65.5% PQ on the test set. Additionally, equipped with MobileNetV3, Panoptic-DeepLab runs nearly in real time on a single 1025x2049 image (15.8 frames per second), while achieving competitive performance on Cityscapes (54.1% PQ on the test set). On the Mapillary Vistas test set, our ensemble of six models attains 42.7% PQ, outperforming the 2018 challenge winner by a healthy margin of 1.5% PQ. Finally, our Panoptic-DeepLab also performs on par with several top-down approaches on the challenging COCO dataset. For the first time, we demonstrate that a bottom-up approach can deliver state-of-the-art results on panoptic segmentation.
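
The dual-branch layout described in the abstract can be pictured as data flow. The following is a toy NumPy sketch, not the authors' implementation: `project`, `upsample`, `branch`, and `panoptic_deeplab_heads` are hypothetical names, the "ASPP" and "decoder" are stand-in random 1x1 projections rather than real atrous spatial pyramid pooling or learned decoders, and the channel widths and upsampling factor are arbitrary placeholders. It only illustrates that one shared feature map feeds two independent branches, and that the outputs are per-class semantic logits, a one-channel center heatmap, and a two-channel offset map at the same resolution.

```python
import numpy as np

RNG = np.random.default_rng(0)  # fixed seed so the toy weights are reproducible


def project(x: np.ndarray, out_ch: int) -> np.ndarray:
    """Stand-in for a conv block: a random 1x1 projection over channels."""
    w = RNG.standard_normal((out_ch, x.shape[0])) * 0.1
    return np.einsum("oc,chw->ohw", w, x)


def upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) array."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def branch(features: np.ndarray, width: int, factor: int = 4) -> np.ndarray:
    """One branch with its own (placeholder) ASPP and decoder; nothing is shared."""
    context = project(features, width)                # placeholder for this branch's ASPP
    return upsample(project(context, width), factor)  # placeholder for this branch's decoder


def panoptic_deeplab_heads(features: np.ndarray, num_classes: int):
    """Map shared backbone features (C, H, W) to the three prediction maps."""
    sem = branch(features, width=256)   # semantic branch (width is a placeholder)
    inst = branch(features, width=128)  # class-agnostic instance branch (placeholder width)
    semantic_logits = project(sem, num_classes)  # (num_classes, 4H, 4W)
    center_heatmap = project(inst, 1)            # (1, 4H, 4W)
    center_offsets = project(inst, 2)            # (2, 4H, 4W): (dy, dx) per pixel
    return semantic_logits, center_heatmap, center_offsets


# Usage: a dummy 64-channel backbone feature map of size 8x16, 19 semantic classes.
feats = np.random.default_rng(1).standard_normal((64, 8, 16))
sem, heat, off = panoptic_deeplab_heads(feats, num_classes=19)
print(sem.shape, heat.shape, off.shape)  # (19, 32, 64) (1, 32, 64) (2, 32, 64)
```

Keeping the two branches fully separate (each with its own context module and decoder) is the point being illustrated; only the backbone features are shared.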

Research Motivation and Goals

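Establish a solid bottom-up panoptic segmentation baseline that matches or exceeds two-stage methods on key benchmarks.
Propose a simple yet strong architecture with dual-ASPP and dual-decoder branches for semantic and instance segmentation.
Use class-agnostic instance center regression for fast, parallelizable instance grouping.
Fuse semantic and instance predictions with an efficient majority-vote merging strategy.
Demonstrate strong speed-accuracy trade-offs across multiple datasets.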

Proposed Method

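Uses a shared encoder backbone with atrous (dilated) convolution to produce dense feature maps.
Implements dual ASPP and dual decoder modules: one branch for semantic segmentation and the other for class-agnostic instance segmentation.
Trains with three losses: weighted bootstrapped cross-entropy for semantic segmentation, mean squared error (MSE) for the instance center heatmap, and L1 loss for the center offsets.
Represents each instance by its center and learns the offset from each pixel to its corresponding center; the center heatmap is encoded with 2D Gaussians.
At inference, groups foreground (thing) pixels to the nearest predicted center using their predicted offsets, then merges the semantic and instance outputs with a fast majority-vote operation.
The simple, parallelizable merging step yields end-to-end panoptic predictions at near real-time speed.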
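
The center-and-offset representation in the list above can be made concrete with a small training-target sketch. It is a minimal NumPy illustration under stated assumptions, not the paper's code: `make_instance_targets` is a hypothetical name, each instance center is taken to be the mask's center of mass, overlapping Gaussians are combined with a pixel-wise maximum, and the default `sigma` is a placeholder. Offsets are stored as (dy, dx) vectors pointing from each instance pixel to its center; pixels outside any instance keep a zero offset.

```python
import numpy as np


def make_instance_targets(instance_map: np.ndarray, sigma: float = 8.0):
    """Build a center heatmap and offset targets from an instance-ID map.

    instance_map: (H, W) integer array, 0 = no instance (stuff/background).
    Returns heatmap (H, W) in [0, 1] and offsets (2, H, W) holding (dy, dx).
    """
    h, w = instance_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    heatmap = np.zeros((h, w), dtype=np.float64)
    offsets = np.zeros((2, h, w), dtype=np.float64)
    for inst_id in np.unique(instance_map):
        if inst_id == 0:
            continue
        mask = instance_map == inst_id
        # Assumption: the instance center is its center of mass.
        cy, cx = ys[mask].mean(), xs[mask].mean()
        # 2D Gaussian centered on the instance center.
        gauss = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, gauss)  # assumption: overlaps take the max
        # Offset from each pixel of this instance to the instance center.
        offsets[0][mask] = cy - ys[mask]
        offsets[1][mask] = cx - xs[mask]
    return heatmap, offsets


# Usage: one 3x3 instance in a 5x5 image; its center of mass is (2, 2).
inst = np.zeros((5, 5), dtype=np.int64)
inst[1:4, 1:4] = 1
heat, off = make_instance_targets(inst, sigma=1.0)
print(heat[2, 2], off[:, 1, 1])  # 1.0 [1. 1.]
```

The heatmap peaks at exactly 1.0 on the center pixel, and the top-left instance pixel (1, 1) points one step down and one step right to reach it.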
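
The three training losses listed above can be sketched in NumPy as follows. This is an illustration under assumptions, not the authors' code: all function names are hypothetical; "bootstrapping" is implemented here as averaging only the hardest fraction of weighted per-pixel cross-entropy values, with a placeholder `top_k_ratio`; the offset L1 loss is restricted to instance pixels through a mask; and the weights `lam_sem`, `lam_heat`, and `lam_off` default to 1.0 because this summary does not give the paper's values.

```python
import numpy as np


def weighted_bootstrapped_ce(logits, labels, pixel_weights, top_k_ratio=0.2):
    """Weighted per-pixel cross-entropy averaged over the hardest pixels.

    logits: (C, H, W) scores; labels: (H, W) class ids;
    pixel_weights: (H, W) weights; top_k_ratio: hypothetical default.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    h, w = labels.shape
    rows, cols = np.arange(h)[:, None], np.arange(w)[None, :]
    nll = -log_probs[labels, rows, cols]  # (H, W) negative log-likelihood
    weighted = (nll * pixel_weights).ravel()
    k = max(1, int(top_k_ratio * weighted.size))
    return np.sort(weighted)[-k:].mean()  # keep only the k largest losses


def heatmap_mse(pred, target):
    """Mean squared error between predicted and target center heatmaps."""
    return np.mean((pred - target) ** 2)


def offset_l1(pred, target, instance_mask):
    """L1 loss on (dy, dx) offsets, averaged over instance pixels only."""
    if not instance_mask.any():
        return 0.0
    per_pixel = np.abs(pred - target).sum(axis=0)  # (H, W)
    return per_pixel[instance_mask].mean()


def total_loss(sem_terms, heat_terms, off_terms, lam_sem=1.0, lam_heat=1.0, lam_off=1.0):
    """Weighted sum of the three terms; the lambda defaults are placeholders."""
    return (lam_sem * weighted_bootstrapped_ce(*sem_terms)
            + lam_heat * heatmap_mse(*heat_terms)
            + lam_off * offset_l1(*off_terms))


# Usage: uniform logits over 3 classes give a cross-entropy of log(3) per pixel.
logits = np.zeros((3, 2, 2))
labels = np.zeros((2, 2), dtype=np.int64)
weights = np.ones((2, 2))
mask = np.ones((2, 2), dtype=bool)
loss = total_loss(
    (logits, labels, weights),
    (np.zeros((2, 2)), np.ones((2, 2))),              # heatmap MSE = 1.0
    (np.zeros((2, 2, 2)), np.ones((2, 2, 2)), mask),  # |dy| + |dx| = 2.0 per pixel
)
print(round(float(loss), 4))  # log(3) + 1 + 2 ≈ 4.0986
```

Averaging over only the largest losses focuses the semantic gradient on hard pixels; with uniform predictions every pixel is equally hard, so the result reduces to plain cross-entropy.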
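
The inference-time grouping step can be sketched as keypoint-style center selection followed by nearest-center assignment. This NumPy sketch rests on assumptions not stated in the summary above: centers are selected as local maxima of the heatmap through a max-pooling comparison, the `threshold`, `kernel`, and `top_k` defaults are placeholders, and `extract_centers` and `group_pixels` are hypothetical names. Each thing pixel is shifted by its predicted offset and then given the ID of the closest selected center, which is a single vectorized distance computation and therefore easy to parallelize. It requires NumPy 1.20 or newer for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_centers(heatmap, threshold=0.1, kernel=7, top_k=200):
    """Return (K, 2) integer (y, x) centers: thresholded local maxima of the heatmap."""
    pad = kernel // 2  # kernel is assumed odd so the pooled map keeps the input size
    padded = np.pad(heatmap.astype(np.float64), pad, mode="constant", constant_values=-np.inf)
    pooled = sliding_window_view(padded, (kernel, kernel)).max(axis=(-2, -1))
    keep = (heatmap == pooled) & (heatmap > threshold)
    ys, xs = np.nonzero(keep)
    order = np.argsort(-heatmap[ys, xs])[:top_k]  # highest scores first
    return np.stack([ys[order], xs[order]], axis=1)


def group_pixels(centers, offsets, thing_mask):
    """Assign each thing pixel the 1-based ID of the center closest to pixel + offset."""
    h, w = thing_mask.shape
    instance_ids = np.zeros((h, w), dtype=np.int64)  # 0 = no instance
    if len(centers) == 0:
        return instance_ids
    ys, xs = np.mgrid[0:h, 0:w]
    voted_y = ys + offsets[0]  # where each pixel predicts its center to be
    voted_x = xs + offsets[1]
    # Squared distance from every voted location to every center: (K, H, W).
    d2 = ((voted_y[None] - centers[:, 0, None, None]) ** 2
          + (voted_x[None] - centers[:, 1, None, None]) ** 2)
    nearest = d2.argmin(axis=0) + 1
    instance_ids[thing_mask] = nearest[thing_mask]
    return instance_ids


# Usage: two peaks in a 9x9 heatmap, two thing pixels with zero predicted offset.
heat = np.zeros((9, 9))
heat[2, 2], heat[6, 6] = 0.9, 0.8
centers = extract_centers(heat, kernel=3)
print(centers.tolist())  # [[2, 2], [6, 6]]

thing = np.zeros((9, 9), dtype=bool)
thing[2, 3] = thing[6, 5] = True
ids = group_pixels(centers, np.zeros((2, 9, 9)), thing)
print(ids[2, 3], ids[6, 5], ids[0, 0])  # 1 2 0
```

Pixels that are not thing pixels keep ID 0, leaving them to the semantic branch.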
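
The majority-vote fusion of semantic and class-agnostic instance outputs can be sketched as follows. All details here are illustrative assumptions: `merge_semantic_and_instance` is a hypothetical name; only thing-class votes inside an instance mask are counted; and the panoptic output is encoded as `class_id * label_divisor + instance_id` with a placeholder `label_divisor` of 1000, a common convention rather than something stated in this summary. Stuff pixels keep their semantic label with instance ID 0.

```python
import numpy as np


def merge_semantic_and_instance(semantic, instance_ids, thing_classes, label_divisor=1000):
    """Fuse a semantic map and class-agnostic instance IDs by majority vote.

    semantic: (H, W) non-negative class ids; instance_ids: (H, W), 0 = no instance.
    Returns a panoptic map encoded as class_id * label_divisor + instance_id.
    """
    panoptic = semantic.astype(np.int64) * label_divisor  # stuff: instance id 0
    for inst_id in np.unique(instance_ids):
        if inst_id == 0:
            continue
        mask = instance_ids == inst_id
        votes = semantic[mask]
        votes = votes[np.isin(votes, list(thing_classes))]  # only thing classes vote
        if votes.size == 0:
            continue  # no thing votes: leave the semantic labels unchanged
        cls = np.bincount(votes).argmax()  # majority vote over the instance mask
        panoptic[mask] = cls * label_divisor + inst_id
    return panoptic


# Usage: instance 1 covers four pixels labelled 5 and one labelled 7.
semantic = np.array([[5, 5, 7],
                     [5, 5, 0]])
instances = np.array([[1, 1, 1],
                      [1, 1, 0]])
pan = merge_semantic_and_instance(semantic, instances, thing_classes={5, 7})
print(pan.tolist())  # [[5001, 5001, 5001], [5001, 5001, 0]]
```

The single pixel labelled 7 is overruled, so the whole instance gets one consistent class. Because each instance is handled independently, the loop could be vectorized or run per instance in parallel; it is written as a loop here for readability.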

Experimental Results

Research Questions

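RQ1: Can a bottom-up, single-shot method achieve state-of-the-art results on standard panoptic segmentation benchmarks?
RQ2: What are Panoptic-DeepLab's accuracy-efficiency trade-offs on Cityscapes, Mapillary Vistas, and COCO?
RQ3: How does the dual-branch (semantic and instance) design with simple center-based instance grouping compare with top-down methods in accuracy and speed?
RQ4: How do architectural choices (dual ASPP, dual decoder, channel width) affect segmentation quality and runtime?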

Key Findings

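On Cityscapes, a single Panoptic-DeepLab model achieves state-of-the-art results on the test set: 65.5% PQ, 39.0% AP, and 84.2% mIoU.
On Mapillary Vistas, a single model reaches 40.6% PQ on val; an ensemble of six models reaches 42.2% PQ on val and 42.7% PQ on test, 1.5% PQ higher than the 2018 challenge winner.
On COCO test-dev, Panoptic-DeepLab reaches 41.2% PQ (single-scale), 4.5% PQ higher than the previous best bottom-up method and competitive with top-down methods.
With a MobileNetV3 backbone, it runs in near real time at 15.8 frames per second on 1025x2049 images on a V100 GPU, while remaining competitive on Cityscapes (54.1% PQ on the test set).
Across backbones and input scales, the method shows a strong speed-accuracy trade-off, often outperforming previous bottom-up methods and approaching top-down methods on several benchmarks.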
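
Most of the results above are reported in PQ (panoptic quality). As a reminder of what the metric measures, here is a minimal single-class, single-image sketch of the standard PQ definition: predicted and ground-truth segments match when their IoU exceeds 0.5, and PQ = (sum of matched IoUs) / (TP + 0.5 FP + 0.5 FN), which factors into segmentation quality (SQ) times recognition quality (RQ). `panoptic_quality` is a hypothetical helper; official evaluation also handles void regions and crowd annotations, accumulates counts over the whole dataset, and averages PQ across classes, none of which is shown here.

```python
import numpy as np


def panoptic_quality(pred_ids, gt_ids):
    """Return (PQ, SQ, RQ) for one class in one image.

    pred_ids, gt_ids: (H, W) integer maps of segment IDs; 0 means "not this class".
    """
    pred_segs = [s for s in np.unique(pred_ids) if s != 0]
    gt_segs = [s for s in np.unique(gt_ids) if s != 0]
    matched_pred, matched_gt = set(), set()
    iou_sum = 0.0
    for g in gt_segs:
        g_mask = gt_ids == g
        for p in pred_segs:
            p_mask = pred_ids == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            iou = inter / union
            # IoU > 0.5 guarantees each segment matches at most one other segment.
            if iou > 0.5:
                matched_gt.add(g)
                matched_pred.add(p)
                iou_sum += iou
    tp = len(matched_gt)
    fp = len(pred_segs) - len(matched_pred)
    fn = len(gt_segs) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = iou_sum / tp if tp else 0.0  # mean IoU of matched pairs
    rq = tp / denom                   # F1-style detection score
    return sq * rq, sq, rq


# Usage: one perfect match (IoU 1.0) and one poor prediction (IoU 0.25).
gt = np.array([[1, 1, 2, 2],
               [1, 1, 2, 2]])
pred = np.array([[3, 3, 0, 4],
                 [3, 3, 0, 0]])
pq, sq, rq = panoptic_quality(pred, gt)
print(pq, sq, rq)  # 0.5 1.0 0.5
```

Prediction 4 overlaps ground-truth segment 2 at only one pixel, so it counts as a false positive and segment 2 as a false negative: PQ = 1.0 / (1 + 0.5 + 0.5) = 0.5.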

This review was generated by AI and reviewed by human editors.