QUICK REVIEW

[Paper Review] Panoptic Feature Pyramid Networks

Alexander Kirillov, Ross Girshick|arXiv (Cornell University)|Jan 8, 2019

Advanced Neural Network Applications|References: 56|Cited by: 61

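ONE-SENTENCE SUMMARY

Panoptic FPN adds a lightweight semantic segmentation branch to Mask R-CNN (with an FPN backbone), so that a single network performs instance segmentation, semantic segmentation, and their combination, panoptic segmentation, with competitive accuracy and less computation than running two separate networks.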

ABSTRACT

The recently introduced panoptic segmentation task has renewed our community's interest in unifying the tasks of instance segmentation (for thing classes) and semantic segmentation (for stuff classes). However, current state-of-the-art methods for this joint task use separate and dissimilar networks for instance and semantic segmentation, without performing any shared computation. In this work, we aim to unify these methods at the architectural level, designing a single network for both tasks. Our approach is to endow Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone. Surprisingly, this simple baseline not only remains effective for instance segmentation, but also yields a lightweight, top-performing method for semantic segmentation. In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks. Given its effectiveness and conceptual simplicity, we hope our method can serve as a strong baseline and aid future research in panoptic segmentation.

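MOTIVATION AND OBJECTIVES

Unify instance segmentation and semantic segmentation within a single network architecture.
Evaluate a minimal extension of Mask R-CNN with FPN that supports dense per-pixel labeling alongside region-based outputs.
Measure instance, semantic, and panoptic segmentation performance on the COCO and Cityscapes datasets.
Study training dynamics and loss balancing for multi-task learning in the panoptic setting.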

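PROPOSED METHOD

Start from Mask R-CNN with an FPN backbone.
Attach a lightweight semantic segmentation branch that aggregates multi-scale FPN features into a dense per-pixel output.
Train with the joint loss L = lambda_i * (classification + box + mask) + lambda_s * semantic loss, tuning lambda_i and lambda_s.
In the semantic branch, upsample each FPN level to 1/4 scale and sum the features from all levels to produce per-pixel class scores.
At inference, apply post-processing that resolves overlaps between instance and semantic predictions so that the output satisfies the panoptic segmentation format.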
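The inference-time post-processing can be illustrated with a small, simplified sketch. It assumes three rules in the spirit of the overlap resolution described above: higher-scoring instances claim contested pixels first, instances take precedence over stuff, and very small stuff regions are dropped. The function name `merge_panoptic`, the pixel-set data layout, and the `min_stuff_area` threshold are hypothetical and chosen only for illustration; they are not taken from the paper's code.

```python
def merge_panoptic(instances, semantic, min_stuff_area=2):
    """Merge instance and semantic predictions into one non-overlapping map.

    instances: list of (score, category, set_of_pixels) for thing classes.
    semantic:  dict mapping pixel -> stuff category (or None for "no stuff").
    Returns (panoptic, segments): pixel -> segment id, and segment id -> info.
    """
    panoptic = {}
    segments = {}
    next_id = 1

    # Rule 1 (assumed): process instances by descending score, so a
    # higher-scoring instance keeps any pixel it shares with a lower one.
    for score, category, pixels in sorted(instances, key=lambda inst: -inst[0]):
        free = {p for p in pixels if p not in panoptic}
        if not free:
            continue  # fully covered by more confident instances
        for p in free:
            panoptic[p] = next_id
        segments[next_id] = {"category": category, "isthing": True, "area": len(free)}
        next_id += 1

    # Rule 2 (assumed): stuff only fills pixels not claimed by any instance.
    stuff = {}
    for pixel, category in semantic.items():
        if category is None or pixel in panoptic:
            continue
        stuff.setdefault(category, set()).add(pixel)

    # Rule 3 (assumed): drop stuff regions smaller than the area threshold.
    for category in sorted(stuff):
        pixels = stuff[category]
        if len(pixels) < min_stuff_area:
            continue
        for p in pixels:
            panoptic[p] = next_id
        segments[next_id] = {"category": category, "isthing": False, "area": len(pixels)}
        next_id += 1

    return panoptic, segments
```

Each pixel ends up with at most one segment id, which is the property the panoptic format requires. Pixels that no rule assigns are simply left out of the returned map.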

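EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a single, minimally extended Mask R-CNN with FPN achieve strong performance on both instance and semantic segmentation?
RQ2: Does joint training with the added semantic branch improve, or at least not hurt, instance segmentation accuracy, and vice versa?
RQ3: Under a comparable compute budget, how does Panoptic FPN perform on panoptic segmentation relative to two separate networks?
RQ4: How do architectural choices and loss weights affect the stability and performance of multi-task training?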
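For RQ4, the quantity being tuned is the pair of weights in the joint loss L = lambda_i * (classification + box + mask) + lambda_s * semantic loss. A minimal sketch of that weighting follows; the function name, the default weights, and the loss values in the usage note are placeholders rather than settings reported in the paper.

```python
def joint_loss(cls_loss, box_loss, mask_loss, semantic_loss,
               lambda_i=1.0, lambda_s=0.5):
    """Weighted sum of the instance-branch losses and the semantic loss.

    lambda_i scales the three instance terms together; lambda_s scales the
    semantic term. The default values are illustrative assumptions only.
    """
    instance_loss = cls_loss + box_loss + mask_loss
    return lambda_i * instance_loss + lambda_s * semantic_loss


def sweep_semantic_weight(losses, weights, lambda_i=1.0):
    """Evaluate the joint loss for several candidate lambda_s values."""
    cls_loss, box_loss, mask_loss, semantic_loss = losses
    return {
        w: joint_loss(cls_loss, box_loss, mask_loss, semantic_loss, lambda_i, w)
        for w in weights
    }
```

For example, `sweep_semantic_weight((0.2, 0.3, 0.5, 2.0), [0.0, 0.5, 1.0])` shows how the semantic term's share of the total grows with lambda_s. In practice the weights would be chosen by validation accuracy on both tasks, not by the loss value itself.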

Key Findings

Setting	AP	PQ Th	mIoU	PQ St	PQ
COCO Panoptic FPN original (R50-FPN × 2)	33.9	46.6	40.2	27.9	39.2
COCO Panoptic FPN combined (R50-FPN × 2)	33.3	45.9	41.0	28.7	39.0
Cityscapes Panoptic FPN original (R50-FPN × 2)	32.2	51.3	74.5	62.2	57.7
Cityscapes Panoptic FPN combined (R50-FPN × 2)	32.0	51.6	75.0	62.2	57.7
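As background for the PQ columns: panoptic quality, as commonly defined for this task, matches predicted and ground-truth segments of the same class when their IoU exceeds 0.5, then divides the summed IoU of the matches by |TP| + 0.5 |FP| + 0.5 |FN|. The sketch below computes this for one class on one image, assuming segments are given as non-overlapping pixel sets. The function name and data layout are illustrative assumptions, and the per-class averaging used for reported scores is omitted.

```python
def panoptic_quality(pred_segments, gt_segments):
    """PQ for a single class.

    pred_segments, gt_segments: dict mapping segment id -> set of pixels.
    Segments within each dict are assumed not to overlap, which makes a
    match with IoU > 0.5 unique.
    """
    matched_pred = set()
    true_positives = 0
    iou_sum = 0.0

    for gt_pixels in gt_segments.values():
        for pred_id, pred_pixels in pred_segments.items():
            if pred_id in matched_pred:
                continue
            union = len(gt_pixels | pred_pixels)
            iou = len(gt_pixels & pred_pixels) / union if union else 0.0
            if iou > 0.5:
                matched_pred.add(pred_id)
                true_positives += 1
                iou_sum += iou
                break

    false_positives = len(pred_segments) - true_positives
    false_negatives = len(gt_segments) - true_positives
    denominator = true_positives + 0.5 * false_positives + 0.5 * false_negatives
    return iou_sum / denominator if denominator else 0.0
```

PQ Th and PQ St in the table are the same metric restricted to thing classes and stuff classes respectively.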

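With joint training, Panoptic FPN achieves competitive or even superior results on both instance and semantic segmentation, at roughly half the computation of two separate networks.
The lightweight dense-prediction branch on FPN yields competitive mIoU on COCO and Cityscapes without a dilated-convolution backbone.
With suitable loss weights, joint training can improve one task while maintaining or improving the other, enabling effective multi-task learning across stuff and thing classes.
Under a similar budget, Panoptic FPN with a single FPN backbone outperforms comparable single-model entries on COCO test-dev and Cityscapes, establishing it as a strong baseline.
For the semantic branch, aggregating multi-scale features by simple summation is effective and more efficient than concatenation.
A single network for panoptic segmentation matches the accuracy of two-network approaches at significantly lower computation, and in some cases outperforms them.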
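The summation-based aggregation in the semantic branch can be sketched on toy data. The example assumes FPN levels at strides 4, 8, 16, and 32, brings each level to 1/4 scale by repeated 2x upsampling, and sums the results element-wise. Nearest-neighbor upsampling on plain 2D lists stands in for the learned upsampling stages of a real network, and all function names are hypothetical.

```python
def upsample2x(grid):
    """Nearest-neighbor 2x upsampling of a 2D list (stand-in for a learned stage)."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def stages_to_quarter_scale(stride, target_stride=4):
    """Number of 2x upsampling steps needed to go from `stride` to 1/4 scale."""
    return (stride // target_stride).bit_length() - 1


def merge_levels(levels):
    """Upsample every level to 1/4 scale and sum them element-wise.

    levels: dict mapping stride (4, 8, 16, 32) -> 2D list of feature values.
    """
    merged = None
    for stride, grid in levels.items():
        for _ in range(stages_to_quarter_scale(stride)):
            grid = upsample2x(grid)
        if merged is None:
            merged = [list(row) for row in grid]
        else:
            merged = [[a + b for a, b in zip(row_a, row_b)]
                      for row_a, row_b in zip(merged, grid)]
    return merged
```

Summation keeps the merged feature map the same size as a single level, whereas concatenation would grow the channel count with the number of levels, which is one reason summing is the cheaper choice.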

This review was generated by AI and reviewed by a human editor.