QUICK REVIEW

[Paper Review] SOLOv2: Dynamic and Fast Instance Segmentation

Xinlong Wang, Rufeng Zhang | arXiv (Cornell University) | Mar 23, 2020

Advanced Neural Network Applications | References: 40 | Cited by: 476

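One-Sentence Summary

SOLOv2 introduces a box-free, fully convolutional instance segmentation framework that predicts instance masks by location. It combines dynamically generated convolution kernels with a unified high-resolution mask feature and uses the fast Matrix NMS for post-processing, achieving state-of-the-art speed/accuracy trade-offs on COCO and LVIS.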

ABSTRACT

In this work, we aim at building a simple, direct, and fast instance segmentation framework with strong performance. We follow the principle of the SOLO method of Wang et al. "SOLO: segmenting objects by locations". Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location. Specifically, the mask branch is decoupled into a mask kernel branch and mask feature branch, which are responsible for learning the convolution kernel and the convolved features respectively. Moreover, we propose Matrix NMS (non maximum suppression) to significantly reduce the inference time overhead due to NMS of masks. Our Matrix NMS performs NMS with parallel matrix operations in one shot, and yields better results. We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art methods in both speed and accuracy. A light-weight version of SOLOv2 executes at 31.3 FPS and yields 37.1% AP. Moreover, our state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation. Code is available at: https://git.io/AdelaiDet
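The abstract says Matrix NMS "performs NMS with parallel matrix operations in one shot." Below is a minimal NumPy sketch of that idea, built on our own assumptions:

- masks arrive already sorted by score;
- all pairwise IoUs come from a single matrix product;
- each mask's score is decayed by its IoU with higher-scored masks, normalized by how strongly those masks were suppressed themselves.

The function name `matrix_nms`, the `sigma=0.5` default and the optional class-aware `labels` argument are illustrative choices. They are not the AdelaiDet API.

```python
import numpy as np


def matrix_nms(scores, masks, labels=None, method="gauss", sigma=0.5):
    """Decay mask scores in one shot, in the spirit of Matrix NMS (sketch).

    scores: (N,) confidences, already sorted in descending order.
    masks:  (N, H, W) binary masks in the same order.
    labels: optional (N,) class ids; if given, only same-class pairs interact.
    Returns the decayed (N,) scores; callers threshold them afterwards.
    """
    n = len(scores)
    if n == 0:
        return np.asarray(scores, dtype=np.float64).copy()
    flat = masks.reshape(n, -1).astype(np.float64)
    inter = flat @ flat.T                                    # (N, N) intersections
    areas = flat.sum(axis=1)
    union = areas[:, None] + areas[None, :] - inter
    # iou[i, j] is kept only for i < j, i.e. where i scored higher than j.
    ious = np.triu(inter / np.maximum(union, 1e-9), k=1)
    if labels is not None:
        labels = np.asarray(labels)
        ious = ious * (labels[:, None] == labels[None, :])
    # cmax[i]: largest IoU of mask i with any higher-scored mask, i.e. how
    # strongly mask i is itself suppressed; broadcast along row i.
    cmax = np.broadcast_to(ious.max(axis=0)[:, None], (n, n))
    if method == "gauss":
        decay = np.exp(-(ious ** 2 - cmax ** 2) / sigma)
    elif method == "linear":
        decay = (1.0 - ious) / np.maximum(1.0 - cmax, 1e-6)
    else:
        raise ValueError(f"unknown method: {method}")
    # Each mask j takes the strongest (smallest) decay over all rows i.
    return np.asarray(scores, dtype=np.float64) * decay.min(axis=0)


# Usage: mask 1 is a near-duplicate of mask 0 (IoU 0.8); mask 2 is elsewhere.
masks = np.zeros((3, 10, 10), dtype=bool)
masks[0, 0:5, 0:5] = True
masks[1, 0:5, 0:4] = True
masks[2, 6:10, 6:10] = True
scores = np.array([0.9, 0.8, 0.7])
new_scores = matrix_nms(scores, masks)
```

In this example, mask 1 is decayed to 0.8 * exp(-1.28), roughly 0.22. Masks 0 and 2 keep their scores.

The function only rescales scores, so a threshold applied afterwards decides which masks survive. There is no sequential loop over masks; the work is a few N x N array operations. That is the property the abstract credits for the reduced NMS overhead.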

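Research Motivation and Goals

Perform instance segmentation with a simpler, box-free approach that avoids bounding-box detectors.
Develop a dynamic, location-conditioned mask generation mechanism that produces high-resolution instance masks.
Remove bottlenecks in mask prediction and post-processing to increase speed without sacrificing accuracy.
Demonstrate strong performance on COCO and LVIS, including extensions to object detection and panoptic segmentation.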

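Proposed Method

Dynamically predict a mask kernel for each location: a D-dimensional kernel learned conditioned on the input features.
Compute a unified, high-resolution mask feature representation shared across FPN levels.
Convolve the dynamically generated kernels with the mask feature to produce an instance mask for each location.
Augment the input with CoordConv, injecting explicit spatial coordinates into the mask kernel branch.
Apply Matrix NMS in parallel to suppress duplicate mask predictions, improving both speed and accuracy.
Optionally derive bounding boxes from the predicted masks, obtaining box-based results without separate box training.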
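Here is a toy NumPy sketch of the kernel/feature decoupling and CoordConv steps listed above. It assumes 1x1 dynamic kernels, so D equals the mask feature depth E. Several parts are stand-ins:

- a single random linear map replaces the learned kernel branch;
- nearest-neighbour sampling replaces resizing features to the S x S grid;
- the positive grid cells are hard-coded.

All shapes and helper names (`add_coord_channels`, `resize_to_grid`, `predict_kernels`, `dynamic_masks`) are hypothetical.

```python
import numpy as np


def add_coord_channels(feat):
    """CoordConv-style input: append normalized x and y maps in [-1, 1].

    feat: (C, H, W) -> (C + 2, H, W); channel C holds x, channel C + 1 holds y.
    """
    _, h, w = feat.shape
    yy, xx = np.meshgrid(np.linspace(-1.0, 1.0, h),
                         np.linspace(-1.0, 1.0, w), indexing="ij")
    return np.concatenate([feat, xx[None], yy[None]], axis=0)


def resize_to_grid(feat, s):
    """Nearest-neighbour resize of a (C, H, W) map to S x S (a simplification)."""
    _, h, w = feat.shape
    rows = np.linspace(0, h - 1, s).round().astype(int)
    cols = np.linspace(0, w - 1, s).round().astype(int)
    return feat[:, rows][:, :, cols]


def predict_kernels(fpn_feat, s, weight):
    """Toy kernel branch: one D-dimensional kernel per grid cell.

    `weight` (D, C + 2) is a hypothetical stand-in for the learned conv stack.
    Returns (S * S, D), one row per grid cell in row-major order.
    """
    grid = add_coord_channels(resize_to_grid(fpn_feat, s))   # (C + 2, S, S)
    kernels = np.einsum("dc,cij->ijd", weight, grid)          # (S, S, D)
    return kernels.reshape(s * s, -1)


def dynamic_masks(mask_feat, kernels):
    """Apply each 1x1 dynamic kernel (D = E) to the shared mask feature.

    mask_feat: (E, H, W); kernels: (N, E) -> soft masks (N, H, W) in (0, 1).
    """
    logits = np.einsum("ne,ehw->nhw", kernels, mask_feat)
    return 1.0 / (1.0 + np.exp(-logits))


# Usage with random placeholders instead of real network outputs.
rng = np.random.default_rng(0)
C, E, S, H, W = 16, 8, 12, 64, 64
fpn_feat = rng.standard_normal((C, 32, 32))       # one FPN level
mask_feat = rng.standard_normal((E, H, W))        # unified mask feature
weight = 0.1 * rng.standard_normal((E, C + 2))    # D = E for 1x1 kernels
all_kernels = predict_kernels(fpn_feat, S, weight)
positive = [5, 40, 77]   # grid cells a category branch might flag (hypothetical)
masks = dynamic_masks(mask_feat, all_kernels[positive])
```

In this sketch, the high-resolution mask feature is computed once and shared by every instance. Each instance contributes only an E-dimensional weight vector, so N masks come from one batched product rather than N separate mask heads.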

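Experimental Results

Research Questions

RQ1: Can a direct, box-free framework that predicts masks conditioned on location achieve competitive instance segmentation performance?
RQ2: Does decoupling mask kernel learning from mask feature learning improve efficiency and accuracy?
RQ3: Can a parallel, matrix-based NMS (Matrix NMS) outperform conventional NMS and Fast NMS for masks?
RQ4: How do explicit coordinate information and a unified mask feature affect mask quality for objects of different scales?
RQ5: How does SOLOv2 perform on COCO and LVIS in accuracy and speed, and can it be extended to object detection and panoptic segmentation?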

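Key Findings

SOLOv2 achieves state-of-the-art speed-accuracy trade-offs on COCO. For example, it reaches 38.8% AP at 18 FPS with ResNet-50-FPN, and a light-weight version reaches 37.1% AP at 31.3 FPS.
With Res-DCN-101-FPN, SOLOv2 reaches 41.7% mask AP on COCO test-dev and 61.6 mAP for bounding-box detection.
Matrix NMS processes 500 masks in under 1 ms and scores 0.4 AP higher than Fast NMS.
The unified mask feature representation outperforms separate per-FPN-level mask features, with the clearest gains on medium and large objects.
SOLOv2 outperforms many box-based and box-free baselines on COCO and LVIS, with notable gains on large objects (e.g., higher AP_L).
Bounding boxes derived as a by-product of the masks achieve competitive detection results in some configurations, surpassing several conventional detectors.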
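The last finding relies on boxes coming for free from the masks. A minimal way to get them is to take the tight enclosing rectangle of each binarized mask. This assumes a 0.5 binarization threshold, inclusive pixel coordinates, and a -1 row for empty masks; these are our choices, not details from the review. Each box could then inherit its mask's confidence, which is also an assumption. The helper name `masks_to_boxes` is hypothetical.

```python
import numpy as np


def masks_to_boxes(masks, thr=0.5):
    """Tight boxes around (N, H, W) soft or binary masks (sketch).

    Pixels with value >= thr count as foreground (thr=0.5 is an assumed default).
    Returns an (N, 4) int array of inclusive [x0, y0, x1, y1] pixel coordinates;
    empty masks get [-1, -1, -1, -1] (a convention chosen for this sketch).
    """
    fg = np.asarray(masks) >= thr
    boxes = np.full((fg.shape[0], 4), -1, dtype=np.int64)
    rows_hit = fg.any(axis=2)   # (N, H): rows containing foreground
    cols_hit = fg.any(axis=1)   # (N, W): columns containing foreground
    for i in range(fg.shape[0]):
        ys = np.flatnonzero(rows_hit[i])
        xs = np.flatnonzero(cols_hit[i])
        if ys.size:
            boxes[i] = (xs[0], ys[0], xs[-1], ys[-1])
    return boxes


# Usage: one rectangular instance and one empty prediction.
masks = np.zeros((2, 8, 10))
masks[0, 2:5, 3:8] = 0.9
boxes = masks_to_boxes(masks)
```

Here the first mask yields the box [3, 2, 7, 4] and the empty one yields [-1, -1, -1, -1]. No box regression head is involved: the box is fully determined by the mask.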

This review was generated by AI and checked by human editors.