QUICK REVIEW

[Paper Review] QueryInst: Parallelly Supervised Mask Query for Instance Segmentation

Y.K. Fang, Shusheng Yang|arXiv (Cornell University)|May 5, 2021

Advanced Neural Network Applications | References: 23 | Cited by: 12

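One-sentence summary

QueryInst is a query-based instance segmentation framework that uses parallel supervision across stages and a one-to-one correspondence between queries and masks to eliminate explicit multi-stage mask head connections and proposal distribution inconsistency. With a ResNet-101-FPN backbone, it reaches 48.1 box AP and 42.8 mask AP on COCO test-dev, 2 points higher than HTC on both metrics, while running 2.4 times faster.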

ABSTRACT

Recently, query based object detection frameworks achieve comparable performance with previous state-of-the-art object detectors. However, how to fully leverage such frameworks to perform instance segmentation remains an open problem. In this paper, we present QueryInst, a query based instance segmentation method driven by parallel supervision on dynamic mask heads. The key insight of QueryInst is to leverage the intrinsic one-to-one correspondence in object queries across different stages, as well as one-to-one correspondence between mask RoI features and object queries in the same stage. This approach eliminates the explicit multi-stage mask head connection and the proposal distribution inconsistency issues inherent in non-query based multi-stage instance segmentation methods. We conduct extensive experiments on three challenging benchmarks, i.e., COCO, CityScapes, and YouTube-VIS to evaluate the effectiveness of QueryInst in instance segmentation and video instance segmentation (VIS) task. Specifically, using ResNet-101-FPN backbone, QueryInst obtains 48.1 box AP and 42.8 mask AP on COCO test-dev, which is 2 points higher than HTC in terms of both box AP and mask AP, while runs 2.4 times faster. For video instance segmentation, QueryInst achieves the best performance among all online VIS approaches and strikes a decent speed-accuracy trade-off. Code is available at https://github.com/hustvl/QueryInst.

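Research motivation and objectives

Address the efficiency and consistency challenges that arise when adapting query-based object detection frameworks to instance segmentation.
Eliminate the need for explicit multi-stage mask head connections and reduce proposal distribution inconsistency in instance segmentation.
Enable end-to-end training and dynamically refine the mask heads by applying parallel supervision to the object queries at every stage.
Achieve strong performance on both still-image and video instance segmentation with a favorable speed-accuracy trade-off.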

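Proposed method

Exploit the intrinsic one-to-one correspondence between object queries and mask RoI features within each stage to enable parallel supervision.
Keep the object query representation consistent throughout the network, which enables parallel supervision across stages.
Supervise the dynamic mask head of each stage independently, avoiding dependence on the previous stage's proposals.
Remove explicit multi-stage mask head connections, reducing complexity and improving training stability.
Adopt a query-based architecture in which each object query predicts both a bounding box and a mask within a unified framework.
Use the same query representation at every stage to maintain consistency and support end-to-end optimization.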
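
The points above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (that lives in the linked repository). It assumes that the queries are already matched to ground-truth instances, that each stage's "dynamic mask head" can be reduced to a dot product between a query and the pixels of its own RoI feature, and that a stage's query update is a simple element-wise function. The names `refine_queries`, `dynamic_mask_head`, `parallel_mask_loss` and the `stage_weights` parameter are hypothetical and exist only for this illustration. What the sketch does show is the structure described above: query i keeps its identity across stages, each stage's mask head sees only its own query and RoI feature, and every stage receives its own mask loss.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def refine_queries(queries, stage_weight):
    # Hypothetical stand-in for a stage's query update. The list order is
    # preserved, so query i refers to the same instance at every stage.
    return [[math.tanh(stage_weight * v) for v in q] for q in queries]


def dynamic_mask_head(query, roi_feature):
    # Simplified dynamic head: the query acts as a per-instance kernel that is
    # applied to each pixel (a channel vector) of its own RoI feature.
    return [sigmoid(dot(query, pixel)) for pixel in roi_feature]


def binary_cross_entropy(pred, target, eps=1e-7):
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def parallel_mask_loss(queries, roi_features_per_stage, gt_masks, stage_weights):
    # Each stage gets its own mask loss. No mask prediction or mask feature is
    # passed from one stage's head to the next; only the queries flow forward.
    per_stage = []
    for stage_weight, roi_features in zip(stage_weights, roi_features_per_stage):
        queries = refine_queries(queries, stage_weight)
        stage_loss = 0.0
        for q, roi, gt in zip(queries, roi_features, gt_masks):
            stage_loss += binary_cross_entropy(dynamic_mask_head(q, roi), gt)
        per_stage.append(stage_loss / len(queries))
    return sum(per_stage), per_stage


# Toy data: 3 queries, 4 channels, 6 pixels per RoI, 2 stages (all made up).
random.seed(0)
num_queries, channels, pixels, num_stages = 3, 4, 6, 2
queries = [[random.uniform(-1, 1) for _ in range(channels)]
           for _ in range(num_queries)]
roi_features_per_stage = [
    [[[random.uniform(-1, 1) for _ in range(channels)] for _ in range(pixels)]
     for _ in range(num_queries)]
    for _ in range(num_stages)
]
gt_masks = [[random.randint(0, 1) for _ in range(pixels)]
            for _ in range(num_queries)]

total, per_stage = parallel_mask_loss(
    queries, roi_features_per_stage, gt_masks, stage_weights=[1.0, 0.5]
)
print("per-stage mask losses:", per_stage)
print("total mask loss:", total)
```

Because the total is a plain sum of per-stage losses, each stage's head is trained against the ground truth directly rather than through the output of an earlier mask head, which is the sense in which the supervision is "parallel" in this sketch.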

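Experimental results

Research questions

RQ1: Can query-based instance segmentation achieve state-of-the-art performance while maintaining high inference speed?
RQ2: To what extent does parallel supervision across stages improve mask head training and reduce proposal inconsistency?
RQ3: To what extent does the one-to-one correspondence between object queries and mask features improve segmentation accuracy?
RQ4: How does QueryInst compare with existing methods on still-image and video instance segmentation benchmarks?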

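Key findings

With a ResNet-101-FPN backbone, QueryInst achieves 48.1 box AP and 42.8 mask AP on COCO test-dev, 2 AP points higher than HTC on both metrics.
The method runs 2.4 times faster than HTC at inference while being more accurate, giving a favorable speed-accuracy trade-off.
On video instance segmentation (VIS), QueryInst is the best-performing online VIS method on the YouTube-VIS benchmark.
Removing explicit multi-stage mask head connections significantly reduces training complexity and improves feature consistency.
Parallel supervision supports stable and efficient training of the dynamic mask heads at every stage.
The one-to-one correspondence between object queries and mask RoI features improves feature alignment and segmentation quality.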

This review was generated by AI and reviewed by human editors.