QUICK REVIEW

[Paper Review] Hybrid Task Cascade for Instance Segmentation

Kai Chen, Jiangmiao Pang|arXiv (Cornell University)|Jan 22, 2019

Advanced Neural Network Applications | References: 43 | Cited by: 175

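ONE-SENTENCE SUMMARY

HTC interleaves detection and segmentation across multiple stages and adds a mask feature flow plus a semantic context branch, improving mask AP on COCO.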

ABSTRACT

Cascade is a classic yet powerful architecture that has boosted performance on various tasks. However, how to introduce cascade to instance segmentation remains an open question. A simple combination of Cascade R-CNN and Mask R-CNN only brings limited gain. In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation. In this work, we propose a new framework, Hybrid Task Cascade (HTC), which differs in two important aspects: (1) instead of performing cascaded refinement on these two tasks separately, it interweaves them for a joint multi-stage processing; (2) it adopts a fully convolutional branch to provide spatial context, which can help distinguishing hard foreground from cluttered background. Overall, this framework can learn more discriminative features progressively while integrating complementary features together in each stage. Without bells and whistles, a single HTC obtains 38.4% and 1.5% improvement over a strong Cascade Mask R-CNN baseline on MSCOCO dataset. Moreover, our overall system achieves 48.6 mask AP on the test-challenge split, ranking 1st in the COCO 2018 Challenge Object Detection Task. Code is available at: https://github.com/open-mmlab/mmdetection.

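MOTIVATION AND GOALS

Improve instance segmentation by exploiting strong information flow between tasks within a cascade.
Propose Hybrid Task Cascade (HTC), which interleaves detection and segmentation at every stage.
Study the benefits of mask information flow and of spatial context from a semantic branch.
Demonstrate end-to-end trainability and state-of-the-art performance on COCO test-dev/test-challenge.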

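PROPOSED METHOD

Proposes a three-stage cascade that progressively refines bbox regression and mask prediction in a joint multi-task pipeline.
Adds direct connections between the mask branches of different stages to enable mask information flow.
Adds a fully convolutional semantic segmentation branch that provides spatial context, and fuses its features with the box and mask branches.
Fuses semantic features with RoI features via RoIAlign to improve bbox and mask prediction.
Trains with a multi-task loss across stages and tasks, with balancing coefficients alpha_t and beta.
Optionally scales up the backbone and training techniques (DCN, SyncBN, multi-scale training, ensembling) for further gains.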
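
As a reading aid, the stage ordering and the loss described above can be sketched in a few lines of Python. This is a symbolic sketch under stated assumptions, not the authors' implementation: `htc_schedule` and `htc_total_loss` are hypothetical names, the strings only trace which inputs each head would consume, and the default weights `alphas=(1.0, 0.5, 0.25)` and `beta=1.0` are assumed values that the summary above does not state. The loss follows the form L = sum_t alpha_t * (L_bbox^t + L_mask^t) + beta * L_seg.

```python
from typing import List, Sequence


def htc_schedule(num_stages: int = 3) -> List[str]:
    """Return a symbolic trace of an interleaved cascade (no real computation).

    x is the backbone feature map, S the semantic-branch feature map,
    r0 the initial proposals, r_t / m_t the boxes / masks of stage t.
    """
    steps = ["S = semantic_branch(x)"]
    for t in range(1, num_stages + 1):
        # Box head t pools backbone and semantic features with the previous boxes.
        steps.append(f"r{t} = B{t}(pool(x, r{t - 1}) + pool(S, r{t - 1}))")
        # Mask head t uses the boxes refined in this same stage (interleaving)
        # and, from stage 2 on, the mask features of the previous stage.
        prev = f", m{t - 1}_feat" if t > 1 else ""
        steps.append(f"m{t} = M{t}(pool(x, r{t}) + pool(S, r{t}){prev})")
    return steps


def htc_total_loss(
    bbox_losses: Sequence[float],
    mask_losses: Sequence[float],
    seg_loss: float,
    alphas: Sequence[float] = (1.0, 0.5, 0.25),  # assumed per-stage weights
    beta: float = 1.0,  # assumed weight of the semantic segmentation loss
) -> float:
    """Combine per-stage box and mask losses with the semantic loss."""
    if not (len(bbox_losses) == len(mask_losses) == len(alphas)):
        raise ValueError("need one bbox loss, mask loss, and alpha per stage")
    staged = sum(
        a * (lb + lm) for a, lb, lm in zip(alphas, bbox_losses, mask_losses)
    )
    return staged + beta * seg_loss


if __name__ == "__main__":
    for step in htc_schedule():
        print(step)
    print(htc_total_loss([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 2.0))
```

The trace makes the two design points visible: each mask head pools with the boxes produced in its own stage rather than the previous one, and mask heads after the first also receive the preceding stage's mask features.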

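EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a staged multi-task architecture improve both box and mask prediction in instance segmentation?
RQ2: Does explicit mask information flow across stages improve mask refinement?
RQ3: Does adding a semantic segmentation branch for spatial context improve foreground/background discrimination?
RQ4: How do these design choices affect COCO mask AP and overall performance on test-dev/test-challenge?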

Main Findings

Method	Backbone	box AP	mask AP	AP50	AP75	AP_S	AP_M	AP_L	Runtime (fps)
Mask R-CNN	ResNet-50-FPN	39.1	35.6	57.6	38.1	18.7	38.3	46.6	5.3
Cascade Mask R-CNN	ResNet-50-FPN	42.7	36.9	58.6	39.7	19.6	39.3	48.8	3.0
HTC (ours)	ResNet-50-FPN	43.6	38.4	60.0	41.5	20.4	40.7	51.2	2.5
HTC (ours)	ResNet-101-FPN	45.3	39.7	61.8	43.1	21.0	42.2	53.5	2.4
HTC (ours)	ResNeXt-101-FPN	47.1	41.2	63.9	44.7	22.8	43.9	54.6	2.1

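Across backbones, HTC achieves higher mask AP than the Mask R-CNN and Cascade Mask R-CNN baselines.
With ResNet-50-FPN, ResNet-101-FPN, and ResNeXt-101-FPN backbones, HTC consistently improves mask AP over the baseline, by up to about 1.5 points.
Interleaved execution yields a modest gain; mask information flow adds a further improvement (about 0.6–1.5 AP).
The semantic segmentation branch provides complementary context and contributes an additional gain (about 0.6 AP).
On COCO test-dev, HTC with a stronger backbone and additional training techniques reaches 49.0 mask AP; on test-challenge it reaches 48.6 mask AP.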
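
The gains quoted above can be checked against the ResNet-50-FPN rows of the table. The snippet below is a small arithmetic sketch: the numbers are copied from the table, while the dictionary layout and the helper name `gain` are illustrative choices. The table lists baselines for ResNet-50-FPN only, so gains for the other two backbones cannot be recomputed from it.

```python
# (box AP, mask AP, fps) for the ResNet-50-FPN rows of the table above.
RESULTS_R50 = {
    "Mask R-CNN": (39.1, 35.6, 5.3),
    "Cascade Mask R-CNN": (42.7, 36.9, 3.0),
    "HTC": (43.6, 38.4, 2.5),
}

BOX_AP, MASK_AP, FPS = 0, 1, 2  # column indices into each tuple


def gain(method: str, baseline: str, column: int = MASK_AP) -> float:
    """Difference between two table rows in one column, rounded to 0.1."""
    return round(RESULTS_R50[method][column] - RESULTS_R50[baseline][column], 1)


if __name__ == "__main__":
    print("HTC vs Cascade Mask R-CNN, mask AP:", gain("HTC", "Cascade Mask R-CNN"))
    print("HTC vs Mask R-CNN, mask AP:", gain("HTC", "Mask R-CNN"))
    print("HTC vs Cascade Mask R-CNN, box AP:", gain("HTC", "Cascade Mask R-CNN", BOX_AP))
    print("HTC vs Cascade Mask R-CNN, fps:", gain("HTC", "Cascade Mask R-CNN", FPS))
```

With these rows, the mask AP gain of HTC over Cascade Mask R-CNN comes out at 1.5 points, which matches the "up to about 1.5" figure, and the runtime column shows the cost: 2.5 fps versus 3.0 fps.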

This review was generated by AI and checked by a human editor.