QUICK REVIEW

[Paper Review] Intermediate Prototype Mining Transformer for Few-Shot Semantic Segmentation

Yuanwei Liu, Nian Liu|arXiv (Cornell University)|Oct 13, 2022

Advanced Neural Network Applications | Cited by 37

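One-sentence summary

The paper introduces an Intermediate Prototype Mining Transformer (IPMT) that learns an intermediate prototype combining deterministic category information from the support with adaptive category knowledge from the query, and iteratively refines the query feature to improve few-shot semantic segmentation.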

ABSTRACT

Few-shot semantic segmentation aims to segment the target objects in query under the condition of a few annotated support images. Most previous works strive to mine more effective category information from the support to match with the corresponding objects in query. However, they all ignored the category information gap between query and support images. If the objects in them show large intra-class diversity, forcibly migrating the category information from the support to the query is ineffective. To solve this problem, we are the first to introduce an intermediate prototype for mining both deterministic category information from the support and adaptive category knowledge from the query. Specifically, we design an Intermediate Prototype Mining Transformer (IPMT) to learn the prototype in an iterative way. In each IPMT layer, we propagate the object information in both support and query features to the prototype and then use it to activate the query feature map. By conducting this process iteratively, both the intermediate prototype and the query feature can be progressively improved. At last, the final query feature is used to yield precise segmentation prediction. Extensive experiments on both PASCAL-5i and COCO-20i datasets clearly verify the effectiveness of our IPMT and show that it outperforms previous state-of-the-art methods by a large margin. Code is available at https://github.com/LIUYUANWEI98/IPMT

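Research motivation and goals

Address the intra-class diversity between support and query images in few-shot semantic segmentation (FSS).
Propose an intermediate prototype that bridges the category information gap between support and query images.
Develop an iterative IPMT framework that progressively refines the intermediate prototype and the query feature for accurate segmentation.
Demonstrate state-of-the-art results on the PASCAL-5i and COCO-20i benchmarks.
Provide insight into how the intermediate prototype reduces the intra-class distance between support and query prototypes.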

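Proposed method

Define an Intermediate Prototype Mining Transformer (IPMT) in which each layer performs two steps: intermediate prototype mining (IPM) and query activation (QA).
IPM learns the intermediate prototype G by applying masked cross-attention to the support and query features, guided by the support mask and the query prediction.
QA activates the query feature map Fq with the learned prototype G through an activation network, applying deformable self-attention for context aggregation where needed.
Masked attention keeps the prototype updates focused on the target regions, using the ground-truth support mask Ms and the predicted query mask Pq.
A dual segmentation loss (DSL, written Ldsl) supervises the masks that G produces on the support and query images.
An iterative scheme with L IPMT layers progressively refines G, Fq, and Pq (Gl, Fql, and Pql at layer l) to obtain a better final segmentation.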
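The masked prototype update described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single attention head, no learned projections or normalization, and a query mask that stays fixed across layers (in the method, Pq is re-predicted at every layer). The QA step and deformable self-attention are left out. The names `masked_attention_read` and `ipm_step` and the toy features are hypothetical, and `num_layers = 5` is chosen only to mirror the five-layer setting mentioned in the findings.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def masked_attention_read(proto, feats, mask):
    """Attend from one prototype to the pixels where mask is 1.

    proto: list of d floats. feats: list of N feature vectors (d floats each).
    mask: list of N values in {0, 1}. Returns the attended d-dim vector.
    """
    d = len(proto)
    kept = [f for f, m in zip(feats, mask) if m]
    if not kept:  # empty mask: nothing to read from
        return [0.0] * d
    scores = [dot(proto, f) / math.sqrt(d) for f in kept]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * f[k] for w, f in zip(weights, kept)) / total for k in range(d)]


def ipm_step(proto, support_feats, support_mask, query_feats, query_mask):
    """One simplified update: add the masked support and query reads to proto."""
    from_support = masked_attention_read(proto, support_feats, support_mask)
    from_query = masked_attention_read(proto, query_feats, query_mask)
    return [g + s + q for g, s, q in zip(proto, from_support, from_query)]


# Toy 2-dimensional features (hypothetical values).
support_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
support_mask = [1, 1, 0]  # stands in for the ground-truth mask Ms
query_feats = [[0.8, 0.2], [0.1, 0.9]]
query_mask = [1, 0]       # stands in for the predicted mask Pq, held fixed here

proto = [0.0, 0.0]
num_layers = 5            # assumption: mirrors the five-layer setting
for _ in range(num_layers):
    proto = ipm_step(proto, support_feats, support_mask, query_feats, query_mask)
```

Because the masks filter pixels before the softmax, background features never influence the prototype, which is the property the masked attention is meant to provide.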

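Experimental results

Research questions

RQ1: Can the intermediate prototype mitigate the information gap between support and query in FSS?
RQ2: Does iteratively refining the intermediate prototype and the query feature improve segmentation performance?
RQ3: How does combining deterministic support information with adaptive query knowledge affect prototype quality and segmentation accuracy?
RQ4: How do the DSL and QA components affect the overall performance of IPMT?
RQ5: How does IPMT perform on the standard FSS benchmarks (PASCAL-5i and COCO-20i) compared with previous state-of-the-art methods?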

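Key findings

With ResNet backbones, IPMT outperforms existing methods on PASCAL-5i and COCO-20i in both the 1-shot and 5-shot settings.
The intermediate prototype G, derived from both support and query context, lies closer to the query prototype, with a smaller intra-class gap than the support prototype has.
Stacking IPMT layers progressively improves prototype quality and segmentation results, with five layers yielding a notable gain.
The dual segmentation loss (DSL) and query activation (QA) both contribute substantially to performance; removing them lowers the results.
Ablation studies show that using both support and query information in IPM, and iterating the layers, each bring gains, which supports the design choices.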
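The prototype comparison in the findings above can be made concrete with a small sketch. It assumes prototypes are obtained by masked average pooling (a common choice in few-shot segmentation) and compared with cosine similarity; the paper's exact measurement protocol is not given in this summary. The helper names and toy values are hypothetical, and `candidate` is a plain average of the two prototypes used only as a stand-in, not the learned G.

```python
import math


def masked_average_pool(feats, mask):
    """Average the feature vectors of the pixels where mask is 1."""
    d = len(feats[0])
    area = sum(mask)
    if area == 0:  # empty mask: return a zero prototype
        return [0.0] * d
    return [sum(m * f[k] for f, m in zip(feats, mask)) / area for k in range(d)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is zero."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Toy 2-dimensional features and foreground masks (hypothetical values).
support_feats = [[1.0, 0.2], [0.8, 0.0], [0.0, 1.0]]
support_mask = [1, 1, 0]
query_feats = [[0.5, 0.6], [0.7, 0.4], [0.0, 0.1]]
query_mask = [1, 1, 0]  # assumption: ground-truth query mask, used for analysis only

support_proto = masked_average_pool(support_feats, support_mask)
query_proto = masked_average_pool(query_feats, query_mask)

# Stand-in for an intermediate prototype: the mean of the two prototypes.
candidate = [(s + q) / 2 for s, q in zip(support_proto, query_proto)]

sim_support = cosine_similarity(support_proto, query_proto)
sim_candidate = cosine_similarity(candidate, query_proto)
```

The stand-in is closer to the query prototype by construction, so this only shows how such a distance could be measured; it says nothing about how close the learned G actually is.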


This review was generated by AI and checked by a human editor.