QUICK REVIEW

[Paper Review] ReCo: Retrieve and Co-segment for Zero-shot Transfer

Gyungin Shin, Weidi Xie|arXiv (Cornell University)|Jun 14, 2022
Multimodal Machine Learning Applications | Cited by 29
One-sentence summary

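ReCo uses CLIP-based image retrieval to curate a concept-specific image archive, then co-segments within the archive to build an open-vocabulary segmenter that performs zero-shot segmentation without pixel labels, with optional unsupervised adaptation (ReCo+).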

ABSTRACT

Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs necessary to enable deployment. Segmentation methods that forgo supervision can side-step these costs, but exhibit the inconvenient requirement to provide labelled examples from the target distribution to assign concept names to predictions. An alternative line of work in language-image pre-training has recently demonstrated the potential to produce models that can both assign names across large vocabularies of concepts and enable zero-shot transfer for classification, but do not demonstrate commensurate segmentation abilities. In this work, we strive to achieve a synthesis of these two approaches that combines their strengths. We leverage the retrieval abilities of one such language-image pre-trained model, CLIP, to dynamically curate training sets from unlabelled images for arbitrary collections of concept names, and leverage the robust correspondences offered by modern image representations to co-segment entities among the resulting collections. The synthetic segment collections are then employed to construct a segmentation model (without requiring pixel labels) whose knowledge of concepts is inherited from the scalable pre-training process of CLIP. We demonstrate that our approach, termed Retrieve and Co-segment (ReCo) performs favourably to unsupervised segmentation approaches while inheriting the convenience of nameable predictions and zero-shot transfer. We also demonstrate ReCo's ability to generate specialist segmenters for extremely rare objects.

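Motivation and Objectives

  • Address the high annotation cost and limited flexibility of semantic segmentation.
  • Achieve open-vocabulary, zero-shot segmentation without pixel-level labels.
  • Use retrieval and co-segmentation to inherit CLIP's large vocabulary and zero-shot capability.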

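Proposed Method

  • Curate a concept-specific image archive by using CLIP to retrieve the nearest neighbours of a text query.
  • Perform seed-based co-segmentation within the archive, using dense features to obtain a reference embedding for the concept.
  • Refine segmentation on new images with DenseCLIP saliency guidance, combining PNew with the saliency map via a Hadamard product; CRF post-processing is optional.
  • Strengthen co-segmentation with language-guided filtering and context elimination to suppress distractors.
  • Optionally extend to ReCo+ by training a segmentation model (e.g. DeepLabv3+) on the target distribution with pseudo-labels generated by ReCo.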
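
The first three steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: random arrays stand in for CLIP text/image embeddings and for dense per-pixel features, the function names (`retrieve_archive`, `reference_embedding`, `segment_new_image`) are hypothetical, the seed-selection rule is a simplified assumption, and the sigmoid used to squash similarities is also an assumption of this sketch. Language-guided filtering, context elimination and CRF post-processing are omitted.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_archive(text_emb, image_embs, k):
    """Return indices of the k images most similar (cosine) to the text query.

    text_emb: (dim,), image_embs: (n_images, dim). In practice both would
    come from a language-image model such as CLIP; here they are plain arrays.
    """
    sims = l2_normalize(image_embs) @ l2_normalize(text_emb)
    return np.argsort(-sims)[:k]

def reference_embedding(dense_feats):
    """Seed-based co-segmentation, simplified (assumed selection rule).

    dense_feats: (n_images, n_pixels, dim), with n_images >= 2.
    For every pixel, take its best cosine match in each *other* image and
    average those scores; the highest-scoring pixel per image is its seed.
    The reference embedding is the normalized mean of the seed features.
    """
    feats = l2_normalize(dense_feats)
    n, p, d = feats.shape
    flat = feats.reshape(n * p, d)
    sim = (flat @ flat.T).reshape(n, p, n, p)   # sim[i, a, j, b]
    best = sim.max(axis=3)                      # best match of pixel (i, a) in image j
    others = ~np.eye(n, dtype=bool)             # mask out matches within the same image
    support = (best * others[:, None, :]).sum(axis=2) / (n - 1)
    seeds = support.argmax(axis=1)              # one seed pixel per image
    seed_feats = feats[np.arange(n), seeds]     # (n, dim)
    return l2_normalize(seed_feats.mean(axis=0))

def segment_new_image(dense_feats, ref_emb, saliency):
    """Score each pixel of a new image against the concept and gate by saliency.

    dense_feats: (h, w, dim), ref_emb: (dim,), saliency: (h, w) in [0, 1].
    """
    sim = l2_normalize(dense_feats) @ ref_emb   # (h, w) cosine similarity
    p_new = 1.0 / (1.0 + np.exp(-sim))          # assumed squashing to (0, 1)
    return p_new * saliency                     # Hadamard (element-wise) product

# Toy usage with random stand-in features.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(100, 16))
text_emb = rng.normal(size=16)
archive_ids = retrieve_archive(text_emb, image_embs, k=4)
ref = reference_embedding(rng.normal(size=(4, 36, 16)))
mask = segment_new_image(rng.normal(size=(6, 6, 16)), ref, rng.uniform(size=(6, 6)))
print(archive_ids, mask.shape)
```

With several concepts, one such map would be computed per concept and a per-pixel argmax over the stacked maps would give the label map; maps of that kind could then serve as the pseudo-labels mentioned in the last bullet.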

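Experimental Results

Research Questions

  • RQ1: Can open-vocabulary segmentation be achieved without pixel-level supervision by combining retrieval-based example curation with co-segmentation?
  • RQ2: Do DenseCLIP inference and language-guided co-segmentation improve zero-shot segmentation quality over baseline unsupervised methods?
  • RQ3: Can the method segment rare or novel concepts that do not appear in standard benchmarks?
  • RQ4: Does unsupervised adaptation (ReCo+) bring further gains when data from the target distribution is available?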

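Key Findings

  • ReCo outperforms prior unsupervised segmentation methods in zero-shot transfer on standard benchmarks.
  • Incorporating DenseCLIP at inference time significantly improves segmentation quality.
  • Language-guided co-segmentation and context elimination further improve performance.
  • ReCo+ achieves strong results under unsupervised adaptation, particularly on Cityscapes and KITTI-STEP.
  • ReCo demonstrates co-segmentation of rare concepts (e.g. fire extinguishers) and even rare objects (the Antikythera mechanism).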


This review was generated by AI and checked by a human editor.