QUICK REVIEW

[Paper Review] ReCo: Retrieve and Co-segment for Zero-shot Transfer

Gyungin Shin, Weidi Xie | arXiv (Cornell University) | June 14, 2022

Multimodal Machine Learning Applications | Citations: 29

One-Line Summary

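tldr: ReCo uses CLIP-based image retrieval to curate a concept-specific image archive, then co-segments across that archive to build an open-vocabulary segmenter that performs zero-shot segmentation without pixel labels; unsupervised adaptation (ReCo+) is available as an optional extension.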

ABSTRACT

Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs necessary to enable deployment. Segmentation methods that forgo supervision can side-step these costs, but exhibit the inconvenient requirement to provide labelled examples from the target distribution to assign concept names to predictions. An alternative line of work in language-image pre-training has recently demonstrated the potential to produce models that can both assign names across large vocabularies of concepts and enable zero-shot transfer for classification, but do not demonstrate commensurate segmentation abilities. In this work, we strive to achieve a synthesis of these two approaches that combines their strengths. We leverage the retrieval abilities of one such language-image pre-trained model, CLIP, to dynamically curate training sets from unlabelled images for arbitrary collections of concept names, and leverage the robust correspondences offered by modern image representations to co-segment entities among the resulting collections. The synthetic segment collections are then employed to construct a segmentation model (without requiring pixel labels) whose knowledge of concepts is inherited from the scalable pre-training process of CLIP. We demonstrate that our approach, termed Retrieve and Co-segment (ReCo) performs favourably to unsupervised segmentation approaches while inheriting the convenience of nameable predictions and zero-shot transfer. We also demonstrate ReCo's ability to generate specialist segmenters for extremely rare objects.
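The retrieval step described in the abstract amounts to ranking a pool of unlabelled images by similarity to a concept's text embedding and keeping the top results as that concept's archive. Below is a minimal sketch in Python/NumPy, assuming CLIP text and image embeddings have already been computed (random vectors stand in for them here). `l2_normalize` and `retrieve_archive` are hypothetical helper names, not functions from the ReCo codebase or the CLIP library.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Unit-normalize so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def retrieve_archive(text_emb: np.ndarray, image_embs: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: indices of the k pool images closest to a text query.

    text_emb:   (D,) text embedding of a concept prompt (assumed precomputed).
    image_embs: (N, D) embeddings of unlabelled images (assumed precomputed).
    """
    sims = l2_normalize(image_embs) @ l2_normalize(text_emb)  # (N,) cosine similarities
    return np.argsort(sims)[::-1][:k]  # indices in descending order of similarity


# Toy usage: random vectors stand in for CLIP embeddings.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(100, 512))
text_emb = image_embs[42] + 0.01 * rng.normal(size=512)  # query placed close to image 42
archive = retrieve_archive(text_emb, image_embs, k=5)     # archive[0] is 42
```

In a realistic setting the pool would be a large unlabelled image collection and `k` the archive size per concept; the sizes used here are arbitrary.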

Research Motivation and Goals

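Address the high annotation cost and limited flexibility of semantic segmentation.
Enable open-vocabulary, zero-shot segmentation without pixel-level labels.
Exploit CLIP's large vocabulary and zero-shot ability through retrieval and co-segmentation.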

Proposed Method

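Use CLIP to retrieve the nearest neighbours of a text query, curating a concept-specific image archive.
Perform seed-based co-segmentation across the archive using dense features to obtain a reference embedding for the concept.
Refine segments in new images with DenseCLIP saliency guidance, fusing predictions via the Hadamard (element-wise) product of P_new and the saliency map; optionally apply CRF post-processing.
Improve co-segmentation with language-guided filtering and context elimination, which suppress distractors.
Optionally extend the method to ReCo+ by training a segmentation model (e.g. DeepLabv3+) on ReCo-generated pseudo-labels from the target distribution.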
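The co-segmentation and inference steps above can be approximated as follows. This is a simplified sketch under stated assumptions, not the authors' implementation: dense patch features (e.g. from a self-supervised ViT) are assumed precomputed; each archive image's seed is taken to be the patch whose best-match cosine similarity to the other archive images is highest in total; the reference embedding is the normalized mean of the seeds; and a new image's per-concept score map (playing the role of P_new) is optionally multiplied element-wise by a saliency map standing in for DenseCLIP's output. Language-guided filtering, context elimination, CRF, and ReCo+ training are omitted. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Unit-normalize along `axis` so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def select_seeds(archive_feats: np.ndarray) -> np.ndarray:
    """Hypothetical simplified seed rule: one seed feature per archive image.

    archive_feats: (M, P, D) dense features, M images x P patches x D dims.
    Returns (M, D) normalized seed features.
    """
    f = l2_normalize(archive_feats)
    num_images = f.shape[0]
    seeds = []
    for i in range(num_images):
        scores = np.zeros(f.shape[1])
        for j in range(num_images):
            if i == j:
                continue
            # Best cosine match in image j for every patch of image i.
            scores += (f[i] @ f[j].T).max(axis=1)
        # The patch matched most strongly across the rest of the archive is the seed.
        seeds.append(f[i, scores.argmax()])
    return np.stack(seeds)


def reference_embedding(archive_feats: np.ndarray) -> np.ndarray:
    # Concept reference = normalized mean of the per-image seed features.
    return l2_normalize(select_seeds(archive_feats).mean(axis=0))


def segment(new_feats: np.ndarray, refs: np.ndarray, saliency=None) -> np.ndarray:
    """new_feats: (H, W, D); refs: (C, D) unit references; saliency: optional (C, H, W) in [0, 1].

    Returns an (H, W) map of concept indices.
    """
    # Per-concept cosine score map, analogous to P_new.
    scores = np.einsum("hwd,cd->chw", l2_normalize(new_feats), refs)
    # Clip negatives so multiplying by a low saliency value never raises a score.
    scores = np.clip(scores, 0.0, None)
    if saliency is not None:
        scores = scores * saliency  # Hadamard (element-wise) product
    return scores.argmax(axis=0)


# Toy usage: plant a shared "object" feature in every archive image.
rng = np.random.default_rng(0)
obj = rng.normal(size=64)
feats = rng.normal(size=(4, 16, 64))  # 4 archive images, 16 patches each
for i, p in enumerate([3, 7, 0, 12]):
    feats[i, p] = obj + 0.05 * rng.normal(size=64)
ref = reference_embedding(feats)

new = rng.normal(size=(2, 2, 64))
new[1, 0] = obj  # the object appears at pixel (1, 0) of the new image
refs = np.stack([ref, l2_normalize(rng.normal(size=64))])
labels = segment(new, refs)  # labels[1, 0] is 0 (the planted concept)
```

Summing best-match similarities rather than averaging them leaves the seed choice unchanged, since the archive size is the same for every patch of a given image.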

Experimental Results

Research Questions

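RQ1: Can retrieval-based exemplar curation combined with co-segmentation achieve open-vocabulary segmentation without pixel-level supervision?
RQ2: Do DenseCLIP inference and language-guided co-segmentation improve zero-shot segmentation quality over baseline unsupervised methods?
RQ3: Can this approach segment rare or novel concepts that do not appear in standard benchmarks?
RQ4: Does unsupervised adaptation (ReCo+) provide additional gains when target-distribution data are available?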

Key Findings

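ReCo outperforms existing unsupervised segmentation methods in zero-shot transfer on standard benchmarks.
Using DenseCLIP at inference time yields a substantial gain in segmentation quality.
Language-guided co-segmentation and context elimination improve performance further.
Under unsupervised adaptation (ReCo+), the method achieves particularly strong results on Cityscapes and KITTI-STEP.
ReCo demonstrated the ability to co-segment rare concepts (e.g. fire extinguishers) and even a rare artefact (the Antikythera mechanism).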


This review was generated by AI and checked by a human editor.