QUICK REVIEW

[Paper Review] Semantic-SAM: Segment and Recognize Anything at Any Granularity

Feng Li, Hao Zhang | arXiv (Cornell University) | July 10, 2023

Advanced Neural Network Applications | Citations: 51

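One-Line Summary

Semantic-SAM is a universal segmentation model with semantic-aware outputs that segments and recognizes objects at multiple granularity levels. It is trained jointly on seven datasets to enable open-vocabulary recognition and multi-granularity segmentation, and it achieves semantic-awareness and granularity-abundance through many-to-many matching and decoupled object/part classification.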

ABSTRACT

In this paper, we introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity. Our model offers two key advantages: semantic-awareness and granularity-abundance. To achieve semantic-awareness, we consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts. This allows our model to capture rich semantic information. For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels that correspond to multiple ground-truth masks. Notably, this work represents the first attempt to jointly train a model on SA-1B, generic, and part segmentation datasets. Experimental results and visualizations demonstrate that our model successfully achieves semantic-awareness and granularity-abundance. Furthermore, combining SA-1B training with other segmentation tasks, such as panoptic and part segmentation, leads to performance improvements. We will provide code and a demo for further exploration and evaluation.

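Motivation and Objectives

Build a universal segmentation model that offers both semantic-awareness and granularity-abundance.
Unify training data across semantic levels and granularity levels from multiple datasets.
Introduce a many-to-many matching scheme that lets a single click produce multi-granularity outputs.
Decouple object and part concepts so that part knowledge can transfer across objects.
Demonstrate that joint training with SA-1B improves panoptic and part segmentation.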

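Proposed Method

Use a query-based mask decoder that generates multi-granularity masks.
Represent each user click with multiple queries (K=6), each corresponding to a different granularity level.
Convert point and box prompts into anchor boxes and feed them to a deformable decoder with positional embeddings.
Use many-to-many Hungarian matching to align the multiple predicted masks with the multiple ground-truth masks for each click.
Decouple object and part classification behind a shared text encoder, enabling joint object/part segmentation across datasets.
Train on seven datasets (SA-1B, COCO panoptic, ADE20k panoptic, Pascal Part, PACO, PartImageNet, Objects365), reorganizing their data formats to fit the training objectives.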
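The many-to-many matching step listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: masks are toy sets of pixel indices, IoU alone serves as the matching score (the actual matching cost may combine other terms), and an exhaustive search over assignments stands in for the Hungarian algorithm, which finds an optimal one-to-one assignment of the same kind but scales to larger problems. The names `iou` and `match_click` are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """IoU of two masks given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_click(pred_masks, gt_masks):
    """Assign each ground-truth mask of one click to a distinct predicted mask.

    Returns (pred_index, gt_index) pairs that maximize the total IoU.
    Exhaustive search is used here in place of the Hungarian algorithm;
    it is only practical for small K (the review mentions K=6).
    """
    assert len(gt_masks) <= len(pred_masks)
    best_pairs, best_score = [], -1.0
    for perm in permutations(range(len(pred_masks)), len(gt_masks)):
        score = sum(iou(pred_masks[p], gt_masks[g]) for g, p in enumerate(perm))
        if score > best_score:
            best_score = score
            best_pairs = [(p, g) for g, p in enumerate(perm)]
    return best_pairs


# Toy example: one click, K=6 predicted masks, 3 ground-truth masks
# (a part, an object, and a larger region) that all contain the click.
gt = [set(range(4)), set(range(10)), set(range(20))]
pred = [
    set(range(20)),      # close to the largest ground truth
    set(range(3)),       # close to the part
    set(range(12)),      # close to the object
    set(),               # empty prediction
    set(range(30, 35)),  # off-target prediction
    set(range(1)),       # tiny prediction
]
print(match_click(pred, gt))  # [(1, 0), (2, 1), (0, 2)]
```

Each ground-truth mask receives its own prediction, so all granularities that contain the click can contribute to training. The three unmatched predictions are simply left out of the assignment in this sketch.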

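Experimental Results

Research Questions

RQ1: Can a single model segment and recognize at multiple granularity levels across diverse datasets with an open vocabulary?
RQ2: Does joint training on semantic data and granularity-rich data improve both generic segmentation and fine-grained part segmentation?
RQ3: Does the many-to-many matching strategy improve the multi-granularity outputs produced from a single click?
RQ4: Can decoupled object/part classification effectively transfer knowledge of part concepts across objects?
RQ5: How do SA-1B and other segmentation data affect panoptic and part segmentation tasks?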

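Key Results

Semantic-SAM achieves semantic-awareness and granularity-abundance through joint training on seven datasets.
Using SA-1B together with COCO panoptic and other data improves box AP (+2.3) and mask AP (+1.2) on interactive segmentation.
The multi-granularity outputs per click are richer and of higher quality than those of prior methods such as SAM, with a better 1-IoU@All Granularity score.
Many-to-many matching substantially improves the 1-IoU@All Granularity score over many-to-one matching.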
Training with SA-1B data particularly improves performance on smaller objects (APs and APm) in the COCO evaluation.
Semantic-SAM demonstrates open-vocabulary and multi-granularity capability across generic and part segmentation tasks.
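To make the 1-IoU@All Granularity results above more concrete, here is a small sketch of one plausible reading of the metric: for a single click, every ground-truth mask containing that click is matched one-to-one to a predicted mask, and the matched IoUs are averaged over all ground-truth masks. This reading is an assumption, as are the toy set-based masks, the exhaustive matching, and the hypothetical function name `iou_all_granularity`; the paper's evaluation protocol is the authoritative definition.

```python
from itertools import permutations


def iou(a, b):
    """IoU of two masks given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def iou_all_granularity(pred_masks, gt_masks):
    """Mean IoU over all ground-truth masks of one click,
    under the best one-to-one matching (assumed reading of the metric)."""
    if not gt_masks:
        return 0.0
    # Pad with empty masks so every ground truth has a slot;
    # a ground truth matched to padding scores 0.
    missing = max(0, len(gt_masks) - len(pred_masks))
    padded = list(pred_masks) + [set()] * missing
    best = max(
        sum(iou(padded[p], g) for p, g in zip(perm, gt_masks))
        for perm in permutations(range(len(padded)), len(gt_masks))
    )
    return best / len(gt_masks)


# Three ground-truth granularities for one click.
gt = [set(range(4)), set(range(10)), set(range(20))]

single_output = [set(range(10))]                               # one mask per click
multi_output = [set(range(4)), set(range(10)), set(range(20))]  # one mask per level

print(iou_all_granularity(single_output, gt))
print(iou_all_granularity(multi_output, gt))
```

Under this reading, a model that returns only one mask per click can cover at most one of the three granularities perfectly (scoring 1/3 here), while a model that returns one mask per level can score 1.0.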


This review was generated by AI and reviewed by a human editor.