QUICK REVIEW

[Paper Review] When Slots Compete: Slot Merging in Object-Centric Learning

Christos Chatzisavvas, Pavlos Rigas | arXiv (Cornell University) | 2026-03-11

Domain Adaptation and Few-Shot Learning · Citations: 0

One-Line Summary

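This work introduces a differentiable slot-merging operator that combines overlapping Slot Attention slots and integrates it into DINOSAUR, reducing slot fragmentation and improving object-centric representations and segmentation.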

ABSTRACT

Slot-based object-centric learning represents an image as a set of latent slots with a decoder that combines them into an image or features. The decoder specifies how slots are combined into an output, but the slot set is typically fixed: the number of slots is chosen upfront and slots are only refined. This can lead to multiple slots competing for overlapping regions of the same entity rather than focusing on distinct regions. We introduce slot merging: a drop-in, lightweight operation on the slot set that merges overlapping slots during training. We quantify overlap with a Soft-IoU score between slot-attention maps and combine selected pairs via a barycentric update that preserves gradient flow. Merging follows a fixed policy, with the decision threshold inferred from overlap statistics, requiring no additional learnable modules. Integrated into the established feature-reconstruction pipeline of DINOSAUR, the proposed method improves object factorization and mask quality, surpassing other adaptive methods in object discovery and segmentation benchmarks.
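The two quantities the abstract names, the Soft-IoU overlap score and the barycentric merge, can be written out as below. This is a sketch under stated assumptions, not the paper's exact definitions: $A_{i,n}$ is the attention of slot $i$ on input token $n$ (assumed normalized over slots for each token, as in standard Slot Attention), $s_i$ is the slot vector, and $m_i$ is the slot's attention mass. The product-based "probabilistic" form of Soft-IoU and the additive attention update are assumptions.

```latex
\begin{aligned}
m_i &= \sum_{n=1}^{N} A_{i,n} \\
\mathrm{SoftIoU}(i,j) &= \frac{\sum_{n} A_{i,n}\, A_{j,n}}{\sum_{n} \left( A_{i,n} + A_{j,n} - A_{i,n}\, A_{j,n} \right)} \\
s_{ij} &= \frac{m_i\, s_i + m_j\, s_j}{m_i + m_j}, \qquad
A_{ij,n} = A_{i,n} + A_{j,n}, \qquad
m_{ij} = m_i + m_j
\end{aligned}
```

Because $s_{ij}$ is a weighted average and $A_{ij,n}$ is a sum, both are differentiable in $s_i$, $s_j$, and the attention rows, which is one way to read "preserves gradient flow"; the choice of which pair to merge is itself discrete. With attention normalized over slots, summing two rows keeps every token's attention summing to 1, so total mass is preserved.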

Research Motivation and Goals

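Advance object-centric learning that decomposes scenes into discrete objects without supervision.
Address the fragmentation caused by a fixed number of slots by allowing overlapping slots to be merged.
Provide a lightweight, differentiable mechanism that refines the slot set during training.
Integrate the merging mechanism into the DINOSAUR framework and evaluate it on standard benchmarks.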

Proposed Method

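Quantify the spatial overlap between slot-attention maps with a probabilistic Soft-IoU score.
Introduce a differentiable slot-merging operator that performs mass-weighted barycentric interpolation of slot representations.
Apply a fixed merge policy: select the pair with the largest overlap and keep merging while that overlap exceeds a data-driven threshold.
Update the attention maps by aggregating the attention of the merged slots, preserving mass and gradient flow during merging.
Activate merging only after the slot representations have stabilized, with merge decisions controlled by a data-driven threshold derived from overlap statistics.
Evaluate within the DINOSAUR framework on the VOC, COCO, MOVi-C, and MOVi-E datasets.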
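The merge policy in the list above can be sketched in plain Python as follows. This is a minimal, framework-free illustration, not the authors' implementation: the helper names (`soft_iou`, `merge_pair`, `greedy_merge`, `infer_threshold`, `merge_step`), the product-based Soft-IoU, the mean-plus-k-standard-deviations threshold, and the `warmup_steps` gate are all hypothetical stand-ins for details the review does not specify. Gradients are not modeled; in an autodiff framework the weighted average and the attention sum would be differentiable.

```python
import statistics
from typing import List, Sequence, Tuple

Vector = List[float]


def soft_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Probabilistic Soft-IoU of two attention maps over the same tokens (assumed form)."""
    inter = sum(x * y for x, y in zip(a, b))
    union = sum(x + y - x * y for x, y in zip(a, b))
    return inter / union if union > 0 else 0.0


def merge_pair(slots: List[Vector], attn: List[Vector], i: int, j: int) -> Tuple[List[Vector], List[Vector]]:
    """Replace slots i and j by their mass-weighted barycenter; their attention rows are summed."""
    m_i, m_j = sum(attn[i]), sum(attn[j])  # attention mass of each slot
    total = m_i + m_j
    if total > 0:
        merged_slot = [(m_i * u + m_j * v) / total for u, v in zip(slots[i], slots[j])]
    else:  # degenerate case (no mass): fall back to a plain average
        merged_slot = [(u + v) / 2 for u, v in zip(slots[i], slots[j])]
    merged_attn = [x + y for x, y in zip(attn[i], attn[j])]
    keep = [k for k in range(len(slots)) if k not in (i, j)]
    return [slots[k] for k in keep] + [merged_slot], [attn[k] for k in keep] + [merged_attn]


def greedy_merge(slots, attn, threshold: float, min_slots: int = 1):
    """Repeatedly merge the most-overlapping pair while its Soft-IoU is at least the threshold."""
    slots, attn = list(slots), list(attn)
    while len(slots) > max(min_slots, 1):  # need at least two slots to form a pair
        best, pair = -1.0, (0, 1)
        for i in range(len(slots)):
            for j in range(i + 1, len(slots)):
                score = soft_iou(attn[i], attn[j])
                if score > best:
                    best, pair = score, (i, j)
        if best < threshold:
            break  # no remaining pair overlaps enough
        slots, attn = merge_pair(slots, attn, *pair)
    return slots, attn


def infer_threshold(overlaps: Sequence[float], k: float = 1.0) -> float:
    """Hypothetical data-driven threshold: mean + k * std of observed pairwise overlaps."""
    return statistics.mean(overlaps) + k * statistics.pstdev(overlaps)


def merge_step(step: int, warmup_steps: int, slots, attn, threshold: float):
    """Leave the slot set unchanged until a (hypothetical) warm-up period has passed."""
    if step < warmup_steps:
        return list(slots), list(attn)
    return greedy_merge(slots, attn, threshold)


if __name__ == "__main__":
    # Three slots over four tokens; slots 0 and 1 compete for the same two tokens.
    attn = [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    slots = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    merged_slots, merged_attn = merge_step(step=100, warmup_steps=10, slots=slots, attn=attn, threshold=0.2)
    print(merged_slots)  # [[5.0, 5.0], [0.5, 0.5]]
    print(merged_attn)   # [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
```

Recomputing every pairwise score after each merge costs O(K²) per merge for K slots, which is cheap at small slot counts. In practice the threshold would more plausibly be estimated from overlaps collected across many images (for example, a training batch) than from a single image's handful of pairs.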

Figure 1 : We introduce a merge operator over the slot set that adaptively refines factorization, producing coherent object-level representations.

Experimental Results

Research Questions

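RQ1: Can overlapping or competing slots be merged into a single coherent representation without hard pruning?
RQ2: Does integrating slot merging into training yield better object factorization and segmentation than merging only at inference?
RQ3: How does a Soft-IoU-based merge policy affect downstream reconstruction and segmentation performance?
RQ4: How do differentiability and gradient flow through the merge operation affect slot optimization?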

Key Findings

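The proposed merging mechanism consistently improves object representations and segmentation quality on both real-world and synthetic benchmarks.
Merging during training yields better mBO and mIoU scores than inference-only merging.
Backpropagating gradients through the merge layer improves performance.
Aggregating attention maps during merging further improves segmentation metrics.
Merge frequency adapts to scene complexity: more merges occur in cluttered scenes and fewer in sparse ones.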

Figure 2 : Illustration of the proposed pipeline.

This review was generated by AI and reviewed by a human editor.