[Paper Review] When Slots Compete: Slot Merging in Object-Centric Learning
The paper introduces a differentiable slot-merging operator that consolidates overlapping Slot Attention slots, integrated into DINOSAUR, to reduce fragmentation and improve object-aware representations and segmentation.
Slot-based object-centric learning represents an image as a set of latent slots with a decoder that combines them into an image or features. The decoder specifies how slots are combined into an output, but the slot set is typically fixed: the number of slots is chosen upfront and slots are only refined. This can lead to multiple slots competing for overlapping regions of the same entity rather than focusing on distinct regions. We introduce slot merging: a drop-in, lightweight operation on the slot set that merges overlapping slots during training. We quantify overlap with a Soft-IoU score between slot-attention maps and combine selected pairs via a barycentric update that preserves gradient flow. Merging follows a fixed policy, with the decision threshold inferred from overlap statistics, requiring no additional learnable modules. Integrated into the established feature-reconstruction pipeline of DINOSAUR, the proposed method improves object factorization and mask quality, surpassing other adaptive methods in object discovery and segmentation benchmarks.
Motivation and Objectives
- Motivate object-centric learning as a way to decompose scenes into discrete objects without supervision.
- Address slot fragmentation due to a fixed number of slots by enabling merging of overlapping slots.
- Provide a lightweight, differentiable mechanism that refines slot representations during training.
- Integrate the merging mechanism into the DINOSAUR framework and evaluate on standard benchmarks.
Proposed Method
- Quantify spatial overlap between slot attention maps using a probabilistic Soft-IoU score.
- Introduce a differentiable slot merge operator that performs mass-weighted barycentric interpolation of slot representations.
- Apply a fixed merge policy that selects pairs with highest overlap and merges them until a data-driven threshold is reached.
- Update attention maps during merging by aggregating slot attentions to preserve mass and gradient flow.
- Activate merging after slot representations stabilize, controlled by a data-driven threshold derived from overlap statistics.
- Evaluate within the DINOSAUR framework on VOC, COCO, MOVi-C, and MOVi-E datasets.
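
The overlap score, merge operator, and greedy policy listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the product-form Soft-IoU, the function names (`soft_iou`, `merge_pair`, `greedy_merge`), and the list-based data layout are all hypothetical, and `threshold` is passed in by hand because the review does not say how the paper derives it from overlap statistics. A real implementation would use a tensor library so that gradients flow through the merge.

```python
from itertools import combinations


def soft_iou(attn_a, attn_b):
    """Soft-IoU of two attention maps (flat lists of values in [0, 1]).

    Assumption: probabilistic form, intersection = a*b, union = a + b - a*b.
    """
    inter = sum(a * b for a, b in zip(attn_a, attn_b))
    union = sum(a + b - a * b for a, b in zip(attn_a, attn_b))
    return inter / union if union > 0 else 0.0


def merge_pair(slot_i, slot_j, attn_i, attn_j):
    """Mass-weighted barycentric merge of two slots.

    The mass of a slot is taken to be the sum of its attention map
    (an assumption). Attention maps are summed so total mass is preserved.
    """
    mass_i, mass_j = sum(attn_i), sum(attn_j)
    total = mass_i + mass_j
    w_i = mass_i / total if total > 0 else 0.5
    slot = [w_i * x + (1.0 - w_i) * y for x, y in zip(slot_i, slot_j)]
    attn = [a + b for a, b in zip(attn_i, attn_j)]
    return slot, attn


def greedy_merge(slots, attns, threshold):
    """Repeatedly merge the most-overlapping pair until no pair reaches threshold."""
    slots, attns = list(slots), list(attns)
    while len(slots) > 1:
        score, i, j = max(
            (soft_iou(attns[i], attns[j]), i, j)
            for i, j in combinations(range(len(slots)), 2)
        )
        if score < threshold:
            break
        slot, attn = merge_pair(slots[i], slots[j], attns[i], attns[j])
        del slots[j], attns[j]  # j > i, so index i stays valid
        slots[i], attns[i] = slot, attn
    return slots, attns


# Toy example: slots 0 and 1 split the same region, slot 2 owns the rest.
slots = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
attns = [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
merged_slots, merged_attns = greedy_merge(slots, attns, threshold=0.3)
```

In the toy example the two competing slots are merged into one slot whose representation is the equal-mass average of the originals and whose attention map covers the full region, while the third slot is left untouched.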

Experimental Results
Research Questions
- RQ1: Can overlapping (competing) slots be merged into a single coherent representation without hard pruning?
- RQ2: Does integrating slot merging during training yield better object factorization and segmentation than merging only at inference?
- RQ3: How does the merge policy, based on Soft-IoU, influence downstream reconstruction/segmentation performance?
- RQ4: What is the impact of differentiability and gradient flow through the merging operation on slot optimization?
Key Findings
- The proposed merging mechanism consistently improves object representations and segmentation quality across real-world and synthetic benchmarks.
- Merging during training outperforms inference-only merging, yielding higher mBO and mIoU scores.
- Allowing gradients to backpropagate through the merging layer is beneficial for performance.
- Attention map aggregation during merging further boosts segmentation metrics.
- Merging frequency adapts to scene complexity, with more merges in denser scenes and fewer in sparse ones.
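
To make the mBO metric mentioned above concrete, here is a minimal sketch of mean best overlap for a single image: each ground-truth mask is matched to the predicted mask with the highest IoU, and those best IoUs are averaged. The function names and the flat 0/1 list layout are hypothetical, and this is an illustration rather than the evaluation code used in the paper.

```python
def iou(pred, gt):
    """IoU between two binary masks given as flat lists of 0/1."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 0.0


def mean_best_overlap(pred_masks, gt_masks):
    """Average, over ground-truth masks, of the best IoU with any predicted mask."""
    if not gt_masks:
        return 0.0
    best = [max((iou(p, g) for p in pred_masks), default=0.0) for g in gt_masks]
    return sum(best) / len(best)


gt_masks = [[1, 1, 0, 0], [0, 0, 1, 1]]
# Fragmented prediction: the first object is split across two slots.
fragmented = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
# Prediction after the two fragments are merged.
merged = [[1, 1, 0, 0], [0, 0, 1, 1]]

mbo_fragmented = mean_best_overlap(fragmented, gt_masks)
mbo_merged = mean_best_overlap(merged, gt_masks)
```

Here the fragmented prediction scores lower than the merged one, which shows why consolidating slots that split one object can raise mBO.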

This review was generated by AI and checked by a human editor.