QUICK REVIEW

[Paper Review] Object-Centric Learning with Slot Attention

Francesco Locatello, Dirk Weissenborn|arXiv (Cornell University)|Jun 26, 2020

Multimodal Machine Learning Applications|References: 89|Cited by: 218

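One-Sentence Summary

This paper proposes Slot Attention, an iterative attention module that maps CNN perceptual features to a set of exchangeable slots that bind to objects, enabling unsupervised object discovery and supervised set-based property prediction.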

ABSTRACT

Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not capture the compositional properties of natural scenes. In this paper, we present the Slot Attention module, an architectural component that interfaces with perceptual representations such as the output of a convolutional neural network and produces a set of task-dependent abstract representations which we call slots. These slots are exchangeable and can bind to any object in the input by specializing through a competitive procedure over multiple rounds of attention. We empirically demonstrate that Slot Attention can extract object-centric representations that enable generalization to unseen compositions when trained on unsupervised object discovery and supervised property prediction tasks.

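Motivation and Objectives

Motivate learning object-centric representations to improve sample efficiency and generalization in visual scene understanding.
Introduce Slot Attention as a differentiable interface between a perceptual encoder and a set of slots.
Demonstrate competitive unsupervised object discovery with improved training efficiency.
Demonstrate supervised set prediction in which slots correspond to objects and predict their properties.
Discuss generalization to unseen object compositions and to unseen numbers of objects.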

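Proposed Method

Propose the Slot Attention module, which maps N input feature vectors to K slots through iterative attention and a shared GRU-based update.
Use dot-product attention with the softmax normalized over the slots, so that slots compete to explain parts of the input.
After each iteration, update the slots with a GRU, optionally followed by a residual MLP, with LayerNorm for stable training.
Initialize the slots by sampling from a learnable Gaussian distribution, which allows the number of slots to be changed at test time.
Apply the module as (i) an autoencoder (encoder-decoder) for unsupervised object discovery and (ii) an encoder for set prediction of object properties.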
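
The competition step described above can be illustrated with a minimal, standard-library-only sketch. It is a deliberate simplification, not the paper's module: it assumes identity query/key/value projections, omits LayerNorm and the residual MLP, and replaces the GRU update with a direct overwrite of each slot by its attention-weighted mean, which makes it behave more like soft k-means. All function names (`init_slots`, `slot_attention_step`, `slot_attention`) and the default of three iterations are illustrative assumptions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def init_slots(num_slots, dim, mu, sigma, rng):
    """Sample every slot from one shared Gaussian.

    In a trained model mu and sigma would be learned; here they are fixed
    placeholders (an assumption made for this sketch).
    """
    return [[rng.gauss(mu[d], sigma[d]) for d in range(dim)]
            for _ in range(num_slots)]


def slot_attention_step(slots, inputs, eps=1e-8):
    """One simplified round: inputs choose among slots, slots average inputs."""
    dim = len(slots[0])
    scale = 1.0 / math.sqrt(dim)

    # attn[i][k]: for each input i, a softmax over the K slots.
    # Normalizing over slots (not over inputs) is what makes slots compete.
    attn = []
    for x in inputs:
        logits = [scale * sum(a * b for a, b in zip(x, s)) for s in slots]
        attn.append(softmax(logits))

    new_slots = []
    for k in range(len(slots)):
        weights = [attn[i][k] + eps for i in range(len(inputs))]
        total = sum(weights)
        # Weighted mean of the inputs for slot k. A GRU would consume this
        # value in the full module; this sketch simply overwrites the slot.
        new_slots.append([
            sum(w * x[d] for w, x in zip(weights, inputs)) / total
            for d in range(dim)
        ])
    return new_slots, attn


def slot_attention(inputs, num_slots, num_iters=3, seed=0):
    """Run several attention rounds starting from randomly sampled slots."""
    dim = len(inputs[0])
    rng = random.Random(seed)
    slots = init_slots(num_slots, dim, [0.0] * dim, [1.0] * dim, rng)
    attn = []
    for _ in range(num_iters):
        slots, attn = slot_attention_step(slots, inputs)
    return slots, attn
```

Because the slots are sampled from a shared distribution rather than stored as per-slot parameters, `num_slots` can be set to a different value on each call, which is the property that lets the slot count change at test time.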

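Experimental Results

Research Questions

RQ1: Can Slot Attention extract object-centric representations from perceptual input without supervision?
RQ2: Can Slot Attention achieve accurate unsupervised object discovery on multi-object datasets?
RQ3: Can the learned slots support supervised property prediction over sets of objects?
RQ4: How does Slot Attention generalize when more slots are used at test time than during training?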

Key Findings

Dataset (ARI, %)	Slot Attention	IODINE	MONet	Slot MLP
CLEVR6	98.8±0.3	98.8±0.0	96.2±0.6	60.4±6.6
Multi-dSprites	91.3±0.3	76.7±5.6	90.4±0.8	60.3±1.8
Tetrominoes	99.5±0.2	99.2±0.4	—	25.1±34.3

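Slot Attention's ARI scores on CLEVR6, Multi-dSprites, and Tetrominoes are competitive with or better than those of state-of-the-art unsupervised object discovery methods.
ARI is 98.8±0.3 on CLEVR6, 91.3±0.3 on Multi-dSprites, and 99.5±0.2 on Tetrominoes (with one outlier excluded).
Compared with IODINE and MONet, Slot Attention is more memory-efficient and faster to train.
On the CLEVR10 set prediction task, Slot Attention matches or exceeds the DSPN baseline in average precision, and it continues to improve as the number of iterations at test time increases.
Without any direct segmentation supervision, the attention masks produced by Slot Attention still segment the scene into objects.
When the number of slots at test time exceeds the number used in training, the method maintains strong performance.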
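
For readers unfamiliar with the ARI metric behind these scores, the following is a minimal sketch of the standard Adjusted Rand Index between two labelings, using only the Python standard library. The function name `adjusted_rand_index` is illustrative. How the reported numbers were preprocessed is an assumption here: object-discovery evaluations commonly treat each pixel as one item, often leave out background pixels, and report the result multiplied by 100.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat labelings of the same items.

    For segmentation, item i would be pixel i and its label the id of the
    object (or slot) it is assigned to.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts: how many items share each (true, pred) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs that land in the same cell, row, or column.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_pairs - expected) / (max_index - expected)
```

The score is invariant to how clusters are named, so a predicted segmentation does not need its slot indices to match the ground-truth object ids. A value of 1 means the two partitions are identical, and values near 0 mean agreement no better than chance.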


This review was generated by AI and reviewed by human editors.