QUICK REVIEW

[Paper Review] Object-Centric Learning with Slot Attention

Francesco Locatello, Dirk Weissenborn | arXiv (Cornell University) | June 26, 2020

Multimodal Machine Learning Applications | References: 89 | Citations: 218

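One-Line Summary

This paper proposes Slot Attention, an iterative attention module that converts CNN perceptual features into a set of exchangeable slots that can bind to objects, enabling both unsupervised object discovery and supervised set-based property prediction.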

ABSTRACT

Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not capture the compositional properties of natural scenes. In this paper, we present the Slot Attention module, an architectural component that interfaces with perceptual representations such as the output of a convolutional neural network and produces a set of task-dependent abstract representations which we call slots. These slots are exchangeable and can bind to any object in the input by specializing through a competitive procedure over multiple rounds of attention. We empirically demonstrate that Slot Attention can extract object-centric representations that enable generalization to unseen compositions when trained on unsupervised object discovery and supervised property prediction tasks.

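Motivation and Objectives

Motivate object-centric representation learning as a way to improve sample efficiency and generalization.
Introduce Slot Attention as a differentiable interface between a perceptual encoder and a set of slots.
Demonstrate unsupervised object discovery with competitive performance and improved training efficiency.
Demonstrate supervised set prediction, in which slots that correspond to objects are used to predict object properties.
Discuss generalization to unseen object compositions and to unseen numbers of objects.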

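Proposed Method

Present the Slot Attention module, which maps N input vectors to K slots through iterative attention and a shared GRU-based update.
Use dot-product attention normalized over the slots (a softmax along the slot axis), so that slots compete with one another to explain each input.
After each attention step, update the slots with a GRU followed by an optional residual MLP; LayerNorm is applied for stable training.
Initialize the slots by sampling from a learnable Gaussian, which allows the number of slots to be changed at test time.
Apply the module (i) inside an encoder-decoder for unsupervised object discovery and (ii) inside a set-prediction encoder for object property prediction.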
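The steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: all weights are random and untrained, LayerNorm has no learnable scale or offset, and the names `init_params`, `gru_cell`, and `slot_attention`, the default of three iterations, and the small `eps` constant are choices made for this sketch.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance (no learnable gain/bias).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(rng, d_in, d_slot, d_hidden):
    # Hypothetical parameter container; weights are random, i.e. untrained.
    def w(rows, cols):
        return rng.normal(0.0, rows ** -0.5, size=(rows, cols))
    return {
        "mu": np.zeros(d_slot), "log_sigma": np.zeros(d_slot),  # Gaussian slot init
        "Wq": w(d_slot, d_slot), "Wk": w(d_in, d_slot), "Wv": w(d_in, d_slot),
        "W_gru": w(d_slot, 3 * d_slot), "U_gru": w(d_slot, 3 * d_slot),
        "b_gru": np.zeros(3 * d_slot),
        "W1": w(d_slot, d_hidden), "W2": w(d_hidden, d_slot),  # residual MLP
    }

def gru_cell(p, x, h):
    # Standard GRU cell; the same weights are shared by every slot.
    d = h.shape[-1]
    gx = x @ p["W_gru"] + p["b_gru"]
    gh = h @ p["U_gru"]
    z = sigmoid(gx[:, :d] + gh[:, :d])            # update gate
    r = sigmoid(gx[:, d:2 * d] + gh[:, d:2 * d])  # reset gate
    n = np.tanh(gx[:, 2 * d:] + r * gh[:, 2 * d:])
    return (1.0 - z) * n + z * h

def slot_attention(p, inputs, num_slots, num_iters=3, rng=None, eps=1e-8):
    # inputs: (N, d_in) feature vectors. Returns slots (K, d_slot) and attn (N, K).
    if rng is None:
        rng = np.random.default_rng(0)
    d = p["mu"].shape[0]
    # Sample the initial slots from a shared Gaussian, so K can differ per call.
    slots = p["mu"] + np.exp(p["log_sigma"]) * rng.normal(size=(num_slots, d))
    inputs = layer_norm(inputs)
    k, v = inputs @ p["Wk"], inputs @ p["Wv"]
    attn = None
    for _ in range(num_iters):
        slots_prev = slots
        q = layer_norm(slots) @ p["Wq"]
        logits = (k @ q.T) / np.sqrt(d)                   # (N, K)
        logits = logits - logits.max(axis=1, keepdims=True)
        attn = np.exp(logits)
        attn = attn / attn.sum(axis=1, keepdims=True)     # softmax over slots
        weights = attn + eps
        weights = weights / weights.sum(axis=0, keepdims=True)  # mean over inputs
        updates = weights.T @ v                           # (K, d_slot)
        slots = gru_cell(p, updates, slots_prev)
        slots = slots + np.maximum(layer_norm(slots) @ p["W1"], 0.0) @ p["W2"]
    return slots, attn
```

Because the softmax runs along the slot axis, each input distributes a total attention of 1 across the slots, which is what makes the slots compete. Calling `slot_attention` with a different `num_slots` reuses the same parameters, which mirrors the point about varying the slot count at test time.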

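Experimental Results

Research Questions

RQ1: Can Slot Attention extract object-centric representations from perceptual input without supervision?
RQ2: Does Slot Attention enable accurate unsupervised object discovery across several multi-object datasets?
RQ3: Do the learned slots support supervised tasks that predict the properties of a set of objects?
RQ4: Does Slot Attention generalize when more slots are used at test time than during training?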

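Key Results

On CLEVR6, Multi-dSprites, and Tetrominoes, Slot Attention achieves ARI scores that are competitive with or better than state-of-the-art unsupervised object discovery methods.
ARI (in %): CLEVR6 98.8 ± 0.3; Multi-dSprites 91.3 ± 0.3; Tetrominoes 99.5 ± 0.2 (excluding outliers).
Compared with IODINE and MONet, Slot Attention is more memory-efficient and faster to train.
On CLEVR10 set prediction, it matches or exceeds the DSPN baseline in average precision, and it scales to a larger number of attention iterations at test time.
The attention masks produced by Slot Attention separate objects in a semantically meaningful way without any direct segmentation supervision.
The method retains strong performance when more slots are used at test time than during training.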
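The segmentation results above are reported as the Adjusted Rand Index (ARI), which compares a predicted grouping of pixels with the ground-truth grouping and is invariant to how the groups are labeled. As a minimal sketch, assuming flat sequences of per-pixel labels (the function name `adjusted_rand_index` is chosen for this example), the standard formula can be computed with the Python standard library alone. Any filtering of pixels, such as dropping background, would be done by the caller before the function is invoked.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    # true_labels, pred_labels: equal-length sequences of hashable cluster ids.
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Contingency counts and the marginal counts of each partition.
    joint = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    # Number of element pairs that fall in the same cell / same cluster.
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_joint - expected) / (max_index - expected)
```

A value of 1.0 means the two groupings are identical up to relabeling, and values near 0 indicate chance-level agreement. The figures quoted in this review correspond to this score multiplied by 100.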

This review was generated by AI and checked by a human editor.