QUICK REVIEW

[Paper Review] SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition

Zhixuan Lin, Yifu Wu | arXiv (Cornell University) | Jan 8, 2020

Advanced Image and Video Retrieval Techniques | References: 25 | Cited by: 44

One-Line Summary

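SPACE combines spatial attention and scene mixture in a single probabilistic model that decomposes foreground objects and complex backgrounds simultaneously. Its parallel foreground processing yields a scalable, unsupervised object-oriented scene representation. It is evaluated on Atari and 3D-Rooms against SPAIR, IODINE, and GENESIS.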

ABSTRACT

The ability to decompose complex multi-object scenes into meaningful abstractions like objects is fundamental to achieve higher-level cognition. Previous approaches for unsupervised object-oriented scene representation learning are either based on spatial-attention or scene-mixture approaches and limited in scalability which is a main obstacle towards modeling real-world scenes. In this paper, we propose a generative latent variable model, called SPACE, that provides a unified probabilistic modeling framework that combines the best of spatial-attention and scene-mixture approaches. SPACE can explicitly provide factorized object representations for foreground objects while also decomposing background segments of complex morphology. Previous models are good at either of these, but not both. SPACE also resolves the scalability problems of previous methods by incorporating parallel spatial-attention and thus is applicable to scenes with a large number of objects without performance degradations. We show through experiments on Atari and 3D-Rooms that SPACE achieves the above properties consistently in comparison to SPAIR, IODINE, and GENESIS. Results of our experiments can be found on our project website: https://sites.google.com/view/space-project-page

Motivation and Objectives

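Enable unsupervised learning of structured representations of multi-object scenes with occlusion and complex backgrounds.
Propose SPACE, which unifies the spatial-attention and scene-mixture approaches within a probabilistic latent-variable framework.
Decompose each scene into foreground and background while keeping object representations factorized, and process foreground objects in parallel to ensure scalability.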

Proposed Method

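Introduce a foreground module based on parallel spatial attention that produces z_where, z_depth, z_pres, and z_what for each grid cell.
Render every foreground object onto the canvas in parallel using a spatial transformer.
Model the background as a K-component pixel-wise mixture in which each component is represented by the latents z^m (mixture) and z^c (color) and decoded by a VAE.
Train with a variational objective (ELBO) that accounts for foreground and background jointly, using a mean-field approximation for the per-cell latent variables.
Prevent box splitting (one object being divided across several bounding boxes) with an auxiliary boundary loss that penalizes object masks touching the glimpse boundary.
Demonstrate scalability through parallel foreground processing, in contrast to the sequential inference of SPAIR, IODINE, and GENESIS.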
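To make the parallel foreground step concrete, here is a minimal NumPy sketch. It is a simplification and not the paper's implementation. The spatial transformer and the z_what decoder are replaced by constant-color axis-aligned boxes. Boxes are given directly as (cx, cy, w, h). Overlaps are resolved by keeping the object with the smallest depth value. The function names `render_boxes_parallel` and `composite` are hypothetical. What the sketch does show is that every cell's object is rasterized in a single broadcast over the cell axis, with no sequential loop over objects. The foreground is then blended with the background through a per-pixel mask alpha.

```python
import numpy as np

def render_boxes_parallel(pres, where, depth, color, H, W):
    """Rasterize every cell's object in one vectorized pass (no loop over cells).

    Hypothetical simplification: constant-color boxes instead of a spatial
    transformer and a z_what decoder.
    pres:  (N,) 0/1 presence per cell (stand-in for sampled z_pres)
    where: (N, 4) boxes (cx, cy, w, h) in [0, 1] image coordinates (stand-in for z_where)
    depth: (N,) smaller value = closer to the viewer (stand-in for z_depth)
    color: (N, 3) one RGB color per object (stand-in for the z_what decoder output)
    Returns the foreground image (H, W, 3) and the foreground mask alpha (H, W).
    """
    ys = (np.arange(H) + 0.5) / H                      # pixel-centre y coordinates, (H,)
    xs = (np.arange(W) + 0.5) / W                      # pixel-centre x coordinates, (W,)
    cx, cy, w, h = (where[:, i, None, None] for i in range(4))  # each (N, 1, 1)
    inside = (np.abs(xs[None, None, :] - cx) <= w / 2) & (np.abs(ys[None, :, None] - cy) <= h / 2)
    masks = inside & (pres[:, None, None] > 0)         # (N, H, W) per-object masks
    d = np.where(masks, depth[:, None, None], np.inf)  # uncovered pixels get depth +inf
    front = np.argmin(d, axis=0)                       # (H, W) index of the closest object
    alpha = masks.any(axis=0).astype(float)            # 1 where any object is present
    fg = color[front] * alpha[..., None]               # zero outside the foreground
    return fg, alpha

def composite(fg, alpha, bg):
    """Mean image of the pixel-wise mixture: alpha * fg + (1 - alpha) * bg."""
    a = alpha[..., None]
    return a * fg + (1.0 - a) * bg
```

The cell axis is just another array dimension. Adding more cells therefore changes array sizes but not the number of sequential steps. That is the property the review attributes to SPACE's parallel foreground module.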
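For the background, the review describes a K-component pixel-wise mixture but does not say how the mixing weights are formed. One common choice in this family of models is stick-breaking, which GENESIS uses. We assume it here only for illustration. The sketch also assumes that a decoder has already turned each z^m into a per-pixel fraction in (0, 1). The names `stick_breaking` and `expected_background` are hypothetical, and `mu` stands in for the per-component colors decoded from z^c.

```python
import numpy as np

def stick_breaking(m):
    """Convert K-1 per-pixel fractions into K mixing weights that sum to 1.

    m: shape (K-1, H, W), values in (0, 1), e.g. decoded from z^m (assumed).
    Returns pi with shape (K, H, W), where (0-indexed)
      pi_k     = m_k * prod_{j<k} (1 - m_j)   for k < K-1
      pi_{K-1} = prod_{j<K-1} (1 - m_j)       (the last component takes what is left)
    """
    leftover = np.cumprod(1.0 - m, axis=0)                           # prod_{j<=k} (1 - m_j)
    scope = np.concatenate([np.ones_like(m[:1]), leftover], axis=0)  # prod_{j<k} (1 - m_j)
    frac = np.concatenate([m, np.ones_like(m[:1])], axis=0)
    return frac * scope

def expected_background(pi, mu):
    """Mean image of the background mixture: sum_k pi_k * mu_k.

    pi: (K, H, W) mixing weights; mu: (K, H, W, 3) per-component colors
    (stand-in for what the z^c decoders would output).
    """
    return (pi[..., None] * mu).sum(axis=0)
```

Stick-breaking guarantees that the weights at every pixel are non-negative and sum to 1 without an explicit softmax. The mixture stays valid for any K.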
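The pieces above fit into one pixel-wise mixture likelihood and a standard ELBO. The LaTeX below uses our own condensed notation rather than the paper's exact equations:

- α_j is the foreground mixing probability at pixel j.
- π_{k,j} is the weight of background component k at that pixel.
- z^{bg}_k = (z^m_k, z^c_k).
- z^{fg}_i collects the per-cell foreground latents listed above for the HW grid cells.

Which latents condition on which in the full model may differ.

```latex
\[
p(x \mid z^{fg}, z^{bg}) = \prod_{j} \Big[ \alpha_j \, p(x_j \mid z^{fg}) + (1 - \alpha_j) \sum_{k=1}^{K} \pi_{k,j} \, p(x_j \mid z^{bg}_k) \Big]
\]
\[
q(z^{fg} \mid x) = \prod_{i=1}^{HW} q(z^{fg}_i \mid x)
\]
\[
\mathcal{L}(x) = \mathbb{E}_{q(z^{fg}, z^{bg} \mid x)} \big[ \log p(x \mid z^{fg}, z^{bg}) \big]
  - \mathrm{KL}\big( q(z^{fg}, z^{bg} \mid x) \,\|\, p(z^{fg}, z^{bg}) \big)
\]
```

The second line is the mean-field assumption across cells. Because each cell's posterior depends only on x, all cells can be inferred at once. Training maximizes \(\mathcal{L}(x)\) over the data, with the auxiliary boundary loss added as an extra penalty.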
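The auxiliary boundary loss can be sketched as follows, assuming each object's mask is available at glimpse resolution. The border width, the plain sum, and the gating by z_pres are our assumptions about the exact form. The names `boundary_mask` and `boundary_loss` are hypothetical. The sketch captures the idea described above: mask mass that touches the glimpse border is penalized. This pushes each glimpse to cover a whole object rather than splitting it across neighbouring boxes.

```python
import numpy as np

def boundary_mask(h, w, width=1):
    """1.0 on a frame `width` pixels wide along the glimpse border, 0.0 inside."""
    m = np.ones((h, w))
    m[width:h - width, width:w - width] = 0.0
    return m

def boundary_loss(glimpse_alpha, pres, width=1):
    """Sum of mask values lying on the glimpse border, counted only for present objects.

    glimpse_alpha: (N, h, w) per-object masks in glimpse coordinates, values in [0, 1].
    pres:          (N,) presence indicators or probabilities (stand-in for z_pres).
    """
    _, h, w = glimpse_alpha.shape
    on_border = (glimpse_alpha * boundary_mask(h, w, width)).sum(axis=(1, 2))  # (N,)
    return float((pres * on_border).sum())
```

An object whose mask sits strictly inside its glimpse contributes nothing. A mask that fills the glimpse is penalized by the number of border pixels it covers.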

Experimental Results

Research Questions

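RQ1: Can SPACE provide explicit object-centric foreground representations while also decomposing complex background components?
RQ2: Does parallel foreground processing improve scalability and speed without sacrificing foreground detection quality?
RQ3: How does SPACE compare with SPAIR, IODINE, and GENESIS on the Atari and 3D-Room datasets in terms of convergence, speed, and bounding-box quality?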

Key Findings

モデル	データセット	Avg. Precision (IoU=0.5)	Avg. Precision (IoU 0.5:0.95)	Object Count Error Rate
SPACE (16×16)	3D-Room Large	0.8927 ± 0.0027	0.4445 ± 0.0075	0.0446 ± 0.0026
SPAIR (16×16)	3D-Room Large	0.9072 ± 0.0003	0.4364 ± 0.0179	0.0360 ± 0.0072
SPACE (8×8)	3D-Room Small	0.9027 ± 0.0009	0.5069 ± 0.0030	0.0397 ± 0.0026
SPAIR (8×8)	3D-Room Small	0.9081 ± 0.0004	0.5068 ± 0.0081	0.0209 ± 0.0039
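We assume the two AP columns follow the usual detection convention.

- AP at IoU=0.5 counts a predicted box as correct when its IoU with an unmatched ground-truth box is at least 0.5.
- "IoU 0.5:0.95" averages AP over the thresholds 0.50, 0.55, ..., 0.95.

The sketch below uses greedy matching in descending score order and all-point (VOC-style) interpolation of the precision-recall curve. The paper's exact evaluation protocol may differ. The function names and the use of a presence probability as the detection score are hypothetical.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, thr):
    """AP at one IoU threshold.

    preds: list of (image_id, score, box), e.g. score = presence probability (assumed).
    gts:   dict mapping image_id -> list of ground-truth boxes.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp = []
    for img, _, box in sorted(preds, key=lambda p: -p[1]):
        best, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img, [])):
            o = iou(box, g)
            if not matched[img][j] and o > best:
                best, best_j = o, j
        hit = best_j >= 0 and best >= thr
        if hit:
            matched[img][best_j] = True  # each ground-truth box can be matched once
        tp.append(1.0 if hit else 0.0)
    if n_gt == 0 or not tp:
        return 0.0
    ctp = np.cumsum(tp)
    recall = np.concatenate([[0.0], ctp / n_gt, [1.0]])
    precision = np.concatenate([[0.0], ctp / np.arange(1, len(tp) + 1), [0.0]])
    # Envelope: each precision value becomes the maximum precision at any higher recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    steps = np.where(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[steps + 1] - recall[steps]) * precision[steps + 1]))

def ap_50_95(preds, gts):
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    return float(np.mean([average_precision(preds, gts, t) for t in np.linspace(0.5, 0.95, 10)]))
```

A box with IoU exactly 0.5 scores full AP at the first threshold but zero at the stricter ones. This is why the 0.5:0.95 column in the table is much lower than the IoU=0.5 column.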

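SPACE matches SPAIR's bounding-box quality while achieving order-of-magnitude speedups in both wall-clock time per gradient step and training convergence.
By processing the foreground in parallel, SPACE scales to scenes with many foreground objects without significant performance degradation.
SPACE explicitly provides factorized foreground objects, each with individual properties such as position and scale, while decomposing the background into components. In qualitative analysis on 3D-Room and Atari, it outperforms the baselines.
Quantitatively, SPACE is competitive with SPAIR in the 3D-Room Large setting. It has slightly lower AP at IoU=0.5 and a slightly higher object count error rate, but a slightly higher mean AP over IoU 0.5:0.95 (within the reported ± ranges). It also converges faster and renders objects in parallel.
Background: SPACE decomposes the background into multiple components, which models complex morphology better than approaches that treat the background as a single blob.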

This review was generated by AI and checked by a human editor.