QUICK REVIEW

[Paper Review] Canonical Capsules: Self-Supervised Capsules in Canonical Pose

Weiwei Sun, Andrea Tagliasacchi|arXiv (Cornell University)|Dec 8, 2020

3D Shape Modeling and Analysis | Citations: 36

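One-Line Summary

Proposes a self-supervised capsule architecture for 3D point clouds that learns a canonical frame without labels and produces semantically consistent part decompositions, improving reconstruction, canonicalization, and unsupervised classification.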

ABSTRACT

We propose a self-supervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints, and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties. This not only enables the training of a semantically consistent decomposition, but also allows us to learn a canonicalization operation that enables object-centric reasoning. To train our neural network we require neither classification labels nor manually-aligned training datasets. Yet, by learning an object-centric representation in a self-supervised manner, our method outperforms the state-of-the-art on 3D point cloud reconstruction, canonicalization, and unsupervised classification.
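
The abstract's training signal comes from pairs of randomly rotated copies of the same object. The NumPy sketch below shows one way to generate such a pair. The function names and the translation range max_translation are hypothetical choices for illustration, not the authors' code; uniform rotations are drawn with the standard QR-decomposition method.

```python
import numpy as np


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Sample a uniformly random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    # Fix the sign ambiguity of QR so the sample is uniform over O(3).
    q = q * np.sign(np.diag(r))
    # Turn a reflection (det = -1) into a rotation by flipping one column.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def make_siamese_pair(points: np.ndarray, rng: np.random.Generator,
                      max_translation: float = 0.1):
    """Return two independently rotated and translated copies of an (N, 3) cloud.

    max_translation is a hypothetical parameter; the paper's augmentation
    ranges are not given in this review.
    """
    views = []
    for _ in range(2):
        R = random_rotation(rng)
        t = rng.uniform(-max_translation, max_translation, size=3)
        views.append(points @ R.T + t)  # rotate each row, then translate
    return views[0], views[1]
```

Both views depict the same rigid shape, so pairwise point distances are preserved. This is what lets the network be supervised without class labels or pre-aligned data.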

Motivation and Objectives

Enable unsupervised learning on 3D point clouds without requiring pre-aligned datasets.
Develop a permutation-equivariant capsule decomposition via attention.
Learn a canonical frame to enable object-centric reasoning.
Train end-to-end with Siamese pairs of randomly rotated objects.
Demonstrate state-of-the-art 3D auto-encoding, canonicalization, and unsupervised classification.

Proposed Method

Compute a K-part decomposition of a point cloud via a permutation-equivariant capsule encoder E.
Aggregate the attention masks to obtain K capsule poses theta_k and descriptors beta_k (Eq. 2).
Regress the canonical capsule poses bar{theta} from the descriptors with a network K, enforcing locality.
Canonicalize by solving the rigid alignment between the predicted poses theta and the learned canonical keypoints bar{theta} (Eq. 5).
Decode per-capsule point clouds in the canonical frame and reconstruct the input (Eq. 4), using the Chamfer distance as the reconstruction loss (Eq. 11).
Train with Siamese pairs of randomly rotated/translated shapes to enforce equivariance/invariance (losses L_equivariance, L_invariance, L_equilibrium, L_localization, L_canonical).
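
The pooling and canonicalization steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes the encoder has already produced a non-negative attention matrix (N x K) and per-point, per-capsule features (N x K x C). It also assumes the canonical keypoints bar{theta} are passed in as an input rather than regressed by the network K. Poses and descriptors are computed as attention-weighted averages, and the rigid alignment is solved in closed form with the Kabsch (SVD) algorithm. The helper names and the normalization are assumptions.

```python
import numpy as np


def capsule_pool(points: np.ndarray, attention: np.ndarray, features: np.ndarray):
    """Attention-weighted capsule poses and descriptors (a sketch in the spirit of Eq. 2).

    points:    (N, 3) input point cloud
    attention: (N, K) non-negative attention of each point to each capsule
    features:  (N, K, C) per-point, per-capsule features (assumed layout)
    """
    # Normalize over points so each capsule's weights sum to 1.
    weights = attention / attention.sum(axis=0, keepdims=True)
    poses = weights.T @ points                                # (K, 3) capsule poses theta_k
    descriptors = np.einsum("nk,nkc->kc", weights, features)  # (K, C) descriptors beta_k
    return poses, descriptors


def kabsch(src: np.ndarray, dst: np.ndarray):
    """Rigid (R, t) minimizing sum_k ||R @ src_k + t - dst_k||^2 via SVD."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def canonicalize(points: np.ndarray, poses: np.ndarray,
                 canonical_keypoints: np.ndarray) -> np.ndarray:
    """Map the input into the canonical frame by aligning poses theta to keypoints bar{theta}."""
    R, t = kabsch(poses, canonical_keypoints)
    return points @ R.T + t
```

Each capsule's weights sum to one over the points. As a result, rotating and translating the input moves the poses by the same rigid transform and leaves the descriptors unchanged, provided the attention itself is rotation-invariant. The Siamese equivariance and invariance losses are meant to encourage exactly this behaviour.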

Experimental Results

Research Questions

RQ1: Can a self-supervised capsule decomposition yield semantically consistent parts in unaligned 3D point clouds?
RQ2: Does learning a canonical frame improve object-centric representations for reconstruction and downstream tasks?
RQ3: How does the proposed canonicalization interact with transformation invariance and equivariance to enable unsupervised classification?
RQ4: What is the impact of the individual loss terms on reconstruction quality and canonicalization stability?

Key Findings

Auto-encoding Chamfer distance on ShapeNet (lower is better)
Method	Airplane (Aligned)	Chair (Aligned)	Multi (Aligned)	Airplane (Unaligned)	Chair (Unaligned)	Multi (Unaligned)
3D-PointCapsNet [64]	1.94	3.30	2.49	5.58	7.57	4.66
AtlasNetV2 [12]	1.28	2.36	2.14	2.80	3.98	3.08
Our method	0.96	1.99	1.76	1.11	2.58	2.22
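
The table reports Chamfer distances. The sketch below uses one common symmetric form: the mean squared nearest-neighbour distance, summed over both directions. It is a brute-force illustration only. The paper's exact variant (squared vs. unsquared, mean vs. sum) and any scaling of the reported numbers may differ, so treat those details as assumptions.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Uses squared Euclidean distances and averages within each direction.
    Memory is O(N * M), so this is only suitable for small clouds.
    """
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M) pairwise squared distances
    # For every point, find its nearest neighbour in the other set, in both directions.
    return float(sq.min(axis=1).mean() + sq.min(axis=0).mean())
```

For example, two single points one unit apart give a distance of 2.0 (1.0 in each direction). Identical clouds give 0.0.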

Achieves state-of-the-art Chamfer-distance-based auto-encoding on aligned and unaligned ShapeNet data (e.g., Our method: 0.96, 1.99, 1.76 for Airplane/Chair/Multi Aligned; 1.11, 2.58, 2.22 for Unaligned).
Learns a canonical frame that enables semantically consistent decompositions and improved reconstruction of fine details (e.g., wings and engines).
Performs competitively or better on canonicalization and pairwise registration, with stability scores (mStd) that outperform several baselines.
Features learned by Canonical Capsules outperform baselines on unsupervised classification (SVM top-1 accuracy: 94.21% aligned, 87.33% unaligned).
Ablations show that the equivariance, invariance, and canonical losses are each essential for maintaining reconstruction and canonicalization quality.
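
The snippet below quantifies the auto-encoding margin. It computes the relative Chamfer-distance reduction of Our method against the best baseline in each column, using only the values from the table above. AtlasNetV2 happens to be the strongest baseline in every column. The variable names are illustrative.

```python
# Chamfer distances copied from the auto-encoding table (lower is better).
COLUMNS = [
    "Airplane (Aligned)", "Chair (Aligned)", "Multi (Aligned)",
    "Airplane (Unaligned)", "Chair (Unaligned)", "Multi (Unaligned)",
]
BASELINES = {
    "3D-PointCapsNet": [1.94, 3.30, 2.49, 5.58, 7.57, 4.66],
    "AtlasNetV2": [1.28, 2.36, 2.14, 2.80, 3.98, 3.08],
}
OURS = [0.96, 1.99, 1.76, 1.11, 2.58, 2.22]


def relative_reductions(ours, baselines):
    """Percent reduction of `ours` relative to the best (lowest) baseline in each column."""
    result = []
    for j, value in enumerate(ours):
        best = min(scores[j] for scores in baselines.values())
        result.append(100.0 * (best - value) / best)
    return result


for name, pct in zip(COLUMNS, relative_reductions(OURS, BASELINES)):
    print(f"{name}: {pct:.1f}% lower than the best baseline")
```

The reductions are about 25.0%, 15.7%, and 17.8% on the aligned columns and 60.4%, 35.2%, and 27.9% on the unaligned ones. Every unaligned margin is therefore larger than every aligned margin.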


This review was generated by AI and checked by a human editor.