QUICK REVIEW

[Paper Review] Canonical Capsules: Self-Supervised Capsules in Canonical Pose

Weiwei Sun, Andrea Tagliasacchi|arXiv (Cornell University)|Dec 8, 2020
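3D Shape Modeling and Analysis | Cited by 36
One-Line Summary

Proposes a self-supervised capsule architecture for 3D point clouds that learns a canonical frame without labels and decomposes objects into semantically consistent parts, improving reconstruction, canonicalization, and unsupervised classification.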

ABSTRACT

We propose a self-supervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints, and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties. This not only enables the training of a semantically consistent decomposition, but also allows us to learn a canonicalization operation that enables object-centric reasoning. To train our neural network we require neither classification labels nor manually-aligned training datasets. Yet, by learning an object-centric representation in a self-supervised manner, our method outperforms the state-of-the-art on 3D point cloud reconstruction, canonicalization, and unsupervised classification.

Motivation and Objectives

  • Motivate unsupervised learning for 3D point clouds without pre-aligned datasets.
  • Develop a permutation-equivariant capsule decomposition via attention.
  • Learn a canonical frame to enable object-centric reasoning.
  • Train end-to-end with Siamese pairs of randomly rotated objects.
  • Demonstrate state-of-the-art auto-encoding, canonicalization, and classification in 3D.

Proposed Method

  • Compute a K-part decomposition of a point cloud via a permutation-equivariant capsule encoder E.
  • Aggregate attention masks to obtain K capsule poses theta_k and descriptors beta_k (equation 2).
  • Regress canonical capsule poses from descriptors using a network K to obtain bar{theta}; enforce locality.
  • Canonicalize by solving a rigid alignment to align learned canonical keypoints bar{theta} with predicted poses (equation 5).
  • Decode per-capsule point clouds in the canonical frame and reconstruct the input (equation 4) using Chamfer distance as reconstruction loss (equation 11).
  • Train with Siamese pairs of randomly rotated/translated shapes to enforce equivariance/invariance (losses L_equivariance, L_invariance, L_equilibrium, L_localization, L_canonical).
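The aggregation and canonicalization steps above can be illustrated with a small NumPy sketch. It rests on assumptions that are not confirmed by this review: the aggregation is taken to be an attention-weighted average of point coordinates and per-point features, and the rigid alignment is solved with the standard SVD-based Kabsch procedure. The function names `aggregate_capsules`, `rigid_align`, and `rotation_z` are hypothetical, and random attention values stand in for the encoder output.

```python
import numpy as np

def aggregate_capsules(points, features, attention):
    """Attention-weighted averages (assumed form of the aggregation step).

    points: (P, 3), features: (P, C), attention: (P, K) non-negative masks.
    Returns keypoints (K, 3) and descriptors (K, C).
    """
    # Normalize each capsule's mask so its weights over the points sum to 1.
    weights = attention / attention.sum(axis=0, keepdims=True)
    keypoints = weights.T @ points
    descriptors = weights.T @ features
    return keypoints, descriptors

def rigid_align(source, target):
    """Least-squares R, t with R @ source[i] + t ~= target[i] (Kabsch)."""
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    cross_cov = (source - src_mean).T @ (target - tgt_mean)
    u, _, vt = np.linalg.svd(cross_cov)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = tgt_mean - rotation @ src_mean
    return rotation, translation

def rotation_z(angle):
    """Rotation matrix about the z axis (helper for the demo only)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Demo: the same shape in two poses, as in a Siamese training pair.
rng = np.random.default_rng(0)
points = rng.normal(size=(256, 3))
features = rng.normal(size=(256, 8))
attention = rng.random(size=(256, 4))  # stand-in for encoder masks, K = 4

R_true = rotation_z(0.7)
t_true = np.array([0.5, -1.0, 2.0])
moved = points @ R_true.T + t_true

# Reusing the same masks assumes the pose-invariant attention that training aims for.
kp_a, desc_a = aggregate_capsules(points, features, attention)
kp_b, desc_b = aggregate_capsules(moved, features, attention)
R_est, t_est = rigid_align(kp_a, kp_b)
print("max rotation error:", np.abs(R_est - R_true).max())
```

Because every capsule's weights sum to one, the keypoints move rigidly with the input while the descriptors stay unchanged, which is the equivariance/invariance behaviour the Siamese losses are meant to encourage. In this noise-free setting the alignment recovers the applied transform from the K keypoints alone.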

Experimental Results

Research Questions

  • RQ1: Can a self-supervised capsule decomposition yield semantically consistent parts in unaligned 3D point clouds?
  • RQ2: Does learning a canonical frame improve object-centric representations for reconstruction and downstream tasks?
  • RQ3: How does the proposed canonicalization interact with transformation invariance/equivariance to enable unsupervised classification?
  • RQ4: What is the impact of the various loss terms on reconstruction quality and canonicalization stability?

Key Findings

Auto-encoding Chamfer distance (lower is better)

Method                 Aligned                    Unaligned
                       Airplane  Chair  Multi     Airplane  Chair  Multi
3D-PointCapsNet [64]   1.94      3.30   2.49      5.58      7.57   4.66
AtlasNetV2 [12]        1.28      2.36   2.14      2.80      3.98   3.08
Our method             0.96      1.99   1.76      1.11      2.58   2.22

  • Achieves state-of-the-art Chamfer-distance-based auto-encoding on aligned and unaligned ShapeNet data (e.g., Our method: 0.96, 1.99, 1.76 for Airplane/Chair/Multi Aligned; 1.11, 2.58, 2.22 for Unaligned).
  • Learns a canonical frame that enables semantically consistent decompositions and sharper reconstruction of details (e.g., wings, engines).
  • Shows competitive to strong performance on canonicalization and pairwise registration, with stability (mStd) better than several baselines.
  • Features learned via Canonical Capsules yield stronger unsupervised classification performance (top-1 accuracy: aligned 94.21% SVM; unaligned 87.33% SVM).
  • Ablation shows essential roles for equivariance/invariance/canonical losses for maintaining reconstruction and canonicalization quality.
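
The Chamfer distance behind the auto-encoding numbers above can be sketched as follows. This is one common symmetric variant (mean squared nearest-neighbour distance in both directions); the normalization and scaling used in the paper are not stated in this review and may differ, so values from this sketch are not directly comparable with the table. The function name `chamfer_distance` is hypothetical.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumed variant: mean over a of the squared distance to the nearest
    point in b, plus the same term with the roles of a and b swapped.
    """
    # Pairwise squared distances via broadcasting, shape (N, M).
    diff = a[:, None, :] - b[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    a_to_b = sq_dist.min(axis=1).mean()
    b_to_a = sq_dist.min(axis=0).mean()
    return a_to_b + b_to_a

# Demo: a point cloud compared with a slightly perturbed copy of itself.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))
noisy = cloud + 0.01 * rng.normal(size=cloud.shape)
print("chamfer(cloud, noisy):", chamfer_distance(cloud, noisy))
```

The dense (N, M) distance matrix keeps the sketch short but uses memory quadratic in the number of points; larger clouds would normally call for a nearest-neighbour structure instead.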


This review was generated by AI and checked by a human editor.