QUICK REVIEW

[Paper Review] Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion

Yash Bhalgat, Iro Laina|arXiv (Cornell University)|Jun 7, 2023

Advanced Neural Network Applications | Cited by 9

One-Line Summary

This paper lifts 2D instance segmentation to 3D through slow-fast contrastive fusion, achieving scalable 3D object instance segmentation without explicit tracking across views.

ABSTRACT

Instance segmentation in 3D is a challenging task due to the lack of large-scale annotated datasets. In this paper, we show that this task can be addressed effectively by leveraging instead 2D pre-trained models for instance segmentation. We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation, which encourages multi-view consistency across frames. The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects. Unlike previous approaches, our method does not require an upper bound on the number of objects or object tracking across frames. To demonstrate the scalability of the slow-fast clustering, we create a new semi-realistic dataset called the Messy Rooms dataset, which features scenes with up to 500 objects per scene. Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets, as well as on our newly created Messy Rooms dataset, demonstrating the effectiveness and scalability of our slow-fast clustering method.

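Motivation and Objectives

Lift 2D instance segmentation to 3D so that 3D reconstruction and segmentation can be performed jointly, without 3D masks or explicit tracking.
Propose Contrastive Lift, which learns a low-dimensional embedding field that groups pixels into 3D object instances while remaining invariant to label permutations.
Develop a slow-fast contrastive learning scheme with a momentum teacher to reduce gradient variance.
Introduce the Messy Rooms dataset to demonstrate scalability to scenes with up to 500 objects, and improve on state-of-the-art methods across multiple benchmarks.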
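
The permutation-invariance requirement above can be illustrated with a small sketch. A contrastive objective only needs to know whether two pixels fall in the same 2D segment, so renaming the segment IDs within a view leaves the supervision unchanged. The helper name `same_segment_pairs` and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def same_segment_pairs(labels):
    """Return the set of pixel index pairs (i, j), i < j, that share a segment label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }

# Toy per-pixel segment IDs from a hypothetical 2D segmenter for one view.
labels_view = [0, 0, 1, 2, 1]
# The same segmentation with the IDs renamed (0 -> 7, 1 -> 3, 2 -> 5).
relabeled = [7, 7, 3, 5, 3]

pairs_a = same_segment_pairs(labels_view)
pairs_b = same_segment_pairs(relabeled)
print(pairs_a == pairs_b)  # prints True
```

Because only the pairwise "same segment or not" relation enters the loss, no matching of label IDs between views is required in this sketch.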

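Proposed Method

Represent the 3D scene as a neural field that predicts density, color, and a D-dimensional instance embedding for each 3D point.
Render embeddings into image space via differentiable ray casting, and train with a photometric loss on RGB reconstruction.
Train with a contrastive embedding objective that pulls together the embeddings of pixels belonging to the same 2D segment within an image and pushes apart those of different segments, using a slow-fast (momentum) variant to reduce gradient variance.
Update the slow embedding field Θ̄ (teacher) as an exponential moving average of the fast embedding field Θ (student), and compute the contrastive loss over pixel sets split between the fast-field and slow-field embeddings.
Add a concentration loss that pulls fast embeddings toward the centroids predicted by the slow field, encouraging compact, discriminative clusters.
After training, render embeddings for novel views and cluster them (e.g., with HDBSCAN) to obtain consistent 3D instance labels.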
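
The slow-fast steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the RBF similarity, the momentum value, the function names, and the toy embeddings are all hypothetical choices made for this example, and a real system would use an autodiff framework in which the slow field receives no gradients.

```python
import math

def ema_update(slow, fast, momentum=0.9):
    """Move the slow (teacher) parameters toward the fast (student) parameters."""
    return [momentum * s + (1.0 - momentum) * f for s, f in zip(slow, fast)]

def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def similarity(a, b, gamma=1.0):
    """RBF similarity (an assumed choice for this sketch)."""
    return math.exp(-gamma * sq_dist(a, b))

def slow_fast_contrastive(fast_emb, fast_lab, slow_emb, slow_lab, gamma=1.0):
    """Contrast each fast embedding against the slow embeddings of the other pixel set."""
    total, count = 0.0, 0
    for e, y in zip(fast_emb, fast_lab):
        scores = [math.exp(similarity(e, s, gamma)) for s in slow_emb]
        pos = sum(sc for sc, ys in zip(scores, slow_lab) if ys == y)
        if pos == 0.0:
            continue  # no slow pixel shares this segment, so skip this anchor
        total += -math.log(pos / sum(scores))
        count += 1
    return total / max(count, 1)

def concentration(fast_emb, fast_lab, slow_emb, slow_lab):
    """Mean squared distance from fast embeddings to slow-field segment centroids."""
    total, count = 0.0, 0
    for e, y in zip(fast_emb, fast_lab):
        members = [s for s, ys in zip(slow_emb, slow_lab) if ys == y]
        if not members:
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sq_dist(e, centroid)
        count += 1
    return total / max(count, 1)

# Toy 2-D embeddings: pixels of one image split into a "fast" and a "slow" set.
fast_emb, fast_lab = [[0.0, 0.0], [1.0, 1.0]], [0, 1]
slow_emb, slow_lab = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.9]], [0, 1, 1]

loss_contr = slow_fast_contrastive(fast_emb, fast_lab, slow_emb, slow_lab)
loss_conc = concentration(fast_emb, fast_lab, slow_emb, slow_lab)
new_slow = ema_update([0.0, 1.0], [1.0, 1.0], momentum=0.9)
print(loss_contr, loss_conc, new_slow)
```

In this sketch the teacher changes only through `ema_update`, which is what makes it a slowly moving target for the student.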

Figure 1 : Contrastive Lift takes as input several views of a scene (left), as well as the output of a panoptic 2D segmenter (middle). It then reconstructs the scene in 3D while fusing the 2D segments, which are noisy and generally labelled inconsistently between views, when no object association (t

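Experimental Results

Research Questions

RQ1: Can the outputs of 2D instance segmentation be fused effectively, without tracking, to obtain consistent 3D instance segmentation?
RQ2: Does the slow-fast contrastive embedding framework provide a scalable, permutation-invariant 3D instance representation suited to scenes with many objects?
RQ3: How does the proposed method compare with state-of-the-art 3D panoptic methods (e.g., Panoptic Lifting) on standard benchmarks and on datasets with large numbers of objects?
RQ4: How do the embedding dimension and the additional concentration term affect clustering quality and 3D segmentation performance?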

Key Findings

Method	ScanNet	Hypersim	Replica
DM-NeRF	41.7	51.6	44.1
PNF	48.3	44.8	41.1
PNF + GT BBoxes	54.3	47.6	52.5
PanopLi [49]	58.9	60.1	57.9
Vanilla (Ours)	60.5	60.9	57.8
Slow-Fast (Ours)	62.3	62.3	59.1

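The Slow-Fast variant consistently outperforms the baselines on ScanNet, Hypersim, Replica, and Messy Rooms, scoring up to 3.9 PQ(scene) points higher than Panoptic Lifting on these datasets.
Contrastive Lift with slow-fast learning produces more compact and discriminative embeddings than conventional contrastive learning, which helps the post-hoc clustering.
On Messy Rooms, Contrastive Lift scales to scenes with up to 500 objects, and its advantage over Panoptic Lifting grows as the number of objects increases.
With Mask2Former, MaskFormer, or Detic as the 2D segmenter, Contrastive Lift substantially improves per-frame PQ (e.g., 61.7/61.6/62.1 versus the baseline).
The method does not require a fixed object count K, and in some cases trains faster than linear-assignment-based methods as K grows.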
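
For readers unfamiliar with the PQ numbers quoted above, the following sketch computes the standard panoptic quality for a single class: matched segment IoUs summed and divided by TP + 0.5·FP + 0.5·FN, where a match requires IoU above 0.5. The scene-level PQ(scene) variant reported in the findings additionally accounts for consistency across views and is not reproduced here. The function names and the toy segments are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def panoptic_quality(pred_segments, gt_segments, thresh=0.5):
    """Standard PQ for one class; segments are non-overlapping sets of pixel indices."""
    matched_pred, iou_sum, tp = set(), 0.0, 0
    for g in gt_segments:
        for i, p in enumerate(pred_segments):
            if i in matched_pred:
                continue
            score = iou(p, g)
            # With IoU > 0.5 and non-overlapping segments, a match is unique.
            if score > thresh:
                matched_pred.add(i)
                iou_sum += score
                tp += 1
                break
    fp = len(pred_segments) - tp  # predictions with no ground-truth match
    fn = len(gt_segments) - tp    # ground-truth segments left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0

# Toy example: two ground-truth objects, three predictions (one spurious).
gt = [{0, 1, 2, 3}, {4, 5}]
pred = [{0, 1, 2}, {4, 5}, {6}]
pq = panoptic_quality(pred, gt)
print(round(pq, 3))  # prints 0.7
```

Here the two matches contribute IoUs of 0.75 and 1.0, and the spurious prediction adds half a count to the denominator, giving 1.75 / 2.5 = 0.7.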

Figure 2 : Overview of the Contrastive Lift architecture. See Section 3 for details.

This review was generated by AI and checked by a human editor.