QUICK REVIEW

[Paper Review] LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Jaeyoung Chung, Suyong Lee | arXiv (Cornell University) | Nov 22, 2023

Generative Adversarial Networks and Image Synthesis | Citations: 12

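Summary in brief

LucidDreamer builds a multi-view-consistent point cloud by alternating Dreaming and Alignment steps, then optimizes Gaussian splats for rendering, producing diverse, high-quality 3D scenes from text, RGB, or RGBD input. It relies on Stable Diffusion and depth estimation to achieve domain-free 3D scene generation.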

ABSTRACT

With the widespread usage of VR devices and contents, demands for 3D scene generation techniques become more popular. Existing 3D scene generation models, however, limit the target scene to specific domain, primarily due to their training strategies using 3D scan dataset that is far from the real-world. To address such limitation, we propose LucidDreamer, a domain-free scene generation pipeline by fully leveraging the power of existing large-scale diffusion-based generative model. Our LucidDreamer has two alternate steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we set the point cloud as a geometrical guideline for each image generation. Specifically, we project a portion of point cloud to the desired view and provide the projection as a guidance for inpainting using the generative model. The inpainted images are lifted to 3D space with estimated depth maps, composing a new points. Second, to aggregate the new points into the 3D scene, we propose an aligning algorithm which harmoniously integrates the portions of newly generated 3D scenes. The finally obtained 3D scene serves as initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly-detailed compared to the previous 3D scene generation methods, with no constraint on domain of the target scene. Project page: https://luciddreamer-cvlab.github.io/

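Motivation and objectives

Motivate domain-free 3D scene generation that goes beyond 3D-scan and domain-restricted training data.
Use a large-scale diffusion model to generate multi-view-consistent images that guide 3D construction.
Develop a two-step pipeline (Dreaming and Alignment) for building a unified 3D point cloud.
Optimize a Gaussian splatting representation from the constructed point cloud to achieve high-quality rendering.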

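Proposed method

Dreaming: move the camera along a trajectory, project the visible part of the point cloud into the new view, inpaint the masked regions with Stable Diffusion, estimate depth, and lift the result to 3D to extend the point cloud.
Alignment: move points along their camera rays and interpolate depth so that the new 3D points agree with the existing point cloud, producing a smooth, connected scene.
Repeat Dreaming and Alignment to build the complete point cloud P_N, which is what provides multi-view consistency.
Train a 3D Gaussian splatting model on the constructed point cloud and its projected images, adding synthesized viewpoints to improve coverage, then render and refine the scene.
Adjust the estimated depth through a scale coefficient d_i chosen to minimize the discrepancy between new and existing 3D points, which keeps the 3D geometry consistent.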
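The geometric part of one Dreaming and Alignment iteration can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pinhole camera with intrinsics `K`, replaces the Stable Diffusion inpainting and the depth estimator with a constant stand-in depth map, and reduces Alignment to a single least-squares scale factor `d`. The function names (`lift_to_3d`, `project`, `fit_depth_scale`) are hypothetical, and the ray-wise point movement and depth interpolation described above are omitted.

```python
import numpy as np

def lift_to_3d(depth, K, cam_to_world, mask=None):
    """Unproject an (H, W) depth map to world-space points; keep only masked pixels if given."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    pts_cam = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)  # rays with z = 1, scaled by depth
    pts = pts_cam @ cam_to_world[:3, :3].T + cam_to_world[:3, 3]
    return pts if mask is None else pts[mask.reshape(-1)]

def project(points, K, world_to_cam, h, w):
    """Project world points into a view; return a z-buffer and the mask of covered pixels."""
    pts_cam = points @ world_to_cam[:3, :3].T + world_to_cam[:3, 3]
    pts_cam = pts_cam[pts_cam[:, 2] > 1e-6]          # keep points in front of the camera
    uvw = pts_cam @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    zbuf = np.full((h, w), np.inf)
    np.minimum.at(zbuf, (v[inside], u[inside]), pts_cam[inside, 2])  # nearest point wins
    return zbuf, np.isfinite(zbuf)

def fit_depth_scale(est_depth, ref_depth, covered):
    """Scalar d minimizing ||d * est - ref||^2 over pixels the existing cloud already covers."""
    e, r = est_depth[covered], ref_depth[covered]
    return float(e @ r / (e @ e))

# Toy scene (assumption): a flat wall at z = 2 seen by a 32x24 camera.
K = np.array([[50.0, 0.0, 16.0], [0.0, 50.0, 12.0], [0.0, 0.0, 1.0]])
H, W = 24, 32
points = lift_to_3d(np.full((H, W), 2.0), K, np.eye(4))

# Dreaming: move the camera 0.4 units to the right and see what the cloud still covers.
cam1 = np.eye(4)
cam1[0, 3] = 0.4
ref_depth, covered = project(points, K, np.linalg.inv(cam1), H, W)
inpaint_mask = ~covered                     # region a generative model would have to fill

# Stand-in for the depth estimator: correct shape, wrong global scale (1.6 instead of 2.0).
est_depth = np.full((H, W), 1.6)

# Alignment (simplified): rescale estimated depth to agree with the existing points.
d = fit_depth_scale(est_depth, ref_depth, covered)
new_pts = lift_to_3d(d * est_depth, K, cam1, mask=inpaint_mask)
scene = np.concatenate([points, new_pts])
print(covered.sum(), inpaint_mask.sum(), round(d, 4), scene.shape)
```

In this toy setup the uncovered strip on the right edge of the new view is the only region lifted to 3D, and the fitted scale brings the stand-in depth back onto the wall, so the added points continue the existing surface instead of floating in front of it.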

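Experimental results

Research questions

RQ1: Can a domain-free pipeline reliably generate multi-view-consistent, high-quality 3D scenes from diverse input types (text, RGB, RGBD)?
RQ2: How can diffusion-based inpainting, depth estimation, and alignment be integrated to produce coherent, high-fidelity 3D Gaussian splatting scenes?
RQ3: Does Gaussian splatting optimization improve rendering quality and fill depth gaps compared with a point-cloud-only representation?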

Key findings

Model	CLIP-Score	CLIP-IQA	Quality	Color
RGBD2 [22]	0.2035	0.1279	0.2081	0.0126
LucidDreamer	0.2110	0.6161	0.8453	0.5356
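To read the table at a glance, the per-metric gaps between the two rows can be computed directly. The snippet below is only arithmetic on the values reported above; the column names are the translated headers, and the variable names are illustrative.

```python
metrics = ["CLIP-Score", "CLIP-IQA", "Quality", "Color"]
rgbd2 = [0.2035, 0.1279, 0.2081, 0.0126]          # RGBD2 row from the table
lucid = [0.2110, 0.6161, 0.8453, 0.5356]          # LucidDreamer row from the table

# Absolute gain of LucidDreamer over RGBD2 for each reported metric.
gains = {m: l - r for m, r, l in zip(metrics, rgbd2, lucid)}
for name, gain in gains.items():
    print(f"{name}: {gain:+.4f}")
```

The CLIP-Score gap is small, while the remaining three columns differ by roughly 0.49 to 0.64, so most of the reported advantage lies in the image-quality columns rather than in CLIP-Score.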

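LucidDreamer produces higher-quality 3D scenes than earlier domain-restricted methods across diverse domains, including realistic, anime-style, and LEGO scenes.
The Dreaming and Alignment steps provide multi-view consistency and let newly generated parts of the scene be merged seamlessly into a single 3D model.
The final Gaussian splatting optimization fills holes and improves rendering realism beyond what the initial point cloud offers.
A quantitative comparison with RGBD2 on CLIP-based metrics over several datasets suggests better perceptual quality and consistency (for example, CLIP-IQA of 0.6161 versus 0.1279).
The method supports multiple input types (text, RGB, RGBD) and allows mixed input conditioning during generation.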

This review was generated by AI and checked by a human editor.