QUICK REVIEW

[Paper Review] EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

Zhipeng Hu, Minda Zhao|arXiv (Cornell University)|Aug 25, 2023

Computer Graphics and Visualization Techniques | Citations: 10

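One-Sentence Summary

EfficientDreamer introduces an orthogonal-view diffusion model that generates four orthogonal-view sub-images from a single prompt, and achieves robust, high-fidelity text-to-3D creation through progressive 3D synthesis and two-stage mesh refinement.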

ABSTRACT

While image diffusion models have made significant progress in text-driven 3D content creation, they often fail to accurately capture the intended meaning of text prompts, especially for view information. This limitation leads to the Janus problem, where multi-faced 3D models are generated under the guidance of such diffusion models. In this paper, we propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance. First, we introduce a novel 2D diffusion model that generates an image consisting of four orthogonal-view sub-images based on the given text prompt. Then, the 3D content is created using this diffusion model. Notably, the generated orthogonal-view image provides strong geometric structure priors and thus improves 3D consistency. As a result, it effectively resolves the Janus problem and significantly enhances the quality of 3D content creation. Additionally, we present a 3D synthesis fusion network that can further improve the details of the generated 3D contents. Both quantitative and qualitative evaluations demonstrate that our method surpasses previous text-to-3D techniques. Project page: https://efficientdreamer.github.io.

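Motivation and Objectives

Address the Janus problem in text-to-3D generation by enforcing 3D consistency across views.
Develop an orthogonal-view diffusion prior that conveys comprehensive semantic information from multiple viewpoints.
Propose a progressive 3D synthesis strategy that balances orthogonal-view guidance against a conventional 2D prior.
Achieve high-fidelity 3D meshes and textures through a two-stage, coarse-to-fine optimization.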

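Proposed Method

Train a novel orthogonal-view diffusion model that outputs a 2x2 composite image containing four orthogonal views of the same object.
Fine-tune the diffusion model on Objaverse-based data to learn multi-view representations that are consistent across views.
Apply Score Distillation Sampling (SDS) losses from both the orthogonal-view prior and a pre-trained 2D diffusion prior, and backpropagate them to the 3D parameters.
Blend the two priors during training with a progressive weighting scheme that gradually reduces the influence of the orthogonal-view prior over iterations.
Perform coarse-to-fine 3D synthesis, using NeuS for the initial geometry and DMTet for geometry and texture refinement.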
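
The composite-image and prior-blending steps in the list above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the row-major quadrant order, the linear decay schedule, and the convex combination of the two gradients are all assumptions made for illustration. The review only states that the model outputs a 2x2 composite and that the orthogonal-view influence is gradually reduced over iterations. The SDS gradients themselves are treated as inputs computed elsewhere.

```python
from typing import List, Sequence


def split_orthogonal_views(composite: Sequence[Sequence]) -> List[List[list]]:
    """Split a 2x2 composite (a grid of pixel rows) into four sub-images."""
    h, w = len(composite) // 2, len(composite[0]) // 2
    # Assumed layout: top-left, top-right, bottom-left, bottom-right.
    return [
        [list(row[c * w:(c + 1) * w]) for row in composite[r * h:(r + 1) * h]]
        for r in range(2)
        for c in range(2)
    ]


def orthogonal_weight(step: int, total_steps: int,
                      w_start: float = 1.0, w_end: float = 0.0) -> float:
    """Hypothetical linear decay of the orthogonal-view prior's weight."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return w_start + (w_end - w_start) * frac


def blend_sds_gradients(grad_ortho: Sequence[float], grad_2d: Sequence[float],
                        step: int, total_steps: int) -> List[float]:
    """Blend two precomputed SDS gradients with the decaying weight (assumed form)."""
    w = orthogonal_weight(step, total_steps)
    return [w * go + (1.0 - w) * g2 for go, g2 in zip(grad_ortho, grad_2d)]


if __name__ == "__main__":
    # Toy 4x4 single-channel "image" standing in for a 2x2 composite.
    composite = [[r * 4 + c for c in range(4)] for r in range(4)]
    views = split_orthogonal_views(composite)
    print(len(views), "views, each", len(views[0]), "x", len(views[0][0]))
    # Halfway through training, both priors contribute equally under this schedule.
    print(blend_sds_gradients([2.0], [4.0], step=50, total_steps=101))
```

Under this assumed schedule the orthogonal-view prior dominates early, when the overall geometry is being fixed, and the pre-trained 2D prior dominates late. The actual schedule and weighting used in the paper may differ.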

Experimental Results

Research Questions

RQ1: Does an orthogonal-view diffusion prior improve 3D consistency and reduce the Janus problem compared to standard 2D priors?
RQ2: How does progressive integration of orthogonal-view and pre-trained diffusion priors affect end-to-end 3D quality and stability?
RQ3: Can the proposed framework produce high-fidelity 3D meshes with photorealistic textures from text prompts?
RQ4: What is the impact of the two-stage optimization (NeuS then DMTet) on final geometry and texture quality?

Key Findings

Method	CLIP Score ↑	FID ↓
DreamFusion [22]	28.40	374.44
Magic3D [15]	29.15	310.57
TextMesh [31]	27.65	305.77
Ours	30.33	284.98

On 22 prompts, the method achieves a higher CLIP score and a lower FID than DreamFusion, Magic3D, and TextMesh (CLIP 30.33 and FID 284.98, versus 28.40/374.44, 29.15/310.57, and 27.65/305.77, respectively).
In the user study, the method receives an average rating of 3.74, versus 2.09–2.44 for the baselines, and 84.27% of users prefer its results.
The progressive 3D synthesis strategy yields complete, smooth meshes and mitigates holes compared to using only one prior.
The orthogonal-view prior substantially reduces Janus-type multi-face artifacts relative to using only a pre-trained diffusion prior.
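
To put the table in relative terms, the short Python sketch below computes the CLIP-score gain and the percentage FID reduction of the proposed method against each baseline. The numbers are copied from the results table above. The helper name `compare_to_baseline` and the choice of these two derived quantities are illustrative assumptions, not part of the paper.

```python
# (CLIP score, FID) per baseline, copied from the results table.
BASELINES = {
    "DreamFusion": (28.40, 374.44),
    "Magic3D": (29.15, 310.57),
    "TextMesh": (27.65, 305.77),
}
OURS_CLIP, OURS_FID = 30.33, 284.98


def compare_to_baseline(base_clip: float, base_fid: float) -> tuple:
    """Return (absolute CLIP gain, FID reduction in percent of the baseline FID)."""
    clip_gain = OURS_CLIP - base_clip
    fid_reduction_pct = 100.0 * (base_fid - OURS_FID) / base_fid
    return clip_gain, fid_reduction_pct


if __name__ == "__main__":
    for name, (clip, fid) in BASELINES.items():
        gain, reduction = compare_to_baseline(clip, fid)
        print(f"{name}: CLIP +{gain:.2f}, FID -{reduction:.1f}%")
```

Higher CLIP is better and lower FID is better, so positive values for both derived quantities favor the proposed method.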

This review was generated by AI and checked by a human editor.