QUICK REVIEW

[Paper Review] InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

Jiale Xu, Weihao Cheng | arXiv (Cornell University) | Apr 10, 2024

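Computer Graphics and Visualization Techniques | Cited by 14

One-line summary

InstantMesh is a fast feed-forward pipeline that generates a 3D mesh from a single image in about 10 seconds by combining a multi-view diffusion model, a sparse-view Large Reconstruction Model (LRM), and differentiable iso-surface extraction.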

ABSTRACT

We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability. By synergizing the strengths of an off-the-shelf multiview diffusion model and a sparse-view reconstruction model based on the LRM architecture, InstantMesh is able to create diverse 3D assets within 10 seconds. To enhance the training efficiency and exploit more geometric supervisions, e.g., depths and normals, we integrate a differentiable iso-surface extraction module into our framework and directly optimize on the mesh representation. Experimental results on public datasets demonstrate that InstantMesh significantly outperforms other latest image-to-3D baselines, both qualitatively and quantitatively. We release all the code, weights, and demo of InstantMesh, with the intention that it can make substantial contributions to the community of 3D generative AI and empower both researchers and content creators.

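Motivation and objectives

Provide fast, scalable 3D mesh generation from a single image for a wide range of applications (VR/AR, design, games).
Leverage open-world 3D priors through LRM to improve generalization.
Integrate multi-view diffusion with mesh-centric reconstruction and differentiable supervision (depth/normals) to strengthen geometry and texture.
Enable efficient, scalable training on large datasets through mesh-based supervision and staged training.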

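Proposed method

Use a multi-view diffusion model to generate six consistent novel views from a single input image.
Use a Transformer-based sparse-view LRM to predict a 3D mesh directly from the generated views.
Represent geometry as a mesh, and integrate a differentiable iso-surface extraction module (FlexiCubes) for efficient surface extraction and supervision.
Two-stage training: (Stage 1) train on a triplane NeRF representation with image and mask losses; (Stage 2) switch to the mesh representation with depth/normal supervision and regularization.
Fine-tune the Zero123++ diffusion model so that it stably outputs six views on a white background.
Provide four model variants (NeRF/base, NeRF/large, Mesh/base, Mesh/large) as public weights for practical use.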
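
The two-stage schedule above can be summarized as a pair of objectives. This is a schematic sketch only: the review names the loss families (image and mask losses first, then depth/normal supervision plus regularization) but not their exact form, so the choice of norms, the summation over supervised views $i$, and all weights $\lambda$ are assumptions made for illustration. Hatted symbols denote rendered quantities and unhatted symbols denote ground truth.

```latex
\begin{aligned}
\mathcal{L}_{\text{stage 1}} &= \sum_i \lVert \hat{I}_i - I_i \rVert_2^2
  + \lambda_{\text{mask}} \sum_i \lVert \hat{M}_i - M_i \rVert_2^2 \\
\mathcal{L}_{\text{stage 2}} &= \mathcal{L}_{\text{stage 1}}
  + \lambda_{\text{depth}} \sum_i \lVert \hat{D}_i - D_i \rVert_1
  + \lambda_{\text{normal}} \sum_i \bigl(1 - \hat{N}_i \cdot N_i\bigr)
  + \lambda_{\text{reg}} \, \mathcal{L}_{\text{reg}}
\end{aligned}
```

In Stage 1 the images $\hat{I}_i$ and masks $\hat{M}_i$ come from volume rendering the triplane NeRF; in Stage 2 they, together with depth $\hat{D}_i$ and normals $\hat{N}_i$, come from rasterizing the extracted mesh, which is what makes the extra geometric terms available.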

Figure 2: The overview of our InstantMesh framework. Given an input image, we first utilize a multi-view diffusion model to synthesize 6 novel views at fixed camera poses. Then we feed the generated multi-view images into a transformer-based sparse-view large reconstruction model to reconstruct a h
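
To make "6 novel views at fixed camera poses" concrete, the sketch below places six cameras on a sphere around the object. The azimuth and elevation values, the radius, and the z-up axis convention are illustrative assumptions, not values stated in this review; `camera_position` is a hypothetical helper.

```python
import math

def camera_position(azimuth_deg, elevation_deg, radius):
    """Convert spherical camera angles to a Cartesian position (z-up convention)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = radius * math.cos(el) * math.cos(az)
    y = radius * math.cos(el) * math.sin(az)
    z = radius * math.sin(el)
    return (x, y, z)

# Assumed layout: six evenly spaced azimuths with alternating elevations.
AZIMUTHS = [30, 90, 150, 210, 270, 330]
ELEVATIONS = [20, -10, 20, -10, 20, -10]
RADIUS = 4.0  # arbitrary camera distance, chosen only for the example

poses = [camera_position(a, e, RADIUS) for a, e in zip(AZIMUTHS, ELEVATIONS)]

for (a, e), p in zip(zip(AZIMUTHS, ELEVATIONS), poses):
    print(f"azimuth={a:3d} elevation={e:4d} -> ({p[0]:+.3f}, {p[1]:+.3f}, {p[2]:+.3f})")
```

Because the poses are fixed, the reconstruction model can be given the same camera parameters for every input image rather than estimating them per object.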

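Experimental results

Research questions

RQ1: Can combining diffusion-based multi-view generation with a sparse-view LRM turn a single input image into a high-quality 3D mesh within seconds?
RQ2: Does integrating differentiable iso-surface extraction with direct mesh supervision improve geometry and texture over triplane/NeRF-based approaches?
RQ3: How does mesh-based reconstruction compare with NeRF-based reconstruction under multi-view supervision, in terms of 2D view quality and 3D geometric accuracy?
RQ4: How do the number of input views and the training strategy affect scalability and generalization to open-world objects?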

Key findings

Method	PSNR↑	SSIM↑	LPIPS↓	CD↓	FS↑
TripoSR	23.373	0.868	0.213	0.217	0.843
LGM	21.538	0.871	0.216	0.345	0.671
CRM	22.195	0.891	0.150	0.252	0.787
SV3D	22.098	0.861	0.201	-	-
Ours (NeRF)	23.141	0.898	0.119	0.177	0.882
Ours (Mesh)	22.794	0.897	0.120	0.180	0.880

Table 3
Method	PSNR↑	SSIM↑	LPIPS↓	CD↓	FS↑
TripoSR	21.996	0.877	0.198	0.245	0.811
LGM	20.434	0.864	0.226	0.382	0.635
CRM	21.630	0.892	0.147	0.246	0.802
SV3D	21.510	0.866	0.186	-	-
Ours (NeRF)	22.635	0.903	0.110	0.199	0.869
Ours (Mesh)	21.954	0.901	0.112	0.203	0.864

Table 4
Method	PSNR↑	SSIM↑	LPIPS↓	CD↓	FS↑
TripoSR	19.977	0.859	0.206	0.221	0.847
LGM	18.665	0.832	0.250	0.356	0.653
CRM	19.422	0.865	0.172	0.274	0.778
SV3D	20.294	0.853	0.176	-	-
Ours (NeRF)	19.752	0.869	0.150	0.206	0.863
Ours (Mesh)	19.552	0.868	0.150	0.204	0.866
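
As a reading aid for the numbers above, the sketch below picks the best method per metric from the first six method rows of the results table. The values are copied from those rows; the helper name `best_method` and the dictionary layout are hypothetical. PSNR, SSIM, and FS are treated as higher-is-better, LPIPS and CD as lower-is-better, and "-" entries are stored as `None` and skipped.

```python
# Rows copied from the first six method rows of the results table.
# Column order: PSNR, SSIM, LPIPS, CD, FS. None marks a "-" entry.
ROWS = {
    "TripoSR": (23.373, 0.868, 0.213, 0.217, 0.843),
    "LGM": (21.538, 0.871, 0.216, 0.345, 0.671),
    "CRM": (22.195, 0.891, 0.150, 0.252, 0.787),
    "SV3D": (22.098, 0.861, 0.201, None, None),
    "Ours (NeRF)": (23.141, 0.898, 0.119, 0.177, 0.882),
    "Ours (Mesh)": (22.794, 0.897, 0.120, 0.180, 0.880),
}
METRICS = ("PSNR", "SSIM", "LPIPS", "CD", "FS")
HIGHER_IS_BETTER = {"PSNR": True, "SSIM": True, "LPIPS": False, "CD": False, "FS": True}

def best_method(metric):
    """Return the method with the best score for one metric, ignoring missing entries."""
    idx = METRICS.index(metric)
    scored = {name: vals[idx] for name, vals in ROWS.items() if vals[idx] is not None}
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(scored, key=scored.get)

best = {metric: best_method(metric) for metric in METRICS}
for metric, name in best.items():
    print(f"{metric}: {name}")
```

On these rows the NeRF variant leads on SSIM, LPIPS, CD, and FS, while TripoSR has the highest PSNR.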

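InstantMesh achieves state-of-the-art image-to-3D performance on public datasets, outperforming the baselines in 2D novel-view quality (SSIM, LPIPS) and in 3D geometry (Chamfer distance, CD; F-Score, FS).
The FlexiCubes-based mesh variant offers smoother surfaces and better geometric supervision than the triplane NeRF-based approach.
The framework can generate diverse, high-quality 3D assets from a single image in about 10 seconds.
Four model variants (NeRF/base, NeRF/large, Mesh/base, Mesh/large) are released to support different application needs.
The LRM-based architecture and the mesh-focused supervision strategy make training on large to very large datasets feasible.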
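
The two geometry metrics cited above (CD and FS) can be illustrated with a small brute-force sketch on point sets. Conventions differ between papers (squared versus unsquared distances, sum versus mean, choice of threshold), so the symmetric mean form, the threshold `tau`, and the toy point sets below are assumptions for illustration, not the evaluation protocol used in the paper.

```python
import math

def nearest_distances(src, dst):
    """For each point in src, the Euclidean distance to its nearest neighbour in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance: average of the two mean nearest-neighbour distances."""
    d_pred_to_gt = nearest_distances(pred, gt)
    d_gt_to_pred = nearest_distances(gt, pred)
    return 0.5 * (sum(d_pred_to_gt) / len(d_pred_to_gt)
                  + sum(d_gt_to_pred) / len(d_gt_to_pred))

def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    d_pred_to_gt = nearest_distances(pred, gt)
    d_gt_to_pred = nearest_distances(gt, pred)
    precision = sum(d <= tau for d in d_pred_to_gt) / len(d_pred_to_gt)
    recall = sum(d <= tau for d in d_gt_to_pred) / len(d_gt_to_pred)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy point sets: two predicted points are close to the ground truth, one is far off.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]

cd = chamfer_distance(pred, gt)
fs = f_score(pred, gt, tau=0.2)
print(f"CD = {cd:.4f}, F-Score = {fs:.4f}")
```

A lower CD and a higher FS both indicate that the reconstructed surface lies closer to the reference surface. For real meshes the points would be sampled from the surfaces, and a spatial index would replace the quadratic nearest-neighbour search.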

Figure 3: The 3D meshes generated by InstantMesh demonstrate significantly better geometry and texture compared to the other baselines. The results of InstantMesh are rendered at a fixed elevation of $20^{\circ}$, while the results of other methods are rendered at a fixed elevation of $0^{\circ}$


This review was generated by AI and checked by a human editor.