QUICK REVIEW

[Paper Review] RealFusion: 360° Reconstruction of Any Object from a Single Image

Luke Melas-Kyriazi, Christian Rupprecht|arXiv (Cornell University)|Feb 21, 2023

Advanced Vision and Imaging|Citations: 19

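One-Line Summary

RealFusion reconstructs a complete 360° 3D model of any object from a single image by fitting a neural radiance field, using novel views "dreamed up" under the guidance of a diffusion-model prior. InstantNGP keeps the optimization efficient.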

ABSTRACT

We consider the problem of reconstructing a full 360° photographic model of an object from a single image of it. We do so by fitting a neural radiance field to the image, but find this problem to be severely ill-posed. We thus take an off-the-shelf conditional image generator based on diffusion and engineer a prompt that encourages it to "dream up" novel views of the object. Using an approach inspired by DreamFields and DreamFusion, we fuse the given input view, the conditional prior, and other regularizers in a final, consistent reconstruction. We demonstrate state-of-the-art reconstruction results on benchmark images when compared to prior methods for monocular 3D reconstruction of objects. Qualitatively, our reconstructions provide a faithful match of the input view and a plausible extrapolation of its appearance and 3D shape, including to the side of the object not visible in the image.

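Motivation and Objectives

Motivates the problem of recovering a full 360° photographic model of an object from a single view and stresses that single-image 3D reconstruction is ill-posed.
Proposes using a pretrained 2D diffusion image generator as a prior that "dreams up" novel views.
Develops an efficient multi-scale radiance field representation and regularizers that render faithful appearance and plausible geometry.
Introduces single-image textual inversion to condition the diffusion prior on the specific input object.
Demonstrates state-of-the-art reconstruction quality on in-the-wild images and benchmark datasets without category-specific supervision.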

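Method

Represents appearance and geometry as a neural radiance field (RF) and optimizes it with a reconstruction loss so that it matches the input view.
Uses a pretrained diffusion model, conditioned on a prompt embedding learned via single-image textual inversion, to synthesize plausible novel views of the object.
Applies Score Distillation Sampling (SDS) to bring renders of the RF from randomly sampled novel viewpoints into agreement with the diffusion model's prior.
Uses a coarse-to-fine training schedule with an inexpensive grid-based RF built on InstantNGP.
Adds regularizers to improve surface quality: 2D normal smoothness, texture dropout, and a mask-based L2 term that keeps the rendered silhouette aligned with the input image.
Keeps the fixed reconstruction camera in every iteration while sampling a new novel view, so the result stays faithful to the input view and consistent with the prior.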
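The three loss terms described above can be sketched in NumPy. This is a minimal, non-authoritative sketch, not the authors' code: the function names, the `lambda_mask` weight, the SDS weighting `w(t) = 1 - ᾱ_t` (a DreamFusion-style choice), the hypothetical noise predictor `eps_pred_fn`, and the 3x3 box blur standing in for the paper's blur kernel are all assumptions. A real implementation would render `rgb`, `opacity`, and `normals` from the InstantNGP radiance field and backpropagate with an autodiff framework; here the renders are plain arrays.

```python
import numpy as np


def reconstruction_loss(rgb, opacity, image, mask, lambda_mask=1.0):
    """Masked RGB L2 plus mask L2, evaluated only at the fixed input camera.

    rgb, image: (H, W, 3); opacity, mask: (H, W). lambda_mask is an assumed weight.
    """
    rgb_term = np.sum((mask[..., None] * (image - rgb)) ** 2)
    mask_term = np.sum((mask - opacity) ** 2)
    return rgb_term + lambda_mask * mask_term


def sds_grad(x, eps_pred_fn, alphas_cumprod, t, rng):
    """DreamFusion-style SDS gradient w.r.t. a render x from a random novel view.

    eps_pred_fn(x_t, t) is a hypothetical stand-in for the diffusion model's
    noise prediction conditioned on the textual-inversion embedding <e>.
    """
    eps = rng.standard_normal(x.shape)
    abar = alphas_cumprod[t]
    x_t = np.sqrt(abar) * x + np.sqrt(1.0 - abar) * eps  # forward-diffuse the render
    w = 1.0 - abar  # assumed weighting w(t)
    # SDS skips the denoiser's Jacobian: the gradient is w(t) * (eps_hat - eps)
    return w * (eps_pred_fn(x_t, t) - eps)


def box_blur(img, k=3):
    """k x k mean filter with edge padding over an (H, W, C) array."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def normal_smoothness_loss(normals, k=3):
    """Penalize rendered 2D normals (H, W, 3) that differ from their blurred copy.

    The blurred target acts as a constant (a stop-gradient in an autodiff framework).
    """
    target = box_blur(normals, k)
    return np.mean((normals - target) ** 2)
```

In a training step, `reconstruction_loss` would be applied to the render from the fixed input camera and `sds_grad` to a render from a freshly sampled camera, mirroring the per-iteration scheme above; how the two signals and the regularizers are weighted is not specified in this review.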

Figure 2 : Method diagram. Our method optimizes a neural radiance field using two objectives simultaneously: a reconstruction objective and a prior objective. The reconstruction objective ensures that the radiance field resembles the input image from a specific, fixed view. The prior objective uses

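Experimental Results

Research Questions

RQ1: Can conditioning a diffusion-model prior on the input image enable faithful 360° object reconstruction from a single view?
RQ2: How does single-image textual inversion affect the quality and diversity of the reconstructed views?
RQ3: Which regularizers and training strategies yield plausible geometry and appearance when reconstructing an arbitrary object from a single image?
RQ4: How does RealFusion compare with category-specific or multi-view reconstruction methods on standard benchmarks?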

Key Findings

Category	F-score (Shelf-Supervised)	CLIP similarity (Shelf-Supervised)	F-score (RealFusion)	CLIP similarity (RealFusion)
Backpack	7.58	0.72	12.22	0.74
Chair	8.26	0.65	10.23	0.76
Motorcycle	8.66	0.69	8.72	0.70
Orange	6.27	0.71	10.16	0.74
Skateboard	7.74	0.74	5.89	0.74
Teddybear	12.89	0.73	10.08	0.82
Vase	6.30	0.69	9.72	0.71

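RealFusion achieves quantitatively state-of-the-art results in single-image benchmark reconstruction compared with prior monocular 3D methods.
Across the seven object categories, RealFusion improves on Shelf-Supervised Mesh Prediction on average in both geometric accuracy (F-score) and appearance similarity (CLIP similarity), although the baseline has the higher F-score on Skateboard and Teddybear.
Single-image textual inversion is essential for high-quality reconstruction; without it, the back of the object resembles a generic example of the category rather than the actual object.
Coarse-to-fine training and normal-smoothness regularization improve surface quality and reduce artifacts.
As the diffusion prior, Stable Diffusion yields higher-quality reconstructions than alternatives such as CLIP.
RealFusion can generate multiple plausible 360° reconstructions from the same input view, with the variation appearing mainly on the occluded back side.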
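The "improves on average" finding can be checked by recomputing the column means from the per-category table. This is plain arithmetic on the reported numbers, assuming an unweighted mean over the seven categories; it is not a re-evaluation of either method.

```python
# Per-category (F-score, CLIP similarity) pairs copied from the table above.
SHELF_SUPERVISED = {
    "Backpack": (7.58, 0.72), "Chair": (8.26, 0.65), "Motorcycle": (8.66, 0.69),
    "Orange": (6.27, 0.71), "Skateboard": (7.74, 0.74), "Teddybear": (12.89, 0.73),
    "Vase": (6.30, 0.69),
}
REALFUSION = {
    "Backpack": (12.22, 0.74), "Chair": (10.23, 0.76), "Motorcycle": (8.72, 0.70),
    "Orange": (10.16, 0.74), "Skateboard": (5.89, 0.74), "Teddybear": (10.08, 0.82),
    "Vase": (9.72, 0.71),
}


def column_mean(table, col):
    """Unweighted mean of one metric (0 = F-score, 1 = CLIP similarity) over all categories."""
    values = [scores[col] for scores in table.values()]
    return sum(values) / len(values)


for name, col in (("F-score", 0), ("CLIP similarity", 1)):
    print(f"{name}: Shelf-Supervised {column_mean(SHELF_SUPERVISED, col):.2f}, "
          f"RealFusion {column_mean(REALFUSION, col):.2f}")

# Categories where RealFusion's F-score falls below the baseline's
lower_f = [c for c in REALFUSION if REALFUSION[c][0] < SHELF_SUPERVISED[c][0]]
print("Lower F-score:", lower_f)
# Prints:
# F-score: Shelf-Supervised 8.24, RealFusion 9.57
# CLIP similarity: Shelf-Supervised 0.70, RealFusion 0.74
# Lower F-score: ['Skateboard', 'Teddybear']
```

The per-category breakdown matters because the average gain in F-score coexists with two categories where the baseline's geometry scores higher.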

Figure 3 : Examples demonstrating the level of detail of information captured by the optimized embedding $\langle\textbf{e}\rangle$ . Rows 1-2 show input images and masks. The images are used to optimize $\langle\textbf{e}\rangle$ via our single-image textual inversion process. Rows 3-5 show example


This review was generated by AI and checked by a human editor.