QUICK REVIEW

[Paper Review] Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision

Ayush Tewari, Tianwei Yin|arXiv (Cornell University)|Jun 20, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 23

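One-Line Summary

This paper proposes an inverse-graphics diffusion framework that solves stochastic inverse problems without direct supervision, using a forward model and partial observations to achieve 3D-consistent reconstruction and inpainting.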

ABSTRACT

Denoising diffusion models are a powerful type of generative models used to capture complex distributions of real-world signals. However, their applicability is limited to scenarios where training samples are readily available, which is not always the case in real-world applications. For example, in inverse graphics, the goal is to generate samples from a distribution of 3D scenes that align with a given image, but ground-truth 3D scenes are unavailable and only 2D images are accessible. To address this limitation, we propose a novel class of denoising diffusion probabilistic models that learn to sample from distributions of signals that are never directly observed. Instead, these signals are measured indirectly through a known differentiable forward model, which produces partial observations of the unknown signal. Our approach involves integrating the forward model directly into the denoising process. This integration effectively connects the generative modeling of observations with the generative modeling of the underlying signals, allowing for end-to-end training of a conditional generative model over signals. During inference, our approach enables sampling from the distribution of underlying signals that are consistent with a given partial observation. We demonstrate the effectiveness of our method on three challenging computer vision tasks. For instance, in the context of inverse graphics, our model enables direct sampling from the distribution of 3D scenes that align with a single 2D input image.

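Motivation and Objectives

Motivate solving stochastic inverse problems for 3D scenes without direct supervision.
Propose a diffusion-based inverse-graphics pipeline that incorporates a forward model.
Show robustness to noisy camera poses and partial observations at training and inference time.
Demonstrate 3D-consistent reconstruction and inpainting on datasets such as Co3D and Objaverse.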

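Proposed Method

Infer 3D structure from 2D observations with a diffusion process conditioned on a forward graphics model.
Incorporate the forward model as a prior so that inverse graphics is possible without direct supervision.
Train with noisy camera poses to improve the robustness of 3D reconstruction.
Design the model to fill in missing image patches, enabling inpainting from partial observations.
Produce an updated pipeline figure for the inverse-graphics workflow.
Compare against baselines on the Co3D and Objaverse datasets.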
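
The core idea above, placing the forward model inside the denoiser so that the loss only ever touches observations, can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the paper's implementation: the "signal" is a short list of numbers, the forward model simply selects the entries a view can see, and `estimate_signal` is a hypothetical, parameter-free stand-in for the learned network. All function names are invented for illustration.

```python
import math
import random


def forward_model(signal, view):
    """Toy forward model: a 'view' observes only some entries of the signal."""
    return [signal[i] for i in view]


def add_noise(obs, alpha_bar, rng):
    """Standard DDPM noising: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in obs]


def estimate_signal(context_obs, context_view, noisy_target, target_view, size):
    """Hypothetical stand-in for the learned network (no parameters here).

    It fills a full-size signal from the noisy target observation, then
    overwrites entries that the clean context observation also covers.
    """
    signal = [0.0] * size
    for i, x in zip(target_view, noisy_target):
        signal[i] = x
    for i, x in zip(context_view, context_obs):
        signal[i] = x
    return signal


def training_loss(context_obs, context_view, target_obs, target_view,
                  size, alpha_bar, rng):
    """Loss for one example, computed purely in observation space."""
    noisy_target = add_noise(target_obs, alpha_bar, rng)
    signal_hat = estimate_signal(context_obs, context_view,
                                 noisy_target, target_view, size)
    # The forward model sits inside the denoiser: render the estimated
    # signal back into the target view and compare with the clean target.
    denoised = forward_model(signal_hat, target_view)
    return sum((d - t) ** 2 for d, t in zip(denoised, target_obs)) / len(target_obs)


rng = random.Random(0)
# The hidden signal is used only to simulate observations; the loss never sees it.
hidden_signal = [rng.uniform(-1.0, 1.0) for _ in range(8)]
context_view, target_view = [0, 1, 2, 3], [2, 3, 4, 5]
loss = training_loss(forward_model(hidden_signal, context_view), context_view,
                     forward_model(hidden_signal, target_view), target_view,
                     size=8, alpha_bar=0.5, rng=rng)
print(f"observation-space loss: {loss:.4f}")
```

In a real system the stand-in would be a trainable network, the forward model a differentiable renderer, and the loss would be backpropagated through both; the toy only shows where each piece sits.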

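Experimental Results

Research Questions

RQ1: Can diffusion-based inverse graphics solve stochastic inverse problems without direct supervision?
RQ2: How does the forward model improve 3D consistency and reconstruction quality under pose noise?
RQ3: Can the model perform reliable inpainting from partial observations?
RQ4: How does training with noisy poses affect rendering quality and 3D consistency?
RQ5: How does the proposed method compare with existing baselines on standard 3D datasets?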

Key Findings

Method	PSNR ↑	LPIPS ↓	FID ↓
PixelNeRF	17.96	0.479	158.50
SparseFusion	11.76	0.770	257.63
Ours	17.62	0.368	66.81
With noise (ablation)	17.24	0.40	92.23
Ours (ablation)	18.19	0.34	56.64
Deterministic (2D inpainting)	21.35	0.11	9.18
Ours (2D inpainting)	20.18	0.09	4.25

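On Co3D (10 categories), the proposed method reaches PSNR 17.62, LPIPS 0.368, and FID 66.81. Its PSNR is competitive with PixelNeRF (17.96), while its LPIPS and FID are better than both PixelNeRF (0.479 / 158.50) and SparseFusion (0.770 / 257.63).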
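In the pose-noise ablation, the variant trained with noise scores PSNR 17.24, LPIPS 0.40, and FID 92.23. The table's ablation row for the proposed method is 18.19 / 0.34 / 56.64, and the main Co3D result is 17.62 / 0.368 / 66.81; against either reference, PSNR and LPIPS change only modestly, while FID degrades more visibly.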
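On 2D inpainting, the proposed method achieves PSNR 20.18, LPIPS 0.09, and FID 4.25. According to the table, it improves on the deterministic baseline in LPIPS (0.11) and FID (9.18), while the deterministic baseline keeps the higher PSNR (21.35).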
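Qualitative results show 3D-consistent reconstructions, illustrated with extracted point clouds, and inpainting that improves on the baselines.
The updated pipeline emphasizes inverse graphics with a forward model and enables learning from partial observations (shown in a figure referenced in the paper).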
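
For readers comparing the PSNR columns, the sketch below shows the standard definition, PSNR = 10 · log10(MAX² / MSE), and how a gap in dB maps to a ratio of mean squared errors. It assumes pixel values in [0, 1]; the helper names are illustrative, and the paper's own evaluation code may differ in details such as color handling or averaging.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_from_psnr_gap(gap_db):
    """Factor by which MSE is larger for the result that is `gap_db` dB lower."""
    return 10.0 ** (gap_db / 10.0)


print(psnr([0.0, 0.0], [0.1, 0.1]))
print(mse_ratio_from_psnr_gap(21.35 - 20.18))
```

Under this definition, the 1.17 dB gap between the two 2D inpainting rows in the table corresponds to roughly 1.3 times the MSE if both values came from a single image; benchmark PSNR is usually averaged over many images, so the ratio is only approximate. PSNR also rewards blurry, mean-like predictions, which is one common reason a deterministic model can score higher on PSNR while scoring worse on LPIPS and FID.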

This review was generated by AI and checked by a human editor.