QUICK REVIEW

[Paper Review] IBRNet: Learning Multi-View Image-Based Rendering

Qianqian Wang, Zhicheng Wang|arXiv (Cornell University)|Feb 25, 2021

Advanced Vision and Imaging|References: 60|Citations: 27

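One-Line Summary

IBRNet learns a generic view interpolation function that renders high-resolution novel views from a handful of nearby source views, without per-scene optimization. It can also be fine-tuned on each scene, at which point it is competitive with single-scene neural rendering methods.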

ABSTRACT

We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multi-view posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods. Project page: https://ibrnet.github.io/
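For reference, the "classic volume rendering" step mentioned in the abstract is commonly written as the NeRF-style quadrature below. This is a sketch under assumptions: the symbols ($M$ samples along ray $\mathbf{r}$, densities $\sigma_k$, colors $\mathbf{c}_k$, sample spacings $\delta_k$) are introduced here for illustration, and the paper's exact formulation may differ in detail.

```latex
\hat{C}(\mathbf{r}) = \sum_{k=1}^{M} T_k \left(1 - e^{-\sigma_k \delta_k}\right) \mathbf{c}_k,
\qquad
T_k = \exp\!\left(-\sum_{j=1}^{k-1} \sigma_j \delta_j\right)
```

Every operation here is differentiable with respect to $\sigma_k$ and $\mathbf{c}_k$, which is why a photometric loss against posed multi-view images can serve as the only supervision.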

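Motivation and Objectives

Enable novel view synthesis from a sparse set of nearby views without scene-specific optimization.
Develop a lightweight, generalizable network (IBRNet) that predicts color and density at continuous 5D locations from multiple views.
Incorporate long-range context along each ray with a ray transformer to improve density estimation and rendering accuracy.
Enable end-to-end training with classic volume rendering, using only posed multi-view images as supervision.
Show that a pretrained IBRNet generalizes to unseen scenes and, when fine-tuned per scene, approaches the performance of single-scene neural rendering.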

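Proposed Method

Uses a modular pipeline that selects a small working set of nearby source views and extracts dense features from each image.
For each 3D point on a ray, aggregates the multi-view features, computes a density feature with a PointNet-like pooling, and predicts density with the ray transformer.
Blends the source-view colors using blending weights that account for viewing direction to obtain each sample's color, then renders with volume rendering.
IBRNet operates on continuous 5D locations (3D position, 2D viewing direction) and is differentiable, enabling end-to-end training with multi-view supervision.
Renders high-quality novel views with NeRF-like coarse-to-fine hierarchical sampling.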
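The blending and volume rendering steps in the list above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names `blend_source_colors` and `composite_ray`, the array shapes, and the softmax over per-view logits are assumptions made here. The feature extractor and ray transformer that would produce the densities and logits are omitted.

```python
import numpy as np


def blend_source_colors(src_rgb, blend_logits):
    """Blend colors sampled from the source views into one color per ray sample.

    src_rgb:      (S, V, 3) colors from V source views at S samples on a ray.
    blend_logits: (S, V) unnormalized per-view scores (hypothetical network output).
    Returns:      (S, 3) blended colors.
    """
    # Numerically stable softmax over the view axis.
    shifted = blend_logits - blend_logits.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights[..., None] * src_rgb).sum(axis=1)


def composite_ray(sigma, rgb, delta):
    """Composite per-sample densities and colors into a single pixel color.

    sigma: (S,) non-negative densities, rgb: (S, 3) colors, delta: (S,) spacings.
    Returns the pixel color (3,) and the per-sample compositing weights (S,).
    """
    alpha = 1.0 - np.exp(-sigma * delta)  # opacity of each sample
    # Transmittance: probability that the ray reaches sample k unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0), weights


# Toy usage with random inputs standing in for network predictions.
rng = np.random.default_rng(0)
num_samples, num_views = 4, 3
src_rgb = rng.uniform(size=(num_samples, num_views, 3))
logits = rng.normal(size=(num_samples, num_views))
sigma = np.array([0.0, 0.5, 5.0, 1.0])
delta = np.full(num_samples, 0.25)

sample_rgb = blend_source_colors(src_rgb, logits)
pixel, weights = composite_ray(sigma, sample_rgb, delta)
print(pixel, weights.sum())
```

Because the blended color is a convex combination of colors observed in the source views, the output stays within the range of the input images. This is one way to read the claim that drawing on source views at render time helps produce sharp, high-resolution results.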

Figure 1: System Overview. 1) To render a novel target view (shown here as the image labeled with a ‘?’), we first identify a set of neighboring source views (e.g., the views labeled A and B) and extract their image features. 2) Then, for each ray in the target view, we compute colors and densitie

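Experimental Results

Research Questions

RQ1: Can a generic, scene-agnostic view interpolation function synthesize high-quality novel views from nearby source views?
RQ2: Does incorporating a ray transformer, which propagates context along each ray, improve density estimation and rendering quality?
RQ3: How does per-scene fine-tuning of the pretrained model affect performance relative to single-scene neural rendering methods such as NeRF?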

Key Findings

Setting	Method	Diffuse Synthetic 360° PSNR	Diffuse Synthetic 360° SSIM	Diffuse Synthetic 360° LPIPS	Realistic Synthetic 360° PSNR	Realistic Synthetic 360° SSIM	Realistic Synthetic 360° LPIPS	Real Forward-Facing PSNR	Real Forward-Facing SSIM	Real Forward-Facing LPIPS
No per-scene optimization	LLFF	34.38	0.985	0.048	24.88	0.911	0.114	24.13	0.798	0.212
No per-scene optimization	Ours (no ft)	37.17	0.990	0.017	25.49	0.916	0.100	25.13	0.817	0.205
Per-scene optimization	SRN	33.20	0.963	0.073	22.26	0.846	0.170	22.84	0.668	0.378
Per-scene optimization	NV	29.62	0.929	0.099	26.05	0.893	0.160	-	-	-
Per-scene optimization	NeRF	40.15	0.991	0.023	31.01	0.947	0.081	26.50	0.811	0.250
Per-scene optimization	Ours_ft	42.93	0.997	0.009	28.14	0.942	0.072	26.73	0.851	0.175

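The pretrained IBRNet generalizes to unseen scenes and outperforms LLFF on all evaluated datasets.
With per-scene fine-tuning, it is competitive with NeRF on several datasets, most notably Real Forward-Facing.
Ablations show that the ray transformer is essential, and that the viewing-direction input improves quality but is not the only contributing factor.
In the generalization setting (no per-scene optimization), IBRNet attains higher PSNR/SSIM and lower LPIPS than LLFF.
Inference cost depends on the number of source views; because interpolation is local and view-based, FLOPs per pixel are far lower than for NeRF.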
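To help read the PSNR columns in the table, here is a minimal sketch of how PSNR is typically computed between a rendered image and its ground truth. The function name `psnr` and the assumption that images are floating-point arrays in [0, 1] are illustrative; the paper's exact evaluation protocol is not described in this review. Higher PSNR and SSIM are better, while lower LPIPS is better.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape.

    Assumes pixel values lie in [0, max_val].
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


# Toy usage: a uniform error of 0.1 on a [0, 1] image.
target = np.full((8, 8, 3), 0.5)
pred = target + 0.1
print(psnr(pred, target))
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.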

Figure 2: IBRNet for volume density and color prediction at a continuous 5D location $(\mathbf{x},\mathbf{d})$. We first input the 2D image features $\{\mathbf{f}_{i}\}_{i=1}^{N}$ extracted from all source views to a PointNet-like MLP to aggregate local and global information, resulting in multi-vi

This review was generated by AI and checked by a human editor.