QUICK REVIEW

[Paper Review] D$^2$NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video

Tianhao Wu, Fangcheng Zhong|arXiv (Cornell University)|May 31, 2022

Human Pose and Action Recognition|Cited by 41

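One-Line Summary

D$^2$NeRF learns a decoupled 3D scene from a monocular video by using self-supervised radiance fields and a shadow field to separate dynamic objects (and their shadows) from the static background. It improves dynamic/static separation and novel-view synthesis over prior methods.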

ABSTRACT

Given a monocular video, segmenting and decoupling dynamic objects while recovering the static environment is a widely studied problem in machine intelligence. Existing solutions usually approach this problem in the image domain, limiting their performance and understanding of the environment. We introduce Decoupled Dynamic Neural Radiance Field (D$^2$NeRF), a self-supervised approach that takes a monocular video and learns a 3D scene representation which decouples moving objects, including their shadows, from the static background. Our method represents the moving objects and the static background by two separate neural radiance fields with only one allowing for temporal changes. A naive implementation of this approach leads to the dynamic component taking over the static one as the representation of the former is inherently more general and prone to overfitting. To this end, we propose a novel loss to promote correct separation of phenomena. We further propose a shadow field network to detect and decouple dynamically moving shadows. We introduce a new dataset containing various dynamic objects and shadows and demonstrate that our method can achieve better performance than state-of-the-art approaches in decoupling dynamic and static 3D objects, occlusion and shadow removal, and image segmentation for moving objects.

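Motivation and Objectives

Motivation: recover a decomposable 3D scene from a monocular video by separating moving objects and their shadows from the static background.
Develop a self-supervised neural representation that can render the static and dynamic components separately from novel viewpoints.
Handle dynamic shadows explicitly so that they do not contaminate the radiance of the static background.
Introduce a shadow field to model and remove time-varying shadow effects.
Provide a dataset and an evaluation showing improved separation and background reconstruction compared with state-of-the-art methods.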

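Proposed Method

Represent the static and dynamic parts of the scene with separate neural radiance fields: F^S for the static part and F^D for the dynamic part. The dynamic component is conditioned on a per-frame temporal latent code.
Use volume rendering that composites color by integrating the contributions of both fields along each camera ray.
Introduce a skewed entropy loss that encourages a clean separation of static and dynamic densities along each ray, countering the dynamic component's tendency to overfit.
Add a shadow field network ρ that models the shadow attenuation applied to the static radiance, together with a shadow regularization term so that it does not over-explain darkness.
Incorporate a per-ray density regularizer (L_r) and a ray density distribution prior (L_{σ^S}) to stabilize recovery of the static background in monocular scenarios with motion.
At render time, multiply the static radiance by (1 − ρ) to account for shadowing before adding the dynamic radiance.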
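
The compositing described above can be sketched for a single ray as follows. This is a minimal NumPy sketch, not the authors' implementation: the function name `composite_render`, the argument names, and the piecewise-constant quadrature (blending the two colors by their density fractions within each sample interval) are assumptions made for illustration. It only shows where the (1 − ρ) factor enters.

```python
import numpy as np

def composite_render(sigma_s, rgb_s, sigma_d, rgb_d, rho, deltas, eps=1e-10):
    """Composite static and dynamic fields along one ray (illustrative sketch).

    sigma_s, sigma_d: (N,) non-negative densities of the static / dynamic field
    rgb_s, rgb_d:     (N, 3) colors predicted by each field
    rho:              (N,) shadow ratio in [0, 1] applied to the static color
    deltas:           (N,) distances between consecutive samples
    """
    sigma = sigma_s + sigma_d                      # both fields attenuate the ray
    alpha = 1.0 - np.exp(-sigma * deltas)          # opacity of each interval
    # Transmittance reaching each sample: product of (1 - alpha) of earlier samples.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha

    # Assumed blending rule: split each interval's weight by density fraction.
    frac_s = sigma_s / (sigma + eps)
    frac_d = sigma_d / (sigma + eps)

    # Static radiance is darkened by (1 - rho) before the dynamic radiance is added.
    shaded_static = (1.0 - rho)[:, None] * rgb_s
    per_sample = frac_s[:, None] * shaded_static + frac_d[:, None] * rgb_d
    color = np.sum(weights[:, None] * per_sample, axis=0)
    return color, weights
```

Under these assumptions, passing zeros for `sigma_d` and `rho` renders the static background alone, which corresponds to removing dynamic occluders and their shadows.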

Experimental Results

Research Questions

RQ1: Can a self-supervised 3D representation decouple dynamic and static scene components from a single monocular video?
RQ2: How can shadows from moving objects be modeled so that static background reconstruction remains accurate?
RQ3: What regularizers are needed to prevent the dynamic NeRF from absorbing static scene content?
RQ4: Is it possible to achieve high-quality novel-view synthesis of the static background while removing dynamic occluders and their shadows?
RQ5: Does the proposed approach generalize to real-world monocular videos with rapid motion and moving shadows?

Key Findings

Outperforms state-of-the-art approaches on decoupling dynamic objects and shadows from monocular video in terms of novel-view synthesis of the static background.
Demonstrates improved 3D reconstruction of the static environment while removing dynamic occluders and their shadows.
The skewed entropy loss is critical for effective static/dynamic separation and mitigating overfitting of the dynamic component.
The shadow field enables removal of large-area shadows correlated with motion without requiring explicit light-model changes.
A new dataset with dynamic objects and moving shadows supports evaluation in both synthetic and real-world settings.
Qualitative results show clearer static backgrounds and accurate dynamic object segmentation in 2D images.
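
To illustrate why a skewed entropy term can help the separation noted above, the sketch below evaluates a skewed binary entropy on the per-sample dynamic density ratio. The exact form, the exponent value `k=2.0`, and the names `skewed_binary_entropy` and `skewed_entropy_loss` are assumptions made for illustration; this review does not state the formula. With k > 1 the entropy peak moves above a ratio of 0.5, so minimizing the penalty pushes an ambiguous sample toward the static side.

```python
import numpy as np

def skewed_binary_entropy(w, k=2.0, eps=1e-6):
    """Binary entropy of w**k; k=1 recovers the ordinary binary entropy.

    w is the fraction of density assigned to the dynamic field, in [0, 1].
    k and eps are illustrative values, not settings taken from the paper.
    """
    w = np.clip(w, eps, 1.0 - eps)   # keep both logarithms finite
    p = w ** k                       # skew: shifts the peak to w = 0.5 ** (1 / k)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def skewed_entropy_loss(sigma_s, sigma_d, k=2.0, eps=1e-10):
    """Mean skewed entropy of the dynamic density ratio over the samples of a ray."""
    ratio = sigma_d / (sigma_s + sigma_d + eps)
    return float(np.mean(skewed_binary_entropy(ratio, k=k)))
```

A sample that is clearly static or clearly dynamic contributes almost nothing to this loss, while a sample whose density is shared between the two fields is penalized.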


This review was generated by AI and checked by a human editor.