QUICK REVIEW

[Paper Review] Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos

Shuo Sun, Unal Artan|arXiv (Cornell University)|Mar 12, 2026

Advanced Vision and Imaging|Citations: 0

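One-Line Summary

Dense dynamic scene reconstruction and camera pose estimation from multiple freely moving cameras, using a two-stage optimization framework. The method performs spatiotemporal multi-camera tracking with wide-baseline initialization, followed by post-tracking depth refinement.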

ABSTRACT

We address the challenging problem of dense dynamic scene reconstruction and camera pose estimation from multiple freely moving cameras -- a setting that arises naturally when multiple observers capture a shared event. Prior approaches either handle only single-camera input or require rigidly mounted, pre-calibrated camera rigs, limiting their practical applicability. We propose a two-stage optimization framework that decouples the task into robust camera tracking and dense depth refinement. In the first stage, we extend single-camera visual SLAM to the multi-camera setting by constructing a spatiotemporal connection graph that exploits both intra-camera temporal continuity and inter-camera spatial overlap, enabling consistent scale and robust tracking. To ensure robustness under limited overlap, we introduce a wide-baseline initialization strategy using feed-forward reconstruction models. In the second stage, we refine depth and camera poses by optimizing dense inter- and intra-camera consistency using wide-baseline optical flow. Additionally, we introduce MultiCamRobolab, a new real-world dataset with ground-truth poses from a motion capture system. Finally, we demonstrate that our method significantly outperforms state-of-the-art feed-forward models on both synthetic and real-world benchmarks, while requiring less memory.

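Motivation and Objectives

Achieve robust dense dynamic scene reconstruction from multiple freely moving cameras without relying on rigid extrinsics.
Achieve consistent scale and accurate camera pose estimation across overlapping and non-overlapping fields of view.
Develop a two-stage pipeline that separates initial tracking from dense depth refinement to improve robustness and efficiency.
Provide a real-world multi-camera dataset with ground-truth poses for evaluating dynamic multi-view reconstruction methods.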

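Proposed Method

Extends single-camera SLAM to the multi-camera setting through a spatiotemporal connection graph that links intra-camera temporal continuity with inter-camera spatial overlap, enabling joint optimization.
Uses a wide-baseline initialization strategy based on feed-forward reconstruction models to provide a global scale anchor and initial poses.
Refines depth and camera poses by optimizing dense inter- and intra-camera consistency using wide-baseline optical flow.
Introduces a two-stage depth refinement that uses dense correspondences and per-frame scale/shift parameters to align monocular depth predictions across cameras.
Stabilizes online refinement by applying pose regularization and temporal smoothness during optimization.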
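The per-frame scale/shift alignment mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the review does not state the exact objective, whether depth or inverse depth is aligned, or how correspondences are weighted. The sketch assumes paired depth samples (a monocular prediction and a reference depth at the same pixels) and solves the ordinary least-squares problem in closed form. The function names `fit_scale_shift` and `apply_scale_shift` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_scale_shift(mono: Sequence[float], ref: Sequence[float]) -> Tuple[float, float]:
    """Return (scale, shift) minimizing sum((scale * mono + shift - ref) ** 2).

    `mono` holds monocular depth predictions for one frame; `ref` holds
    reference depths at the same pixels (an assumption of this sketch).
    """
    n = len(mono)
    if n != len(ref) or n < 2:
        raise ValueError("need at least two paired depth samples")
    mean_m = sum(mono) / n
    mean_r = sum(ref) / n
    var_m = sum((m - mean_m) ** 2 for m in mono)
    if var_m == 0.0:
        raise ValueError("monocular depths are constant; scale is unobservable")
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(mono, ref))
    scale = cov / var_m
    shift = mean_r - scale * mean_m
    return scale, shift


def apply_scale_shift(mono: Sequence[float], scale: float, shift: float) -> List[float]:
    """Map monocular depths into the reference scale."""
    return [scale * m + shift for m in mono]


# Toy example: the reference depths are exactly 2 * mono + 0.5.
mono_depth = [1.0, 2.0, 3.0]
ref_depth = [2.5, 4.5, 6.5]
scale, shift = fit_scale_shift(mono_depth, ref_depth)
aligned = apply_scale_shift(mono_depth, scale, shift)
```

In a full system the two parameters would presumably be optimized jointly with poses and across many frame pairs rather than fitted per frame in isolation; the closed form only shows what the scale/shift parameters represent.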

Figure 2 : Method Overview. Given multiple video inputs: Our method first uses a feed-forward model for initialization to achieve a global scale anchor and initialized poses (Step1). Then, we build a spatio-temporal connection graph during tracking to estimate camera poses and maintain a consistent

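Experimental Results

Research Questions

RQ1: Without a pre-calibrated rig, can a setup of multiple freely moving cameras achieve robust, metrically consistent dense scene reconstruction?
RQ2: How do spatiotemporal connections between cameras improve tracking robustness and scale consistency in dynamic scenes?
RQ3: Can the two-stage approach of initial tracking followed by dense depth refinement improve reconstruction quality over fully feed-forward models while reducing memory requirements?
RQ4: How does wide-baseline initialization affect robustness when view overlap is limited?
RQ5: How well do multi-view depth refinement and optical-flow-based constraints work on real-world multi-camera datasets?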

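Key Findings

The proposed method outperforms state-of-the-art feed-forward models in tracking and reconstruction on both synthetic and real-world benchmarks.
It uses less memory than competing feed-forward methods while improving pose and depth accuracy.
The spatiotemporal connection graph effectively exploits intra-camera temporal continuity and inter-camera spatial overlap to maintain scale consistency.
Wide-baseline initialization with VGGT, combined with monocular depth alignment, provides a robust global scale anchor in situations where overlap is challenging.
Two-stage depth refinement that combines dense optical flow with per-frame scale/shift optimization suppresses depth flicker and improves multi-view consistency.
On the new real-world dataset MultiCamRobolab, which provides ground-truth poses from motion capture, the method shows strong performance.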
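To make the spatiotemporal connection graph concrete, here is a minimal sketch of how such a graph might be assembled. It rests on several assumptions: all cameras are synchronized and share frame indices, temporal edges connect frames within a fixed window, and spatial edges are added only between different cameras at the same timestamp when an overlap score passes a threshold. The `overlap` callback, the `temporal_window` and `min_overlap` parameters, and the function name are all hypothetical; the paper's actual keyframe selection and overlap test are not described in this review.

```python
from typing import Callable, Set, Tuple

Node = Tuple[int, int]   # (camera_id, frame_index)
Edge = Tuple[Node, Node]


def build_spatiotemporal_graph(
    num_cams: int,
    num_frames: int,
    overlap: Callable[[Node, Node], float],
    temporal_window: int = 1,
    min_overlap: float = 0.3,
) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (temporal_edges, spatial_edges) over all (camera, frame) nodes."""
    temporal_edges: Set[Edge] = set()
    spatial_edges: Set[Edge] = set()
    for cam in range(num_cams):
        for t in range(num_frames):
            # Intra-camera edges: link each frame to its next few frames.
            for dt in range(1, temporal_window + 1):
                if t + dt < num_frames:
                    temporal_edges.add(((cam, t), (cam, t + dt)))
            # Inter-camera edges: same timestamp, only with enough overlap.
            for other in range(cam + 1, num_cams):
                if overlap((cam, t), (other, t)) >= min_overlap:
                    spatial_edges.add(((cam, t), (other, t)))
    return temporal_edges, spatial_edges


def toy_overlap(a: Node, b: Node) -> float:
    """Hypothetical overlap score: the cameras see each other only at t = 0."""
    return 0.5 if a[1] == 0 else 0.0


temporal, spatial = build_spatiotemporal_graph(2, 3, toy_overlap)
```

With two cameras and three frames, the toy overlap yields four temporal edges (two per camera) and a single spatial edge at the first timestamp. Each edge would then contribute a constraint to the joint pose and depth optimization, which is what lets the inter-camera links tie all cameras to one scale.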

Figure 3 : Demonstration spatio-temporal graph. First, each camera will estimate temporal connections with its own frames. Second, at the timestamp $t_{0}$ , Cam.1 will try to make a spatial connection with Cam.0 if there is enough overlap. Additionally, the current active keyframe will try to make


This review was generated by AI and checked by a human editor.