QUICK REVIEW

[Paper Review] BA-Net: Dense Bundle Adjustment Network

Chengzhou Tang, Ping Tan|arXiv (Cornell University)|Jun 13, 2018

Advanced Vision and Imaging | References: 56 | Citations: 129

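One-line summary

BA-Net introduces a differentiable feature-metric bundle adjustment layer and a dense depth representation built from learned basis depth maps, enabling end-to-end learning of multi-view structure-from-motion.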

ABSTRACT

This paper introduces a network architecture to solve the structure-from-motion (SfM) problem via feature-metric bundle adjustment (BA), which explicitly enforces multi-view geometry constraints in the form of feature-metric error. The whole pipeline is differentiable so that the network can learn suitable features that make the BA problem more tractable. Furthermore, this work introduces a novel depth parameterization to recover dense per-pixel depth. The network first generates several basis depth maps according to the input image and optimizes the final depth as a linear combination of these basis depth maps via feature-metric BA. The basis depth maps generator is also learned via end-to-end training. The whole system nicely combines domain knowledge (i.e. hard-coded multi-view geometry constraints) and deep learning (i.e. feature learning and basis depth maps learning) to address the challenging dense SfM problem. Experiments on large scale real data prove the success of the proposed method.

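Motivation and objectives

Incorporate multi-view geometry constraints into a learnable SfM pipeline through a differentiable BA layer.
Learn feature representations that make the bundle adjustment optimization more robust.
Develop a compact, learnable representation for dense depth maps: a linear combination of 128 basis depth maps generated by an encoder-decoder network.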

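Proposed method

Introduce a differentiable BA layer that minimizes the feature-metric error across multiple views.
Build a CNN-based feature pyramid (learned features) that provides stable multi-scale inputs to the BA optimization.
Parameterize dense depth as a linear combination of 128 basis depth maps generated by an encoder-decoder network.
Predict the damping factor lambda with an MLP so that the Levenberg-Marquardt (LM) optimization is differentiable.
Run differentiable LM steps coarse-to-fine over the feature pyramid with warping: 5 iterations per level across 3 levels, 15 in total.
Train the backbone, feature pyramid, damping predictor, and basis-depth generator end to end with supervised losses on pose and depth.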
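
The depth parameterization and the damped LM update listed above can be illustrated with a minimal NumPy sketch. It makes several simplifying assumptions that are not taken from the paper: the residual is a plain depth difference rather than the feature-metric error, the damping factor is a fixed constant rather than an MLP prediction, and the function names (`combine_basis`, `lm_step`, `fit_weights`) are hypothetical. The update uses a common LM form with diagonal damping.

```python
import numpy as np

def combine_basis(basis, weights):
    """Depth map as a linear combination of basis depth maps.

    basis:   (K, H, W) array of basis depth maps (the review cites K = 128).
    weights: (K,) combination weights, the depth variables being optimized.
    """
    return np.tensordot(weights, basis, axes=1)  # -> (H, W)

def lm_step(residual, jacobian, damping):
    """One damped update: -(J^T J + damping * diag(J^T J))^-1 J^T r."""
    jtj = jacobian.T @ jacobian
    damped = jtj + damping * np.diag(np.diag(jtj))
    return -np.linalg.solve(damped, jacobian.T @ residual)

def fit_weights(basis, target, num_iters=5, damping=1e-2):
    """Toy stand-in for the BA layer: fit weights so the combined depth matches target.

    ASSUMPTION: plain depth residual and constant damping, for illustration only.
    """
    k = basis.shape[0]
    # The model is linear in the weights, so the Jacobian is constant: (H*W, K).
    jacobian = basis.reshape(k, -1).T
    weights = np.zeros(k)
    for _ in range(num_iters):
        residual = (combine_basis(basis, weights) - target).ravel()
        weights = weights + lm_step(residual, jacobian, damping)
    return weights

# Usage: recover known weights from a synthetic target (4 basis maps instead of 128).
rng = np.random.default_rng(0)
basis = rng.normal(size=(4, 8, 8))
true_weights = np.array([0.5, -1.0, 2.0, 0.25])
target = combine_basis(basis, true_weights)
estimated = fit_weights(basis, target)
```

Because only the K weights are optimized instead of every pixel's depth, the linear system solved in each step stays small (K x K), which is the practical appeal of a compact depth parameterization.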

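Experimental results

Research questions

RQ1 Can a differentiable feature-metric BA layer enforce multi-view geometry constraints while still enabling end-to-end learning of SfM?
RQ2 Does learning a basis-depth parameterization improve dense depth recovery and optimization convergence in multi-view scenes?
RQ3 How does feature learning tailored to BA compare with photometric/geometric BA and prior SfM networks on real datasets?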

Key findings

Method	Rotation (deg)	Translation (cm)	Translation (deg)	Abs. rel. diff.	Sq. rel. diff.	RMSE (linear)	RMSE (log)	RMSE (log, scale-invariant)
Ours	1.018	3.39	20.577	0.161	0.092	0.346	0.214	0.184
Ours*	1.587	10.81	31.005	0.238	0.176	0.488	0.279	0.276
DeMoN*	3.791	15.5	31.626	0.231	0.520	0.761	0.289	0.284
Photometric BA	4.409	21.40	34.36	0.268	0.427	0.788	0.330	0.323
Geometric BA	8.56	36.995	39.392	0.382	0. -	0.876	0.366	0.357
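
For readers unfamiliar with the depth columns in the table, the following sketch shows how such metrics are commonly computed. The review does not state the exact formulas, so these definitions follow widely used conventions and are an assumption; the function name `depth_metrics` and the rule that pixels with zero ground truth are invalid are also hypothetical choices.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Common depth error metrics over valid pixels (gt > 0).

    pred and gt are arrays of the same shape; pred must be positive
    wherever gt is valid, because the log metrics take log(pred).
    """
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]
    diff = pred - gt
    log_diff = np.log(pred) - np.log(gt)
    return {
        "abs_rel": np.mean(np.abs(diff) / gt),
        "sq_rel": np.mean(diff ** 2 / gt),
        "rmse": np.sqrt(np.mean(diff ** 2)),
        "rmse_log": np.sqrt(np.mean(log_diff ** 2)),
        # Scale-invariant variant: remove the mean log offset before averaging.
        "rmse_log_scale_inv": np.std(log_diff),
    }

# Usage: a prediction that is exactly twice the ground truth.
gt = np.array([[1.0, 2.0], [4.0, 0.0]])  # 0.0 marks an invalid pixel
pred = 2.0 * gt
metrics = depth_metrics(pred, gt)
```

In this example the prediction differs from the ground truth only by a global scale, so the scale-invariant log error is numerically zero while the other metrics are not. That is why the table reports both log variants.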

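BA-Net outperforms DeMoN, LS-Net, and conventional BA baselines on the ScanNet and KITTI datasets.
Feature-metric BA with learned features yields a smoother objective landscape and better convergence than RGB or pretrained CNN features.
Dense depth is generated effectively as a learned linear combination of basis maps, with improved consistency at object boundaries.
A differentiable LM with a learned damping factor enables end-to-end training and backpropagation through the BA process.
On KITTI, BA-Net outperforms both supervised and unsupervised baselines on camera trajectory and depth metrics.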

This review was generated by AI and checked by a human editor.