QUICK REVIEW

[Paper Review] D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry

Nan Yang, Lukas von Stumberg|arXiv (Cornell University)|Mar 2, 2020

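Advanced Vision and Imaging|References: 85|Citations: 33

One-Sentence Summary

D3VO integrates deep depth, pose, and uncertainty prediction into a monocular direct VO pipeline and achieves state-of-the-art results on KITTI and EuRoC without external depth supervision.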

ABSTRACT

We propose D3VO as a novel framework for monocular visual odometry that exploits deep networks on three levels -- deep depth, pose and uncertainty estimation. We first propose a novel self-supervised monocular depth estimation network trained on stereo videos without any external supervision. In particular, it aligns the training image pairs into similar lighting condition with predictive brightness transformation parameters. Besides, we model the photometric uncertainties of pixels on the input images, which improves the depth estimation accuracy and provides a learned weighting function for the photometric residuals in direct (feature-less) visual odometry. Evaluation results show that the proposed network outperforms state-of-the-art self-supervised depth estimation networks. D3VO tightly incorporates the predicted depth, pose and uncertainty into a direct visual odometry method to boost both the front-end tracking as well as the back-end non-linear optimization. We evaluate D3VO in terms of monocular visual odometry on both the KITTI odometry benchmark and the EuRoC MAV dataset. The results show that D3VO outperforms state-of-the-art traditional monocular VO methods by a large margin. It also achieves comparable results to state-of-the-art stereo/LiDAR odometry on KITTI and to the state-of-the-art visual-inertial odometry on EuRoC MAV, while using only a single camera.

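Motivation and Objectives

Motivate and realize robust monocular VO by exploiting predictions of depth, pose, and uncertainty.
Develop self-supervised DepthNet and PoseNet networks trained on stereo videos, with brightness alignment and photometric uncertainty.
Integrate deep depth, pose, and uncertainty into a direct VO framework to improve front-end tracking and back-end optimization.
Evaluate D3VO on KITTI Odometry and EuRoC MAV, showing performance competitive with stereo/LiDAR and VIO methods.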

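Proposed Method

Propose a self-supervised DepthNet and PoseNet that predict depth D, relative pose T, and photometric uncertainty Σ.
Introduce affine brightness transformation parameters to align the illumination between training frames.
Model per-pixel photometric uncertainty as aleatoric uncertainty and use it to weight residuals in both training and VO optimization.
Incorporate a virtual stereo term and a pose energy term into a sparse photometric bundle adjustment framework.
Use the network predictions for metric-scale initialization, as pose priors, and as weights in the energy function, guiding both tracking and optimization.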
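
The brightness alignment and uncertainty weighting described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a plain L1 photometric residual (the review does not specify the residual used in the paper) and the common heteroscedastic form "residual / sigma + log(sigma)" for the aleatoric term. The function names, the `eps` clamp, and the toy images are all hypothetical.

```python
import numpy as np


def affine_brightness(image, a, b):
    """Apply an affine brightness transform a * I + b to an image."""
    return a * image + b


def uncertainty_weighted_photometric_loss(target, warped_source, a, b, sigma, eps=1e-6):
    """Mean of |(a * I_t + b) - I_warped| / sigma + log(sigma).

    ASSUMPTIONS (not taken from the review): an L1 residual and the
    standard heteroscedastic weighting. `sigma` is a per-pixel map of
    predicted photometric uncertainty with the same shape as the images.
    """
    aligned = affine_brightness(target, a, b)
    residual = np.abs(aligned - warped_source)
    sigma = np.maximum(sigma, eps)  # keep the division and log well-defined
    return float(np.mean(residual / sigma + np.log(sigma)))


# Toy example: the warped source frame is a brighter copy of the target.
rng = np.random.default_rng(0)
target = rng.uniform(0.0, 1.0, size=(4, 4))
warped_source = 1.2 * target + 0.1
unit_sigma = np.ones_like(target)

loss_without_alignment = uncertainty_weighted_photometric_loss(
    target, warped_source, a=1.0, b=0.0, sigma=unit_sigma
)
loss_with_alignment = uncertainty_weighted_photometric_loss(
    target, warped_source, a=1.2, b=0.1, sigma=unit_sigma
)
print("loss without brightness alignment:", loss_without_alignment)
print("loss with brightness alignment:   ", loss_with_alignment)
```

In this sketch, the right affine parameters remove the residual caused purely by the illumination change. The log term keeps the network from declaring every pixel uncertain: for a fixed residual r, the per-pixel term r / sigma + log(sigma) is smallest at sigma = r.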

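Experimental Results

Research Questions

RQ1: Can a self-supervised monocular network that learns to predict metric scale from stereo supervision improve VO when integrated into a direct pipeline?
RQ2: Does incorporating predicted depth, pose, and photometric uncertainty improve front-end tracking and back-end optimization in monocular VO?
RQ3: On KITTI and EuRoC, how does the integration of depth, uncertainty, and pose compare with state-of-the-art monocular, stereo, and VIO methods?

Key Findings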

Type	Method	01	02	06	08	09	10	Mean
Mono	DSO	9.17	114	42.2	177	28.1	24.0	65.8
Mono	D3VO	1.07	0.80	0.67	1.00	0.78	0.62	0.82
Stereo	S. LSD	2.13	1.09	1.28	1.24	1.22	0.75	1.29
Stereo	ORB2	1.38	0.81	0.82	1.07	0.82	0.58	0.91
Stereo	S. DSO	1.43	0.78	0.67	0.98	0.98	0.49	0.89
Ablation	Dd	1.16	0.84	0.71	1.01	0.82	0.73	0.88
Ablation	Dd+Dp	1.15	0.84	0.70	1.03	0.80	0.72	0.87
Ablation	Dd+Du	1.10	0.81	0.69	1.03	0.78	0.62	0.84
Ablation	D3VO (best mono)	1.07	0.80	0.67	1.00	0.78	0.62	0.82
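
As a quick sanity check on the table, the last column can be recomputed as the arithmetic mean of the six per-sequence values. The short script below does that. The dictionary names and the 0.1 tolerance are choices made for this illustration, and small differences from the reported means are expected because the table entries are themselves rounded.

```python
# Per-sequence values (01, 02, 06, 08, 09, 10) copied from the table.
per_sequence = {
    "DSO": [9.17, 114, 42.2, 177, 28.1, 24.0],
    "D3VO": [1.07, 0.80, 0.67, 1.00, 0.78, 0.62],
    "S. LSD": [2.13, 1.09, 1.28, 1.24, 1.22, 0.75],
    "ORB2": [1.38, 0.81, 0.82, 1.07, 0.82, 0.58],
    "S. DSO": [1.43, 0.78, 0.67, 0.98, 0.98, 0.49],
    "Dd": [1.16, 0.84, 0.71, 1.01, 0.82, 0.73],
    "Dd+Dp": [1.15, 0.84, 0.70, 1.03, 0.80, 0.72],
    "Dd+Du": [1.10, 0.81, 0.69, 1.03, 0.78, 0.62],
}

# Means as reported in the last column of the table.
reported_mean = {
    "DSO": 65.8, "D3VO": 0.82, "S. LSD": 1.29, "ORB2": 0.91,
    "S. DSO": 0.89, "Dd": 0.88, "Dd+Dp": 0.87, "Dd+Du": 0.84,
}


def row_mean(values):
    """Arithmetic mean of one table row."""
    return sum(values) / len(values)


computed_mean = {name: row_mean(values) for name, values in per_sequence.items()}

for name, mean in computed_mean.items():
    diff = abs(mean - reported_mean[name])
    print(f"{name:8s} computed={mean:.3f} reported={reported_mean[name]} diff={diff:.3f}")
```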

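The self-supervised network with brightness alignment and photometric uncertainty outperforms Monodepth2 on the KITTI Eigen depth evaluation.
D3VO achieves state-of-the-art monocular VO results on the KITTI Odometry test sequences, outperforming the monocular baselines and approaching stereo/LiDAR performance.
On EuRoC MAV, D3VO shows competitive monocular VO results and robustness, with performance close to end-to-end and hybrid methods.
Combining deep depth, deep pose, and deep uncertainty yields a larger improvement than the ablated variants that omit some of these components (Dd, Dp, Du).
On EuRoC, the method achieves performance comparable to state-of-the-art VIO methods while using only a single camera.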

This review was generated by AI and checked by a human editor.