QUICK REVIEW

[Paper Review] Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

René Ranftl, Katrin Lasinger|arXiv (Cornell University)|Jul 2, 2019

Advanced Vision and Imaging|Citations: 44

One-Sentence Summary

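The authors develop a loss function and a training strategy for mixing diverse depth datasets, enabling zero-shot cross-dataset transfer for monocular depth estimation and achieving state-of-the-art results across multiple datasets.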

ABSTRACT

The success of monocular depth estimation relies on large and diverse training sets. Due to the challenges associated with acquiring dense ground-truth depth across different environments at scale, a number of datasets with distinct characteristics and biases have emerged. We develop tools that enable mixing multiple datasets during training, even if their annotations are incompatible. In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks. Armed with these tools, we experiment with five diverse training datasets, including a new, massive data source: 3D films. To demonstrate the generalization power of our approach we use zero-shot cross-dataset transfer, i.e., we evaluate on datasets that were not seen during training. The experiments confirm that mixing data from complementary sources greatly improves monocular depth estimation. Our approach clearly outperforms competing methods across diverse datasets, setting a new state of the art for monocular depth estimation. Some results are shown in the supplementary video at https://youtu.be/D46FzVyL9I8

Motivation and Objectives

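Achieve robust monocular depth estimation across diverse environments by exploiting multiple datasets, each with its own biases.
Develop a training objective that is invariant to differences in scale and stereo baseline between datasets.
Propose a principled multi-objective data-mixing strategy for combining diverse data sources.
Highlight the importance of high-capacity encoders and encoder pretraining for improving performance.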

Proposed Method

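Make predictions in disparity space to handle the ambiguity caused by unknown scale and shift across datasets.
Introduce scale- and shift-invariant losses (Lssi): a least-squares variant (Lssimse) and robust variants (Lssimae, Lssitrim).
Provide an alignment strategy that solves for the scale and shift (s, t) as part of the loss computation.
Add a gradient regularization term (Lreg) that sharpens depth discontinuities and aligns them with ground-truth edges.
Compare naive and Pareto-optimal strategies for mixing multiple datasets during training (equal splitting vs. multi-objective optimization).
Evaluate encoder architectures and pretraining schemes (ImageNet, WS-augmented) to measure their effect on cross-dataset transfer.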
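The losses listed above can be sketched in NumPy as follows. This is a minimal, non-authoritative sketch operating on single H×W disparity maps with a boolean validity mask. Least-squares alignment of (s, t) gives the Lssimse variant; normalizing both maps by their median and mean absolute deviation and then taking a (trimmed) absolute error gives the robust Lssimae/Lssitrim variants. The function names, the trim fraction of 0.2, the four gradient scales and the weight alpha = 0.5 are illustrative assumptions, not values stated in this review.

```python
import numpy as np

EPS = 1e-8  # lower bound on the scale estimate to avoid division by zero


def depth_to_disparity(depth, mask):
    """Invert depth to disparity on valid pixels; invalid pixels are set to 0."""
    disp = np.zeros(depth.shape, dtype=float)
    disp[mask] = 1.0 / depth[mask]
    return disp


def align_least_squares(pred, target, mask):
    """Closed-form (s, t) minimising the sum over valid pixels of (s*pred + t - target)^2."""
    a = np.stack([pred[mask], np.ones(int(mask.sum()))], axis=1)
    (s, t), *_ = np.linalg.lstsq(a, target[mask], rcond=None)
    return s, t


def loss_ssimse(pred, target, mask):
    """Scale- and shift-invariant MSE: align by least squares, then average squared residuals."""
    s, t = align_least_squares(pred, target, mask)
    residual = s * pred[mask] + t - target[mask]
    return 0.5 * np.mean(residual ** 2)


def normalize_median_mad(x, mask):
    """Robust alignment: shift by the median, scale by the mean absolute deviation from it."""
    v = x[mask]
    t = np.median(v)
    s = max(np.mean(np.abs(v - t)), EPS)
    out = np.zeros(x.shape, dtype=float)
    out[mask] = (v - t) / s
    return out


def loss_ssitrim(pred, target, mask, trim=0.2):
    """Absolute error between normalised maps, discarding the largest `trim` fraction.
    With trim=0.0 this is the untrimmed robust (MAE) variant."""
    diff = normalize_median_mad(pred, mask) - normalize_median_mad(target, mask)
    residual = np.abs(diff[mask])
    m = residual.size
    keep = int(np.floor((1.0 - trim) * m))
    return np.sort(residual)[:keep].sum() / (2.0 * m)


def loss_gradient_matching(pred_n, target_n, mask, num_scales=4):
    """Penalise x/y gradients of the residual R = pred_n - target_n at several scales."""
    total = 0.0
    for k in range(num_scales):
        step = 2 ** k  # subsample residual and mask by 2^k
        m = mask[::step, ::step]
        r = np.where(m, (pred_n - target_n)[::step, ::step], 0.0)
        # Only count differences where both neighbouring pixels are valid.
        gx = np.abs(np.diff(r, axis=1)) * (m[:, 1:] & m[:, :-1])
        gy = np.abs(np.diff(r, axis=0)) * (m[1:, :] & m[:-1, :])
        total += gx.sum() + gy.sum()
    return total / max(int(mask.sum()), 1)


def total_loss(pred_disp, target_disp, mask, alpha=0.5):
    """Trimmed scale/shift-invariant loss plus alpha times the gradient term."""
    pred_n = normalize_median_mad(pred_disp, mask)
    target_n = normalize_median_mad(target_disp, mask)
    return (loss_ssitrim(pred_disp, target_disp, mask)
            + alpha * loss_gradient_matching(pred_n, target_n, mask))
```

For example, a prediction `3.0 * gt_disp + 0.7` (same structure, different scale and shift) yields a loss that is numerically zero under both variants, while an unrelated random map does not. The median/MAD normalization trades the optimality of least squares for robustness to outliers in the ground truth.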

Experimental Results

Research Questions

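RQ1: Can mixing multiple biased depth datasets improve generalization to unseen datasets (zero-shot transfer)?
RQ2: How should inconsistencies in scale and stereo baseline between datasets be handled during training?
RQ3: Does a multi-objective (Pareto) data-mixing strategy outperform naive mixing for monocular depth estimation?
RQ4: How do encoder capacity and pretraining affect cross-dataset transfer performance?
RQ5: Is prediction in disparity space with a scale- and shift-invariant loss numerically stable and effective across diverse data sources?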

Key Findings

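Mixing complementary datasets substantially improves monocular depth estimation under zero-shot cross-dataset transfer.
Scale- and shift-invariant losses in disparity space, including combined variants such as Lssitrim + Lreg, outperform previous losses.
High-capacity encoders pretrained on ImageNet (notably ResNeXt-101-WSL) yield substantial performance gains.
Pretraining the encoder on a large-scale auxiliary task is essential for strong performance.
Pareto-optimal multi-task data mixing offers advantages over naive equal-split mixing.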
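The Pareto-optimal mixing compared with naive mixing treats each dataset as a separate objective. One standard way to obtain a Pareto-stationary update is the min-norm gradient combination used in multiple-gradient descent (MGDA); the sketch below shows its closed form for two datasets on flattened gradient vectors. This review does not say which solver the paper uses, so the function names, the two-dataset restriction, and modelling naive mixing as an equal-weight gradient average are illustrative assumptions.

```python
import numpy as np


def min_norm_weights(g1, g2):
    """Return (alpha, 1 - alpha) minimising ||alpha*g1 + (1 - alpha)*g2||^2, alpha in [0, 1].

    Setting the derivative to zero gives alpha = (g2 - g1) . g2 / ||g1 - g2||^2,
    which is then clipped to [0, 1].
    """
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:  # identical gradients: every convex combination is the same vector
        return 0.5, 0.5
    alpha = float((g2 - g1) @ g2) / denom
    alpha = min(max(alpha, 0.0), 1.0)
    return alpha, 1.0 - alpha


def naive_direction(g1, g2):
    """Equal-weight mixing, modelled here as a plain average of per-dataset gradients."""
    return 0.5 * (g1 + g2)


def pareto_direction(g1, g2):
    """Min-norm point of the convex hull of {g1, g2}.

    It satisfies d . g_i >= ||d||^2 for both gradients, so a small step along -d
    does not increase either dataset's loss to first order.
    """
    a, b = min_norm_weights(g1, g2)
    return a * g1 + b * g2
```

With g1 = (4, 0) and g2 = (-1, 0.5), the naive average (1.5, 0.25) has inner product -1.375 with g2, so a step along its negative increases the second dataset's loss; the min-norm direction has a positive inner product with both gradients.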

This review was generated by AI and checked by a human editor.