QUICK REVIEW

[Paper Review] Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth

Doyeon Kim, Woonghyun Ga|arXiv (Cornell University)|Jan 19, 2022

Advanced Vision and Imaging | Cited by 78

One-line summary

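This paper proposes a global-local path network for monocular depth estimation that combines a hierarchical transformer encoder, a lightweight decoder with selective feature fusion, and vertical CutDepth data augmentation. It achieves state-of-the-art results on NYU Depth V2 and strong cross-dataset generalization.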

ABSTRACT

Depth estimation from a single image is an important task that can be applied to various fields in computer vision, and has grown rapidly with the development of convolutional neural networks. In this paper, we propose a novel structure and training strategy for monocular depth estimation to further improve the prediction accuracy of the network. We deploy a hierarchical transformer encoder to capture and convey the global context, and design a lightweight yet powerful decoder to generate an estimated depth map while considering local connectivity. By constructing connected paths between multi-scale local features and the global decoding stream with our proposed selective feature fusion module, the network can integrate both representations and recover fine details. In addition, the proposed decoder shows better performance than the previously proposed decoders, with considerably less computational complexity. Furthermore, we improve the depth-specific augmentation method by utilizing an important observation in depth estimation to enhance the model. Our network achieves state-of-the-art performance over the challenging depth dataset NYU Depth V2. Extensive experiments have been conducted to validate and show the effectiveness of the proposed approach. Finally, our model shows better generalisation ability and robustness than other comparative models.

Motivation and Objectives

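Improve monocular depth estimation by capturing both global context and local details.
Develop a global-local path network that pairs a hierarchical transformer encoder with an efficient decoder.
Propose a selective feature fusion module that adaptively fuses local and global features at low computational cost.
Strengthen training with depth-specific data augmentation, namely vertical CutDepth, which exploits vertical structural cues.
Demonstrate state-of-the-art performance and robustness on NYU Depth V2, and generalization to SUN RGB-D.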

Proposed Method

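Uses a hierarchical transformer encoder to model global context and multi-scale features.
Designs a lightweight decoder that restores the bottleneck feature with a minimal number of convolution layers and bilinear upsampling.
Introduces a selective feature fusion (SFF) module that adaptively fuses local and global features through an attention mechanism.
Adopts vertical CutDepth, a depth-aware augmentation that cuts only along the horizontal direction so that vertical structure information is preserved.
Trains the network with a scale-invariant log-depth loss.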
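
To make the training objective in the last item concrete, here is a minimal NumPy sketch of a scale-invariant log-depth loss in the form introduced by Eigen et al. (2014): the mean squared log difference minus a weighted squared mean log difference. The function name `silog_loss`, the default weight `lam=0.5`, and the `gt > 0` validity mask are assumptions made for illustration; the review does not state the exact weighting used in the paper.

```python
import numpy as np

def silog_loss(pred, gt, lam=0.5):
    """Scale-invariant log loss (Eigen et al. 2014 form).

    pred, gt: arrays of positive depths with the same shape.
    lam: weight on the squared-mean term (0.5 is an assumed default).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0  # assumed convention: non-positive ground truth is missing
    d = np.log(pred[valid]) - np.log(gt[valid])  # per-pixel log difference
    n = d.size
    # (1/n) * sum(d^2) - (lam / n^2) * (sum(d))^2
    return float(np.mean(d ** 2) - lam * d.sum() ** 2 / n ** 2)
```

With `lam=1.0` the loss is fully invariant to a global scale factor on the prediction; with `lam=0.5` a uniform scale error is penalized at half weight.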

Experimental Results

Research Questions

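RQ1: Can the global-local path architecture improve monocular depth estimation by effectively combining long-range context with local details?
RQ2: Can the proposed selective feature fusion module improve depth maps while keeping computational cost below that of standard decoders?
RQ3: Does vertical CutDepth augmentation improve depth estimation by exploiting vertical structural cues?
RQ4: How well does the proposed method generalize to other indoor datasets (e.g., SUN RGB-D), and how robust is it to common image corruptions?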

Key Findings

Method	Params (M)	delta1↑	delta2↑	delta3↑	AbsRel↓	RMSE↓	log10↓
Eigen et al. (2014)	141	0.769	0.950	0.988	0.158	0.641	-
Fu et al. (2018)	110	0.828	0.965	0.992	0.115	0.509	0.051
Yin et al. (2019)	114	0.875	0.976	0.994	0.108	0.416	0.048
DAV (Huynh et al. 2020)	25	0.882	0.980	0.996	0.108	0.412	-
BTS (Lee et al. 2019)	47	0.885	0.978	0.994	0.110	0.392	0.047
AdaBins (Bhat et al. 2021)	78	0.903	0.984	0.997	0.103	0.364	0.044
DPT* (Ranftl et al. 2021)	123	0.904	0.988	0.998	0.110	0.357	0.045
Ours	62	0.915	0.988	0.997	0.098	0.344	0.042
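
The column headers in the table above are the standard monocular depth metrics. As a reference for how they are usually computed, the sketch below implements the common definitions: threshold accuracy delta_k (the fraction of pixels with max(pred/gt, gt/pred) < 1.25^k), absolute relative error, RMSE, and mean absolute log10 error. The function name `depth_metrics` and the `gt > 0` validity mask are illustrative assumptions; evaluation crops and depth caps used for NYU Depth V2 are not reproduced here.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth metrics over pixels with valid (positive) ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0  # assumed convention for missing ground truth
    pred, gt = pred[valid], gt[valid]
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "delta1": float(np.mean(ratio < 1.25)),       # higher is better
        "delta2": float(np.mean(ratio < 1.25 ** 2)),
        "delta3": float(np.mean(ratio < 1.25 ** 3)),
        "abs_rel": float(np.mean(np.abs(pred - gt) / gt)),  # lower is better
        "rmse": float(np.sqrt(np.mean((pred - gt) ** 2))),
        "log10": float(np.mean(np.abs(np.log10(pred) - np.log10(gt)))),
    }
```

For example, `depth_metrics(pred, gt)["delta1"]` corresponds to the delta1 column, where the proposed model reports 0.915.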

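Achieves state-of-the-art or competitive results on NYU Depth V2 without pretraining on large external datasets.
The compact decoder with SFF has far fewer parameters than deconvolution-based or UNet-style decoders (0.66M in some configurations).
Vertical CutDepth outperforms the baseline CutDepth, with the best results at p=0.75.
On NYU Depth V2, the proposed method reaches delta1=0.915, delta2=0.988, delta3=0.997, AbsRel=0.098, RMSE=0.344, and log10=0.042 with 62M parameters.
Without fine-tuning, the model generalizes well to SUN RGB-D and is robust to image corruptions.
Extensive ablations show that the decoder design and vertical CutDepth are the key contributors to the performance gain.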
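
To illustrate the role of the hyperparameter p reported above, here is a rough NumPy sketch of vertical CutDepth: a full-height strip of the input image is replaced with the ground-truth depth, so the cut varies only horizontally and vertical structure stays intact. The sampling rule (left edge at alpha*W, strip width max((W - alpha*W) * beta * p, 1), with alpha and beta uniform in [0, 1)) is one reading of the method and should be checked against the paper. The function name, the `rng` argument, and the requirement that `depth` is already scaled to the image's value range are assumptions for this example.

```python
import numpy as np

def vertical_cutdepth(image, depth, p=0.75, rng=None):
    """Replace a full-height vertical strip of `image` with `depth`.

    image: (H, W, 3) float array.
    depth: (H, W) float array, assumed pre-scaled to the image value range.
    p: upper bound on the relative strip width, in (0, 1].
    Returns the augmented copy and the (left, width) of the strip.
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = depth.shape
    alpha, beta = rng.random(), rng.random()
    left = int(alpha * W)
    width = int(max((W - alpha * W) * beta * p, 1))
    out = image.copy()
    # The strip covers every row; depth is broadcast to all 3 channels.
    out[:, left:left + width, :] = depth[:, left:left + width, None]
    return out, (left, width)
```

A larger p allows wider strips of depth to be pasted into the image, while p close to 0 makes the augmentation almost a no-op (the strip is still at least one pixel wide).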

This review was generated by AI and checked by a human editor.