QUICK REVIEW

[Paper Review] Cascade Residual Learning: A Two-stage Convolutional Neural Network for Stereo Matching

Jiahao Pang, Wenxiu Sun|arXiv (Cornell University)|Aug 30, 2017

Advanced Vision and Imaging|References: 21|Citations: 54

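One-Line Summary

A two-stage cascade CNN (DispFulNet + DispResNet) refines an initial full-resolution disparity map with multi-scale residual learning, achieving state-of-the-art results on the KITTI 2015 stereo benchmark while remaining efficient.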

ABSTRACT

Leveraging on the recent developments in convolutional neural networks (CNNs), matching dense correspondence from a stereo pair has been cast as a learning problem, with performance exceeding traditional approaches. However, it remains challenging to generate high-quality disparities for the inherently ill-posed regions. To tackle this problem, we propose a novel cascade CNN architecture composing of two stages. The first stage advances the recently proposed DispNet by equipping it with extra up-convolution modules, leading to disparity images with more details. The second stage explicitly rectifies the disparity initialized by the first stage; it couples with the first-stage and generates residual signals across multiple scales. The summation of the outputs from the two stages gives the final disparity. As opposed to directly learning the disparity at the second stage, we show that residual learning provides more effective refinement. Moreover, it also benefits the training of the overall cascade network. Experimentation shows that our cascade residual learning scheme provides state-of-the-art performance for matching stereo correspondence. By the time of the submission of this paper, our method ranks first in the KITTI 2015 stereo benchmark, surpassing the prior works by a noteworthy margin.

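Motivation and Objectives

Improve disparity estimation in ill-posed stereo regions (occlusions, textureless areas, repetitive patterns).
Propose a two-stage cascade architecture that generates a high-quality initial disparity and then refines it with residual learning.
Demonstrate that the cascade can be trained end to end, and evaluate it against state-of-the-art stereo methods on standard benchmarks.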

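Proposed Method

Stage 1 (DispFulNet): DispNetC extended with extra up-convolution modules so that it outputs a full-resolution disparity map with detailed boundaries.
Stage 2 (DispResNet): a multi-scale residual network that corrects the initial disparity by learning residuals at several scales, with supervision at each scale.
Warping layer: the right image is warped with d1 by a differentiable warping operation to synthesize a left view, which is fed to Stage 2.
Residual learning: at each scale, the refined disparity d2 is the sum of the initial disparity d1 and the residual predicted for that scale; this makes refinement more effective and training more stable.
End-to-end training: the two stages are optimized jointly with L1 losses at multiple scales.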
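The warping, residual-summation, and multi-scale L1 steps listed above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the grayscale input, the sign convention (a left pixel at x is assumed to match the right pixel at x - d), the strided subsampling used to obtain each scale, and the loss weights are all assumptions made for illustration only.

```python
import numpy as np

def warp_right_to_left(right, disp):
    """Synthesize a left view by sampling the right image at x - disp.

    Assumes a rectified grayscale pair and positive disparities;
    flip the sign if the opposite convention is used.
    Linear interpolation along x keeps the operation differentiable in disp.
    """
    h, w = disp.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disp  # source x in the right image
    xs = np.clip(xs, 0.0, w - 1.0)                       # clamp at the image border
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def downsample(img, scale):
    """Hypothetical helper: subsample by a factor of 2**scale with plain striding.

    Whether disparity values should also be rescaled is left open here.
    """
    step = 2 ** scale
    return img[::step, ::step]

def refine(d1, residuals):
    """Per-scale refinement: d2 at scale s = (d1 at scale s) + residual at scale s."""
    return [downsample(d1, s) + r for s, r in enumerate(residuals)]

def multiscale_l1(preds, gt, weights):
    """Weighted sum of mean absolute errors over all scales (weights are assumed)."""
    return sum(
        w * np.mean(np.abs(p - downsample(gt, s)))
        for s, (p, w) in enumerate(zip(preds, weights))
    )

# Toy usage on a 4x8 image with a constant initial disparity of 2 pixels.
right = np.tile(np.arange(8, dtype=np.float64), (4, 1))  # intensity ramp along x
d1 = np.full((4, 8), 2.0)
synth_left = warp_right_to_left(right, d1)
residuals = [np.full((4, 8), 0.5), np.full((2, 4), -1.0)]  # stand-ins for Stage 2 outputs
d2 = refine(d1, residuals)
loss = multiscale_l1(d2, np.full((4, 8), 2.5), weights=[1.0, 0.5])
```

In a real network the residuals would be predicted by Stage 2 from the stereo pair, d1, the synthesized left view, and the warping error; here they are fixed arrays so that the arithmetic is easy to follow.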

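Experimental Results

Research Questions

RQ1: Can a two-stage cascade CNN improve disparity estimation in ill-posed regions compared with a single-stage network?
RQ2: Does supervising residuals at multiple scales provide better refinement than learning the disparity directly?
RQ3: How does CRL affect accuracy and runtime on standard stereo benchmarks (KITTI 2015, FlyingThings3D, Middlebury)?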

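Key Findings

CRL achieves state-of-the-art disparity estimation on KITTI 2015 and ranked first on the online leaderboard at the time of submission.
As the first stage, DispFulNet produces finer disparity detail than DispNetC.
As the second stage, DispResNet yields additional gains over DispNetS by refining residuals at multiple scales.
End-to-end training with residual supervision improves optimization and generalization, outperforming direct disparity learning.
CRL processes a KITTI 2015 stereo pair in about 0.47 seconds on a GTX 1080, a competitive runtime.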


This review was generated by AI and checked by a human editor.