QUICK REVIEW

[Paper Review] Hierarchical Neural Architecture Search for Deep Stereo Matching

Xuelian Cheng, Yiran Zhong | arXiv (Cornell University) | Oct 26, 2020

Advanced Vision and Imaging | References: 31 | Citations: 229

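One-line summary

LEAStereo introduces an end-to-end hierarchical neural architecture search tailored to stereo matching. It jointly optimizes a 2D feature net and a 3D matching net inside a geometry-aware pipeline, and reaches state-of-the-art benchmark results with far fewer parameters and faster inference.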

ABSTRACT

To reduce the human efforts in neural network design, Neural Architecture Search (NAS) has been applied with remarkable success to various high-level vision tasks such as classification and semantic segmentation. The underlying idea for the NAS algorithm is straightforward, namely, to enable the network the ability to choose among a set of operations (e.g., convolution with different filter sizes), one is able to find an optimal architecture that is better adapted to the problem at hand. However, so far the success of NAS has not been enjoyed by low-level geometric vision tasks such as stereo matching. This is partly due to the fact that state-of-the-art deep stereo matching networks, designed by humans, are already sheer in size. Directly applying the NAS to such massive structures is computationally prohibitive based on the currently available mainstream computing resources. In this paper, we propose the first end-to-end hierarchical NAS framework for deep stereo matching by incorporating task-specific human knowledge into the neural architecture search framework. Specifically, following the gold standard pipeline for deep stereo matching (i.e., feature extraction -- feature volume construction and dense matching), we optimize the architectures of the entire pipeline jointly. Extensive experiments show that our searched network outperforms all state-of-the-art deep stereo matching architectures and is ranked at the top 1 accuracy on KITTI stereo 2012, 2015 and Middlebury benchmarks, as well as the top 1 on SceneFlow dataset with a substantial improvement on the size of the network and the speed of inference. The code is available at https://github.com/XuelianCheng/LEAStereo.

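Motivation and objectives

Reduce the manual effort of designing stereo matching network architectures.
Incorporate task-specific stereo knowledge into NAS, searching the feature net and the matching net within a volumetric pipeline.
Develop an end-to-end search framework that jointly optimizes the feature net and the matching net at both the cell level and the network level.
Show that the searched architecture achieves state-of-the-art accuracy with a much smaller model and faster inference.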

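Proposed method

Propose a two-level hierarchical NAS: a cell-level search for the feature net and the matching net, and a network-level search over how the cells are arranged within a trellis-like structure.
Adopt a residual cell design to strengthen information flow and to allow the spatial resolution to vary between cells.
Define the candidate operation sets as 3x3 convolution and skip connection for the 2D feature net, and 3x3x3 convolution and skip connection for the 3D matching net.
Adopt bi-level optimization over the architecture parameters (alpha, beta) and the network weights (w), applying a first-order DARTS-style relaxation with alternating updates on the training set.
Project the final cost volume to disparity via soft argmin, train end-to-end on SceneFlow with a smooth L1 loss, and fine-tune on KITTI and Middlebury.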
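
The relaxation and the disparity regression described above can be illustrated with a minimal NumPy sketch. It is not taken from the LEAStereo code base. The function names `mixed_op`, `soft_argmin`, and `smooth_l1` are hypothetical, the softmax over negated costs follows the usual soft argmin convention and is assumed here, and the smooth L1 threshold of 1.0 is likewise an assumption.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def mixed_op(x, ops, alpha):
    """DARTS-style relaxation: softmax(alpha)-weighted sum of candidate ops.

    ops   : list of callables (hypothetical stand-ins for conv / skip)
    alpha : one architecture parameter per candidate op
    """
    weights = softmax(np.asarray(alpha, dtype=float), axis=0)
    return sum(w * op(x) for w, op in zip(weights, ops))


def soft_argmin(cost_volume):
    """Regress disparity from a cost volume of shape (D, H, W).

    Assumes lower cost means a better match, so costs are negated
    before the softmax over the disparity axis.
    """
    prob = softmax(-cost_volume, axis=0)
    disparities = np.arange(cost_volume.shape[0], dtype=float).reshape(-1, 1, 1)
    return (prob * disparities).sum(axis=0)


def smooth_l1(pred, target):
    """Mean smooth L1 loss with the quadratic region |diff| < 1 (assumed)."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).mean()
```

Because `soft_argmin` is a probability-weighted average over disparity levels, it stays differentiable and can return sub-pixel values, which is what allows the pipeline to be trained end-to-end with a regression loss.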

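Experimental results

Research questions

RQ1: Can end-to-end NAS be applied effectively to a complete volumetric stereo pipeline by exploiting task-specific priors?
RQ2: Does searching the feature subnet and the matching subnet jointly beat searching them separately in accuracy and efficiency?
RQ3: How do the cell design (residual vs. direct) and the operation set affect stereo performance and model size?
RQ4: Does the discovered architecture generalize across standard stereo benchmarks such as SceneFlow, KITTI, and Middlebury, compared with human-designed and NAS baselines?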

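Key findings

LEAStereo achieves state-of-the-art accuracy on SceneFlow with roughly one third of the parameters of prior methods.
On KITTI 2012 and 2015, LEAStereo ranks first, ahead of human-designed architectures.
On Middlebury 2014, LEAStereo leads on several evaluation metrics.
Compared with comparable NAS-based and hand-crafted networks, it is substantially more parameter-efficient and faster at inference (0.3 s).
Jointly searching the Feature Net and the Matching Net improves EPE and reduces the parameter count compared with searching them separately.
Residual cells outperform direct cells, improving accuracy at the cost of a modest increase in parameters and FLOPs.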
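
For readers unfamiliar with the EPE (end-point error) metric cited in these findings, the following is a minimal sketch of how it is commonly computed: the mean absolute difference between predicted and ground-truth disparity over valid pixels. The function name `end_point_error` and the optional `max_disp` cutoff are hypothetical and are not taken from the paper's evaluation code.

```python
import numpy as np


def end_point_error(pred, gt, max_disp=None):
    """Mean absolute disparity error over valid ground-truth pixels.

    Pixels whose ground truth is NaN/inf are ignored. If max_disp is
    given (an assumed, optional cutoff), pixels with gt >= max_disp
    are ignored as well.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = np.isfinite(gt)
    if max_disp is not None:
        valid &= gt < max_disp
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")
    return float(np.abs(pred[valid] - gt[valid]).mean())
```

A lower EPE means the predicted disparity map is closer to the ground truth on average, so "improves EPE" in the findings above refers to a decrease in this value.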


This review was generated by AI and checked by a human editor.