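[Paper Explained] Hierarchical Neural Architecture Search for Deep Stereo Matching
LEAStereo introduces an end-to-end hierarchical neural architecture search tailored to stereo matching. It jointly optimizes the 2D feature network and the 3D matching network inside a geometry-driven pipeline, and reaches the top of the major benchmarks with markedly fewer parameters and faster inference.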
To reduce the human efforts in neural network design, Neural Architecture Search (NAS) has been applied with remarkable success to various high-level vision tasks such as classification and semantic segmentation. The underlying idea for the NAS algorithm is straightforward, namely, to enable the network the ability to choose among a set of operations (e.g., convolution with different filter sizes), one is able to find an optimal architecture that is better adapted to the problem at hand. However, so far the success of NAS has not been enjoyed by low-level geometric vision tasks such as stereo matching. This is partly due to the fact that state-of-the-art deep stereo matching networks, designed by humans, are already sheer in size. Directly applying the NAS to such massive structures is computationally prohibitive based on the currently available mainstream computing resources. In this paper, we propose the first end-to-end hierarchical NAS framework for deep stereo matching by incorporating task-specific human knowledge into the neural architecture search framework. Specifically, following the gold standard pipeline for deep stereo matching (i.e., feature extraction -- feature volume construction and dense matching), we optimize the architectures of the entire pipeline jointly. Extensive experiments show that our searched network outperforms all state-of-the-art deep stereo matching architectures and is ranked at the top 1 accuracy on KITTI stereo 2012, 2015 and Middlebury benchmarks, as well as the top 1 on SceneFlow dataset with a substantial improvement on the size of the network and the speed of inference. The code is available at https://github.com/XuelianCheng/LEAStereo.
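Motivation and goals
- Reduce the manual design effort that goes into stereo matching network architectures.
- Build task-specific stereo knowledge into NAS so that the feature net and the matching net are searched within the volumetric pipeline.
- Develop an end-to-end search framework that jointly optimizes the feature net and the matching net at both the cell level and the network level.
- Show that the searched architecture reaches state-of-the-art accuracy with a much smaller model and faster inference.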
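Proposed method
- Use a two-level hierarchical NAS: the cell level searches the cell structure for the feature net and the matching net, and the network level searches how cells are arranged across the whole trellis.
- Adopt a residual cell design to improve information flow, and allow the spatial resolution to vary from cell to cell.
- Define a candidate operation set for the 2D feature net (3x3 convolution, skip connection) and one for the 3D matching net (3x3x3 convolution, skip connection).
- Formulate the search as a bi-level optimization over the architecture parameters (alpha, beta) and the network weights (w), updating the two groups alternately on the training data with a DARTS-inspired first-order relaxation.
- Project the final cost volume to disparity with soft-argmin and train with a smooth L1 loss; the network is trained end to end on SceneFlow and fine-tuned on KITTI and Middlebury.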
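The relaxation and the disparity regression in the list above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mixed_op`, `soft_argmin`, and `smooth_l1` are hypothetical, the candidate operations are scalar stand-ins for real convolutions, and soft-argmin is written in its usual form (expected disparity under a softmax over negated costs) for a single pixel.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, ops, alpha):
    """DARTS-style continuous relaxation (assumed form): the output is the
    softmax(alpha)-weighted sum of every candidate operation applied to x."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, ops))

def soft_argmin(costs):
    """Differentiable disparity estimate for one pixel: the expected
    disparity index under softmax(-cost), so low cost means high weight."""
    probs = softmax([-c for c in costs])
    return sum(d * p for d, p in enumerate(probs))

def smooth_l1(pred, target):
    """Smooth L1 (Huber-like) loss with the transition point at 1.0."""
    diff = abs(pred - target)
    return 0.5 * diff * diff if diff < 1.0 else diff - 0.5

# Toy stand-ins for "3x3 convolution" and "skip connection" (assumptions).
candidate_ops = [lambda v: 3.0 * v, lambda v: v]
blended = mixed_op(2.0, candidate_ops, alpha=[0.0, 0.0])

# Matching cost over 5 disparity candidates, lowest at index 2.
disparity = soft_argmin([10.0, 10.0, 0.0, 10.0, 10.0])
loss = smooth_l1(disparity, 2.5)
```

With equal architecture parameters the two candidate operations contribute equally; as the search proceeds, the softmax weights concentrate on the preferred operation, which is then kept when the discrete architecture is decoded.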
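Experimental results
Research questions
- RQ1: Can end-to-end NAS be applied effectively to the full volumetric stereo matching pipeline by exploiting task-specific priors?
- RQ2: Does searching the feature subnetwork and the matching subnetwork jointly beat searching them separately, in both accuracy and efficiency?
- RQ3: How do the cell design (residual vs. direct) and the operation set affect stereo performance and model size?
- RQ4: How well does the discovered architecture generalize on the standard stereo benchmarks (SceneFlow, KITTI, Middlebury) compared with hand-designed and NAS baselines?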
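Key findings
- LEAStereo reaches state-of-the-art accuracy on SceneFlow with roughly one third of the parameters of previous methods.
- On KITTI 2012 and 2015, LEAStereo ranks first, ahead of the hand-designed architectures.
- On Middlebury 2014, it leads on several evaluation metrics.
- Compared with comparable NAS and hand-designed networks, the model uses its parameters far more efficiently and runs inference faster (0.3 s).
- Searching the feature net and the matching net jointly gives a lower EPE with fewer parameters than searching them separately.
- Residual cells outperform direct cells, improving accuracy at the cost of a modest increase in parameters and FLOPs.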
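Several of the findings above are stated in terms of EPE (end-point error), which for stereo is the mean absolute difference between predicted and ground-truth disparity over valid pixels. A minimal sketch follows, assuming flat lists of disparities and using `None` to mark pixels without ground truth; the name `end_point_error` and the `None` convention are illustrative choices, not taken from the paper's code.

```python
def end_point_error(pred, gt):
    """Mean absolute disparity error over pixels that have ground truth.

    pred: list of predicted disparities.
    gt:   list of ground-truth disparities, with None for invalid pixels
          (an assumed convention for this sketch).
    """
    errors = [abs(p - g) for p, g in zip(pred, gt) if g is not None]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return sum(errors) / len(errors)

# Three valid pixels with absolute errors 0.0, 0.5 and 2.0.
epe_dense = end_point_error([1.0, 2.5, 4.0], [1.0, 2.0, 6.0])

# Sparse ground truth: the first pixel is skipped.
epe_sparse = end_point_error([1.0, 3.0], [None, 2.0])
```

Masking matters mainly on KITTI and Middlebury, where ground truth is not available for every pixel.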
This summary was generated by AI and reviewed by a human editor.