QUICK REVIEW

[Paper Review] Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

Ravi Garg, Vijay Kumar Bg|arXiv (Cornell University)|Mar 16, 2016

Advanced Vision and Imaging | References: 27 | Cited by: 319

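One-Sentence Summary

This paper proposes a fully unsupervised CNN for single-view depth prediction: the network's predicted disparity is used to reconstruct the left image from the right image, which enables end-to-end training without ground-truth depth.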

ABSTRACT

A significant weakness of most current deep Convolutional Neural Networks is the need to train them using vast amounts of manually labelled data. In this work we propose an unsupervised framework to learn a deep convolutional neural network for single view depth prediction, without requiring a pre-training stage or annotated ground truth depths. We achieve this by training the network in a manner analogous to an autoencoder. At training time we consider a pair of images, source and target, with small, known camera motion between the two such as a stereo pair. We train the convolutional encoder for the task of predicting the depth map for the source image. To do so, we explicitly generate an inverse warp of the target image using the predicted depth and known inter-view displacement, to reconstruct the source image; the photometric error in the reconstruction is the reconstruction loss for the encoder. The acquisition of this training data is considerably simpler than for equivalent systems, requiring no manual annotation, nor calibration of depth sensor to camera. We show that our network trained on less than half of the KITTI dataset (without any further augmentation) gives comparable performance to that of the state-of-the-art supervised methods for single view depth estimation.
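
As a rough formalization of the training signal described in the abstract, the block below writes out the inverse warp and the reconstruction loss for a rectified stereo pair. The symbols are assumptions introduced here for illustration: f is the focal length, B the stereo baseline, d(x) the predicted depth at pixel x, D(x) the induced horizontal disparity, I_1 the source image and I_2 the target image. The sign of the shift depends on which camera is treated as the source, so the plus sign should be read as one possible convention.

```latex
D(x) = \frac{f B}{d(x)}, \qquad
I_w(x) = I_2\bigl(x + D(x)\bigr), \qquad
E_{\mathrm{recons}} = \sum_{x} \bigl\| I_w(x) - I_1(x) \bigr\|^{2}
```

Because E_recons depends only on the two images and the known camera geometry, no depth annotation enters the objective.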

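Motivation and Objectives

Advance unsupervised learning for single-view depth estimation to avoid costly depth annotation.
Propose a stereo-pair-based autoencoder in which the CNN-predicted depth map is used to warp the right image back to the left image for reconstruction.
Train end to end from scratch on the KITTI dataset, without relying on ground-truth depth data.
Show that a coarse-to-fine training strategy and skip connections improve depth prediction quality.
Compare against state-of-the-art supervised methods and analyze the gains from data augmentation and fine-tuning.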

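Proposed Method

Train a CNN to predict the depth map of the source (left) image, using stereo pairs with known camera motion.
Reconstruct the left image by backward-sampling (warping) the right image with the predicted depth and the known inter-view displacement; optimize a photometric reconstruction loss.
Apply a simple smoothness prior on the disparity to address the aperture problem.
Use a coarse-to-fine architecture with skip connections to refine the depth prediction at multiple resolutions.
Linearize the warp with a first-order Taylor expansion so that the loss can be backpropagated, and refine the estimate iteratively over multiple training stages.
Train with multi-stage upsampling (L7 to L12), then fine-tune with data augmentation (color, scale, flipping).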
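
The warping, loss, and linearization steps listed above can be sketched on a single image scanline. This is a minimal pure-Python illustration, not the authors' implementation: the function names, the weight `gamma=0.01`, the border clamping, and the sign convention (a left pixel at x is looked up in the right image at x - d) are all assumptions made here, and the real method operates on 2-D images with disparities produced by a CNN.

```python
def sample_linear(row, x):
    """Linearly interpolate a 1-D scanline at real-valued position x.

    Positions outside the scanline are clamped to the border (an assumption).
    """
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)                      # left neighbour; x >= 0, so int() floors
    x1 = min(x0 + 1, len(row) - 1)   # right neighbour, clamped at the border
    w = x - x0
    return (1.0 - w) * row[x0] + w * row[x1]


def warp_right_to_left(right_row, disparity):
    """Backward warp: for each left pixel x, fetch the right image at x - d(x)."""
    return [sample_linear(right_row, x - d) for x, d in enumerate(disparity)]


def photometric_loss(left_row, right_row, disparity):
    """Sum of squared differences between the warped right row and the left row."""
    warped = warp_right_to_left(right_row, disparity)
    return sum((w - l) ** 2 for w, l in zip(warped, left_row))


def smoothness_loss(disparity):
    """Simple smoothness prior: squared differences of neighbouring disparities."""
    return sum((b - a) ** 2 for a, b in zip(disparity, disparity[1:]))


def total_loss(left_row, right_row, disparity, gamma=0.01):
    """Photometric term plus gamma times the smoothness term (gamma is assumed)."""
    return (photometric_loss(left_row, right_row, disparity)
            + gamma * smoothness_loss(disparity))


def linearized_photometric_grad(left_row, right_row, disparity):
    """Gradient of the photometric loss with respect to each disparity value.

    Uses a first-order Taylor expansion of the warp around the current
    disparity: I_w(d + delta) ~= I_w(d) - delta * I_right'(x - d).
    """
    grads = []
    for x, d in enumerate(disparity):
        pos = x - d
        residual = sample_linear(right_row, pos) - left_row[x]
        # Central difference of the right row around the sampling position.
        slope = (sample_linear(right_row, pos + 0.5)
                 - sample_linear(right_row, pos - 0.5))
        # Increasing d moves the sample point left, hence the minus sign.
        grads.append(2.0 * residual * (-slope))
    return grads
```

On an intensity ramp the linearization is exact, so a single gradient step with a suitable step size recovers the true shift at interior pixels; on real images the approximation holds only for small disparity updates, which is one reason for estimating disparity coarse to fine.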

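Experimental Results

Research Questions

RQ1: Can a CNN be trained from scratch, without supervision, to predict depth from a single view using stereo geometry?
RQ2: Does a photometric reconstruction loss based on autoencoder-style warping yield competitive depth predictions without ground-truth depth?
RQ3: How do coarse-to-fine training and skip connections affect depth accuracy in the unsupervised setting?
RQ4: On KITTI, how does the unsupervised method compare with supervised single-view depth methods and with stereo-based baselines?
RQ5: Do data augmentation and fine-tuning improve unsupervised single-view depth estimation?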

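Key Findings

On KITTI, the unsupervised CNN trained on stereo pairs is competitive with state-of-the-art supervised methods for depth prediction.
Coarse-to-fine training with skip connections yields better depth maps, particularly at higher resolutions.
Data augmentation and subsequent fine-tuning further improve edge localization and overall depth accuracy.
The method is fully unsupervised and requires no pre-trained initialization, yet it approaches supervised performance without ground-truth depth data.
Compared with the stereo-to-CNN baseline, the autoencoder approach avoids the bias of learning from proxy ground-truth disparities and reduces depth errors at object boundaries.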


This review was generated by AI and reviewed by human editors.