QUICK REVIEW

[Paper Review] ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

Shariq Farooq Bhat, Reiner Birkl | arXiv (Cornell University) | Feb 23, 2023

Advanced Vision and Imaging | Cited by 168

ONE-SENTENCE SUMMARY

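ZoeDepth combines relative depth pre-training with metric depth heads and automatic routing, achieving strong metric depth performance on indoor and outdoor datasets together with unprecedented zero-shot generalization.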

ABSTRACT

This paper tackles the problem of depth estimation from a single image. Existing work either focuses on generalization performance disregarding metric scale, i.e. relative depth estimation, or state-of-the-art results on specific datasets, i.e. metric depth estimation. We propose the first approach that combines both worlds, leading to a model with excellent generalization performance while maintaining metric scale. Our flagship model, ZoeD-M12-NK, is pre-trained on 12 datasets using relative depth and fine-tuned on two datasets using metric depth. We use a lightweight head with a novel bin adjustment design called metric bins module for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier. Our framework admits multiple configurations depending on the datasets used for relative depth pre-training and metric fine-tuning. Without pre-training, we can already significantly improve the state of the art (SOTA) on the NYU Depth v2 indoor dataset. Pre-training on twelve datasets and fine-tuning on the NYU Depth v2 indoor dataset, we can further improve SOTA for a total of 21% in terms of relative absolute error (REL). Finally, ZoeD-M12-NK is the first model that can jointly train on multiple datasets (NYU Depth v2 and KITTI) without a significant drop in performance and achieve unprecedented zero-shot generalization performance to eight unseen datasets from both indoor and outdoor domains. The code and pre-trained models are publicly available at https://github.com/isl-org/ZoeDepth .

MOTIVATION AND OBJECTIVES

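Address the limitation that metric depth models overfit to a single dataset and generalize poorly across domains.
Use relative depth pre-training to improve generalization while preserving metric scale during fine-tuning.
Develop lightweight, domain-specific metric depth heads (the metric bins module) and an automatic routing mechanism that selects the appropriate head at inference time.
Improve the state of the art on NYU Depth v2, train jointly on NYU Depth v2 and KITTI without a significant drop in performance, and show strong zero-shot generalization to eight unseen datasets.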

PROPOSED METHOD

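Two-stage framework: first pre-train a general encoder-decoder for relative depth estimation (RDE) using the MiDaS training strategy; then attach metric depth heads and fine-tune on metric depth datasets.
Introduce the metric bins module (MBM) with attractor layers, which predicts per-pixel depth bin centers and combines them with per-pixel bin probabilities to output metric depth.
Replace standard bin prediction with a binomial ordinal probability model that respects the ordering of depth bins and improves stability.
Refine the bin centers with multi-scale decoder features through inverse attractor layers, which move bin centers toward learned attractors rather than simply splitting bins.
Automatic routing: at inference time, a latent classifier trained on encoder features routes each image to the appropriate metric head; both single-head and multi-head (indoor/outdoor) configurations are supported.
Train with supervision using a pixel-wise scale-invariant loss; the Chamfer loss is not used because of its memory cost.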
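
The metric bins computation and the pixel loss listed above can be sketched in plain Python for a single pixel. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the attractor update is assumed to take the inverse form (a - c) / (1 + alpha * |a - c|^gamma) summed over attractors, the binomial probabilities omit any temperature term, and the loss is assumed to be the common scale-invariant log (SILog) formulation with lam = 0.85 and a scale factor of 10. All constants and inputs are placeholders.

```python
import math

def attract(centers, attractors, alpha=1.0, gamma=2):
    """Shift each bin center toward every attractor point.

    Assumed inverse-attractor form: nearby attractors pull strongly,
    distant ones barely move the center. No bin is split.
    """
    shifted = []
    for c in centers:
        delta = sum((a - c) / (1.0 + alpha * abs(a - c) ** gamma)
                    for a in attractors)
        shifted.append(c + delta)
    return shifted

def binomial_probs(q, n_bins):
    """Binomial probabilities over the ordered bin indices 0..n_bins-1.

    A single parameter q in (0, 1) controls where the single peak sits,
    so neighboring bins always receive similar probability mass.
    """
    n = n_bins - 1
    return [math.comb(n, k) * q ** k * (1.0 - q) ** (n - k)
            for k in range(n_bins)]

def pixel_depth(centers, probs):
    """Metric depth as the probability-weighted sum of bin centers."""
    return sum(p * c for p, c in zip(probs, centers))

def silog_loss(pred, gt, lam=0.85, scale=10.0):
    """Scale-invariant log loss over pixels with positive depth values."""
    g = [math.log(p) - math.log(t) for p, t in zip(pred, gt)]
    mean_g = sum(g) / len(g)
    mean_g2 = sum(x * x for x in g) / len(g)
    # max() guards against a tiny negative value from float rounding.
    return scale * math.sqrt(max(0.0, mean_g2 - lam * mean_g ** 2))

# Hypothetical example: three bin centers (in meters) refined by one attractor.
centers = attract([1.0, 2.0, 3.0], attractors=[2.5])
probs = binomial_probs(q=0.5, n_bins=3)
depth = pixel_depth(centers, probs)
```

Tying all bin probabilities to one parameter is one way to encode the ordering of depth bins, in contrast to an unconstrained distribution where distant bins could receive mass independently.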

EXPERIMENTAL RESULTS

Research Questions

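RQ1: Can a single model pre-trained on relative depth generalize metric depth estimation across multiple domains (indoor/outdoor) without sacrificing metric accuracy?
RQ2: Can a lightweight, domain-specific metric head (with MBM and attractors) recover metric scale effectively while preserving cross-domain generalization?
RQ3: How does automatically routing images to domain-specific heads affect zero-shot generalization to unseen datasets?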

Key Findings

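ZoeD-X-N, trained without relative depth pre-training, already surpasses the prior state of the art on NYU Depth v2 (a 13.7% relative improvement in REL over NeWCRFs).
ZoeD-M12-N (relative pre-training on 12 datasets, then metric fine-tuning on NYU) improves REL on NYU Depth v2 by about 21% over the previous best.
ZoeD-M12-NK (multi-domain fine-tuning on NYU and KITTI, with routing to indoor/outdoor heads) improves overall REL by 24.3% over NeWCRFs and achieves strong zero-shot results on 8 unseen datasets.
Zero-shot generalization to unseen indoor datasets reaches up to 46.3% mRIθ (e.g., on DIODE Indoor) and consistently outperforms prior methods.
Zero-shot generalization to unseen outdoor datasets reaches up to 976.4% mRIθ on DIML Outdoor, with strong results on other datasets such as Virtual KITTI 2 and DDAD.
The method supports robust multi-domain training across indoor and outdoor datasets without a large drop in performance.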
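
The percentages in these findings are relative improvements over a baseline. The sketch below shows the arithmetic, assuming REL is the mean absolute relative error and mRIθ is the mean of the per-metric relative improvements over δ1, REL and RMSE. The function names and every numeric input are hypothetical and are not taken from the paper's tables.

```python
def abs_rel(pred, gt):
    """REL: mean of |pred - gt| / gt over pixels with valid ground truth."""
    return sum(abs(p - t) / t for p, t in zip(pred, gt)) / len(gt)

def relative_improvement(baseline, ours, lower_is_better=True):
    """Relative improvement over the baseline, in percent."""
    gain = baseline - ours if lower_is_better else ours - baseline
    return 100.0 * gain / baseline

def mean_relative_improvement(baseline, ours):
    """Average the relative improvement over delta1, REL and RMSE.

    Both arguments are dicts with keys "delta1", "rel" and "rmse";
    delta1 is higher-is-better, the other two are lower-is-better.
    """
    parts = [
        relative_improvement(baseline["delta1"], ours["delta1"],
                             lower_is_better=False),
        relative_improvement(baseline["rel"], ours["rel"]),
        relative_improvement(baseline["rmse"], ours["rmse"]),
    ]
    return sum(parts) / len(parts)

# Hypothetical numbers, for illustration only.
baseline = {"delta1": 0.05, "rel": 1.00, "rmse": 8.0}
ours = {"delta1": 0.50, "rel": 0.50, "rmse": 4.0}
mri = mean_relative_improvement(baseline, ours)
```

Under this assumed definition, the improvement on a lower-is-better metric cannot exceed 100%, so a mean far above 100% has to come from a higher-is-better metric such as δ1 rising from a very low baseline value.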


This review was generated by AI and reviewed by human editors.