QUICK REVIEW

[Paper Review] Visual Domain Adaptation for Monocular Depth Estimation on Resource-Constrained Hardware

Julia Hornauer, Lazaros Nalpantidis|arXiv (Cornell University)|Aug 5, 2021

Advanced Vision and Imaging | References: 33 | Cited by: 4

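One-Sentence Summary

This paper presents the first study of whether deep neural networks can feasibly be trained on resource-constrained hardware for visual domain adaptation in monocular depth estimation. It proposes an adversarial learning approach suited to edge devices and shows that meaningful domain adaptation can be achieved with only a lightweight model architecture and a small amount of target-domain data (100–1000 samples), enabling real-time inference at low power.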

ABSTRACT

Real-world perception systems in many cases build on hardware with limited resources to adhere to cost and power limitations of their carrying system. Deploying deep neural networks on resource-constrained hardware became possible with model compression techniques, as well as efficient and hardware-aware architecture design. However, model adaptation is additionally required due to the diverse operation environments. In this work, we address the problem of training deep neural networks on resource-constrained hardware in the context of visual domain adaptation. We select the task of monocular depth estimation where our goal is to transform a pre-trained model to the target's domain data. While the source domain includes labels, we assume an unlabelled target domain, as it happens in real-world applications. Then, we present an adversarial learning approach that is adapted for training on the device with limited resources. Since visual domain adaptation, i.e. neural network training, has not been previously explored for resource-constrained hardware, we present the first feasibility study for image-based depth estimation. Our experiments show that visual domain adaptation is relevant only for efficient network architectures and training sets at the order of a few hundred samples. Models and code are publicly available.

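Motivation and Objectives

Investigate whether on-device training for visual domain adaptation in monocular depth estimation is feasible on resource-constrained hardware.
Address the challenge of deploying a pre-trained model in new, unlabelled environments where ground-truth depth labels are unavailable.
Evaluate the trade-offs among model complexity, training set size, inference speed, and power consumption during on-device adaptation.
Demonstrate that adversarial domain adaptation is feasible on embedded systems such as the NVIDIA Jetson Nano and Raspberry Pi.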

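Proposed Method

An adversarial learning framework is adapted for domain adaptation so that it runs efficiently on edge hardware with limited compute and memory.
A lightweight network architecture (FastDepth) is evaluated against a more complex baseline model (ResNet-UpProj).
The model is trained on unlabelled target-domain images with an adversarial loss that aligns the feature distributions of the source and target domains.
Depth predictions are evaluated with per-sample median scaling, following prior work, to keep comparisons fair.
Training time, power consumption, and inference latency are measured to assess practicality on embedded platforms.
Experiments are run in indoor and outdoor domain adaptation scenarios (listed as vKITTI → KITTI and KITTI → KITTI), across different input resolutions and dataset sizes.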
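
The per-sample median scaling step mentioned above can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' code: each depth map is assumed to be flattened into a list of per-pixel depths, and the function names `median_scale` and `abs_rel`, as well as the `min_depth` validity threshold, are hypothetical choices made for this example.

```python
from statistics import median
from typing import List, Sequence, Tuple


def median_scale(
    pred: Sequence[float], gt: Sequence[float], min_depth: float = 1e-3
) -> Tuple[List[float], List[bool]]:
    """Rescale one predicted depth map so its median matches the ground truth.

    Pixels whose ground-truth depth is <= min_depth are treated as invalid
    (assumption: the threshold value is hypothetical).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of pixels")
    valid = [g > min_depth for g in gt]
    if not any(valid):
        raise ValueError("no valid ground-truth pixels")
    # Medians are taken over valid pixels only, separately for each sample.
    gt_median = median(g for g, v in zip(gt, valid) if v)
    pred_median = median(p for p, v in zip(pred, valid) if v)
    if pred_median <= 0:
        raise ValueError("predicted median depth must be positive")
    scale = gt_median / pred_median
    return [p * scale for p in pred], valid


def abs_rel(pred: Sequence[float], gt: Sequence[float], valid: Sequence[bool]) -> float:
    """Mean absolute relative error over valid pixels."""
    errors = [abs(p - g) / g for p, g, v in zip(pred, gt, valid) if v]
    return sum(errors) / len(errors)


# Toy sample: the prediction is exactly half the true depth on valid pixels.
gt = [2.0, 4.0, 6.0, 0.0]    # last pixel has no ground truth
pred = [1.0, 2.0, 3.0, 5.0]
scaled, valid = median_scale(pred, gt)
print(abs_rel(scaled, gt, valid))  # prints 0.0
```

Because the scale factor is recomputed for every image, this evaluation removes any global scale mismatch between prediction and ground truth and measures only the relative structure of the depth map.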

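Experimental Results

Research Questions

RQ1: Can adversarial domain adaptation for monocular depth estimation be performed effectively on resource-constrained embedded hardware?
RQ2: What is the minimum number of unlabelled target-domain samples needed to achieve a meaningful performance improvement on edge devices?
RQ3: How does model complexity affect training time, power consumption, and inference speed during on-device adaptation?
RQ4: Is it feasible to train deep neural networks for perception tasks directly on embedded devices such as the NVIDIA Jetson Nano?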

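Key Findings

Domain adaptation on embedded hardware is feasible only with lightweight network architectures such as FastDepth; complex models such as ResNet-UpProj cannot be trained because of memory limitations.
Training with 500–1000 target-domain samples gives the best balance among performance gain, training time, and energy efficiency.
The FastDepth model reaches inference times as low as 10 ms on the Jetson Nano, supporting real-time performance.
Power consumption stays low and similar across architectures, despite their differing model complexity.
Qualitative results show a clear improvement in depth map quality after adaptation, especially for the lightweight model, with sharper object boundaries.
A higher input resolution (288x704) increases training time but has little effect on power consumption or inference speed.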
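
To make the latency and power figures above easier to interpret, the arithmetic can be sketched as below. The 10 ms latency comes from the findings; the power draw and training duration in the energy example are hypothetical placeholders, not measurements from the paper, and both function names are introduced here for illustration only.

```python
def frames_per_second(latency_ms: float) -> float:
    """Throughput implied by a per-frame inference latency (one frame at a time)."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def training_energy_wh(avg_power_w: float, training_time_s: float) -> float:
    """Energy in watt-hours for a run at a given average power draw."""
    if avg_power_w < 0 or training_time_s < 0:
        raise ValueError("power and time must be non-negative")
    return avg_power_w * training_time_s / 3600.0


# 10 ms per frame (reported above) corresponds to 100 frames per second.
print(frames_per_second(10.0))  # prints 100.0

# Hypothetical values: 5 W average draw over a 30-minute adaptation run.
print(training_energy_wh(5.0, 1800.0))  # prints 2.5
```

When power draw is similar across configurations, as the findings report, energy use is driven mainly by training time, which is why the number of target samples and the input resolution matter for on-device adaptation.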

This review was generated by AI and reviewed by a human editor.