QUICK REVIEW

[Paper Review] Thermal Image Refinement with Depth Estimation using Recurrent Networks for Monocular ORB-SLAM3

Hürkan Şahin, Huy Xuan Pham|arXiv (Cornell University)|Mar 16, 2026

Robotics and Sensor-Based Localization|Cited by 0

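One-Sentence Summary

This paper proposes a lightweight thermal-to-depth estimation pipeline that combines a thermal refinement network (T-RefNet) with recurrent blocks (RB/RC), enabling depth estimation from thermal images and robust ORB-SLAM3 localization in low-light or GPS-denied scenes, with evaluation on both radiometric and non-radiometric data.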

ABSTRACT

Autonomous navigation in GPS-denied and visually degraded environments remains challenging for unmanned aerial vehicles (UAVs). To this end, we investigate the use of a monocular thermal camera as a standalone sensor on a UAV platform for real-time depth estimation and simultaneous localization and mapping (SLAM). To extract depth information from thermal images, we propose a novel pipeline employing a lightweight supervised network with recurrent blocks (RBs) integrated to capture temporal dependencies, enabling more robust predictions. The network combines lightweight convolutional backbones with a thermal refinement network (T-RefNet) to refine raw thermal inputs and enhance feature visibility. The refined thermal images and predicted depth maps are integrated into ORB-SLAM3, enabling thermal-only localization. Unlike previous methods, the network is trained on a custom non-radiometric dataset, obviating the need for high-cost radiometric thermal cameras. Experimental results on datasets and UAV flights demonstrate competitive depth accuracy and robust SLAM performance under low-light conditions. On the radiometric VIVID++ (indoor-dark) dataset, our method achieves an absolute relative error of approximately 0.06, compared to baselines exceeding 0.11. In our non-radiometric indoor set, baseline errors remain above 0.24, whereas our approach remains below 0.10. Thermal-only ORB-SLAM3 maintains a mean trajectory error under 0.4 m.

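Motivation and Objectives

Enable reliable autonomous navigation where RGB data fails, such as in low-light or smoke-filled environments.
Develop a lightweight thermal-to-depth pipeline that yields metric-scale depth for SLAM.
Integrate refined thermal images and depth directly into ORB-SLAM3 without requiring a radiometric camera.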

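Proposed Method

Introduce T-RefNet to refine raw 16-bit thermal images and generate color-mapped images for ORB feature extraction.
Extract multi-scale features with a lightweight backbone (EfficientNet-B0 / MobileNet / ResNet-8).
Add recurrent blocks (ConvGRU or reservoir computing) to enforce temporal consistency in depth prediction.
Train with a combined loss comprising scale-invariant log-depth, SSIM, depth-ranking, and edge-aware smoothness terms.
Decode into dense depth maps and refined thermal images, which ORB-SLAM3 uses for metric-scale, temporally consistent tracking.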
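
Two of the loss terms named above can be illustrated with a minimal NumPy sketch. It assumes the scale-invariant log-depth term takes the widely used form of Eigen et al. (2014) and the smoothness term takes the common image-gradient-weighted form; the paper's exact formulation, term weights, and the SSIM and depth-ranking terms are not reproduced here. The function names and the default `lam` are hypothetical choices for illustration.

```python
import numpy as np

def scale_invariant_log_loss(pred, target, lam=0.5, eps=1e-6):
    """Scale-invariant log-depth loss (assumed form, after Eigen et al. 2014).

    With lam=1.0 the loss ignores any global scale factor on `pred`;
    lam=0.5 (an illustrative default) keeps part of the absolute-scale penalty.
    """
    valid = target > 0  # skip pixels that have no ground-truth depth
    d = np.log(np.maximum(pred[valid], eps)) - np.log(target[valid])
    return float(np.mean(d ** 2) - lam * np.mean(d) ** 2)

def edge_aware_smoothness(depth, image):
    """Penalise depth gradients, down-weighted where the image has strong edges."""
    dx_d = np.abs(np.diff(depth, axis=1))
    dy_d = np.abs(np.diff(depth, axis=0))
    dx_i = np.abs(np.diff(image, axis=1))
    dy_i = np.abs(np.diff(image, axis=0))
    return float(np.mean(dx_d * np.exp(-dx_i)) + np.mean(dy_d * np.exp(-dy_i)))
```

In a training loop these terms would be summed with weights alongside the SSIM and ranking terms; those weights are not stated in this review, so none are suggested here.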

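Experimental Results

Research Questions

RQ1: How well can monocular thermal images be converted into reliable depth maps under radiometric and non-radiometric conditions?
RQ2: Do refined thermal inputs and depth priors improve ORB-SLAM3 localization in low-light or visually degraded scenes?
RQ3: What are the trade-offs between ConvGRU and reservoir computing in terms of temporal consistency and performance for thermal-to-depth estimation?
RQ4: Does non-radiometric training generalize to real-world indoor UAV experiments?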

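Key Findings

On the indoor-dark VIVID++ data, the proposed RC+Eff-B0 variant achieves AbsRel 0.063, RMSE 0.298, and a1 = 0.940.
The RC approach uses roughly 50k parameters and performs competitively with the ConvGRU-based variants while being a smaller model.
On the non-radiometric indoor dataset, RC combined with EfficientNet-B0 achieves AbsRel 0.076 and a1 = 0.929, showing robustness to non-radiometric conditions.
On non-radiometric data, RGB-trained depth models (ZoeDepth, DepthAnything-V2) perform poorly without radiometric preprocessing.
Thermal inputs refined by T-RefNet enable reliable ORB-SLAM3 tracking in dark and challenging environments, outperforming raw thermal and RGB baselines in several scenarios.
In UAV corridor flights, the mean localization error stays below 0.4 m, indicating the practical robustness of the thermal-only SLAM pipeline.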
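
For readers unfamiliar with the depth metrics quoted above, the following sketch shows how AbsRel, RMSE, and a1 are conventionally computed in the monocular depth literature. It assumes a1 denotes the usual threshold accuracy (share of pixels with max(pred/gt, gt/pred) < 1.25); the paper's evaluation script, depth cap, and masking rules are not known from this review, and `depth_metrics` is a hypothetical helper name.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth metrics over pixels with valid ground truth.

    Assumes `pred` is strictly positive wherever `gt` is valid.
    """
    valid = gt > 0  # zero marks missing ground truth in this sketch
    pred, gt = pred[valid], gt[valid]
    abs_rel = np.mean(np.abs(pred - gt) / gt)      # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))      # root-mean-square error
    ratio = np.maximum(pred / gt, gt / pred)
    a1 = np.mean(ratio < 1.25)                     # threshold accuracy
    return {"abs_rel": float(abs_rel), "rmse": float(rmse), "a1": float(a1)}

# Toy example: three valid pixels, one without ground truth.
gt = np.array([1.0, 2.0, 4.0, 0.0])
pred = np.array([1.1, 2.0, 5.2, 3.0])
metrics = depth_metrics(pred, gt)
```

Lower AbsRel and RMSE are better, while a1 is an accuracy in [0, 1] where higher is better.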

This review was generated by AI and reviewed by a human editor.