QUICK REVIEW

[Paper Review] Thermal Image Refinement with Depth Estimation using Recurrent Networks for Monocular ORB-SLAM3

Hürkan Şahin, Huy Xuan Pham|arXiv (Cornell University)|Mar 16, 2026

Robotics and Sensor-Based Localization | Citations: 0

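One-Line Summary

This paper proposes a lightweight thermal-to-depth pipeline that combines a thermal refinement network (T-RefNet) with recurrent blocks (ConvGRU-based RBs or reservoir computing, RC). It enables thermal-only depth estimation and makes ORB-SLAM3 localization more robust in low-light or GPS-denied environments. The evaluation covers both radiometric and non-radiometric data.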

ABSTRACT

Autonomous navigation in GPS-denied and visually degraded environments remains challenging for unmanned aerial vehicles (UAVs). To this end, we investigate the use of a monocular thermal camera as a standalone sensor on a UAV platform for real-time depth estimation and simultaneous localization and mapping (SLAM). To extract depth information from thermal images, we propose a novel pipeline employing a lightweight supervised network with recurrent blocks (RBs) integrated to capture temporal dependencies, enabling more robust predictions. The network combines lightweight convolutional backbones with a thermal refinement network (T-RefNet) to refine raw thermal inputs and enhance feature visibility. The refined thermal images and predicted depth maps are integrated into ORB-SLAM3, enabling thermal-only localization. Unlike previous methods, the network is trained on a custom non-radiometric dataset, obviating the need for high-cost radiometric thermal cameras. Experimental results on datasets and UAV flights demonstrate competitive depth accuracy and robust SLAM performance under low-light conditions. On the radiometric VIVID++ (indoor-dark) dataset, our method achieves an absolute relative error of approximately 0.06, compared to baselines exceeding 0.11. In our non-radiometric indoor set, baseline errors remain above 0.24, whereas our approach remains below 0.10. Thermal-only ORB-SLAM3 maintains a mean trajectory error under 0.4 m.

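Research Motivation and Objectives

Motivation: reliable autonomous navigation when RGB data fails in low-light, smoky, or hazy environments.
Develop a lightweight thermal-to-depth pipeline that produces metric-scale depth for SLAM.
Integrate the refined thermal images and depth directly into ORB-SLAM3 so the system runs without a radiometric camera.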

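Proposed Method

Introduces T-RefNet, which refines raw 16-bit thermal images and produces colormapped images used for ORB feature extraction.
Extracts multi-scale features with lightweight backbones (EfficientNet-B0 / MobileNet / ResNet-8).
Enforces temporal consistency of the depth estimates with recurrent blocks based on ConvGRU or reservoir computing (RC).
Trains with a composite loss combining scale-invariant log-depth, SSIM, depth-ordering, and edge-aware smoothness terms.
Decodes dense depth maps and refined thermal images and feeds them into ORB-SLAM3 for metric-scale, temporally consistent tracking.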
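For the reservoir-computing option listed above, the textbook form is a leaky echo-state network: a fixed, randomly initialized recurrent reservoir whose state is updated once per frame, with only a linear readout trained (here by ridge regression). The sketch below applies this to a per-frame feature vector. The class name `EchoStateReservoir`, the leak rate, the spectral radius, and the use of a flat vector instead of convolutional feature maps are all assumptions; the paper's RC block may be built differently. Training only the readout is one plausible reason an RC block can stay much smaller than a fully trained ConvGRU.

```python
import numpy as np


class EchoStateReservoir:
    """Hypothetical leaky echo-state reservoir: W_in and W stay fixed; only W_out is fitted."""

    def __init__(self, n_in, n_res, spectral_radius=0.9, leak=0.3, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
        w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
        # Rescale so the largest |eigenvalue| equals spectral_radius (a common echo-state heuristic).
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
        self.w = w
        self.leak = leak
        self.state = np.zeros(n_res)
        self.w_out = None

    def step(self, x):
        """Update the reservoir with one frame's feature vector x of shape (n_in,)."""
        pre = self.w_in @ x + self.w @ self.state
        # Leaky integration keeps a memory of earlier frames (the temporal context).
        self.state = (1.0 - self.leak) * self.state + self.leak * np.tanh(pre)
        return self.state.copy()

    def fit_readout(self, states, targets, ridge=1e-6):
        """Ridge regression for W_out from stacked states (T, n_res) and targets (T, n_out)."""
        s = np.asarray(states)
        y = np.asarray(targets)
        a = s.T @ s + ridge * np.eye(s.shape[1])
        self.w_out = np.linalg.solve(a, s.T @ y).T  # shape (n_out, n_res)

    def predict(self, state):
        """Linear readout applied to one reservoir state."""
        return self.w_out @ state
```

In use, `step` is called once per frame, the returned states are stacked, and `fit_readout` is solved once in closed form; no gradients flow through the reservoir itself.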
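Two of the four composite-loss terms listed above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the scale-invariant term follows the common Eigen-style formulation with a hypothetical weight `lam=0.85`; the edge-aware smoothness term down-weights depth gradients where the thermal image itself has strong gradients; and the term weights `w_si` and `w_smooth` are hypothetical. The SSIM and depth-ordering terms are omitted for brevity.

```python
import numpy as np


def scale_invariant_log_loss(pred, gt, lam=0.85, eps=1e-6):
    """Scale-invariant log-depth loss over pixels with valid (positive) ground truth."""
    mask = gt > 0
    d = np.log(pred[mask] + eps) - np.log(gt[mask] + eps)
    # With lam=1 a global scale error on pred costs (almost) nothing; lam<1 keeps some scale penalty.
    return float(np.mean(d ** 2) - lam * np.mean(d) ** 2)


def edge_aware_smoothness(depth, image):
    """Penalize depth gradients, less so where the guiding (thermal) image has edges."""
    dx_d = np.abs(np.diff(depth, axis=1))
    dy_d = np.abs(np.diff(depth, axis=0))
    dx_i = np.abs(np.diff(image, axis=1))
    dy_i = np.abs(np.diff(image, axis=0))
    return float(np.mean(dx_d * np.exp(-dx_i)) + np.mean(dy_d * np.exp(-dy_i)))


def composite_loss(pred, gt, thermal, w_si=1.0, w_smooth=0.1):
    """Hypothetical weighted sum of the two sketched terms (SSIM and ordering omitted)."""
    return w_si * scale_invariant_log_loss(pred, gt) + w_smooth * edge_aware_smoothness(pred, thermal)
```

The exponential edge weight lets depth discontinuities coincide with thermal edges while still smoothing textureless regions, which is the usual motivation for this kind of term.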

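Experimental Results

Research Questions

RQ1: How well can monocular thermal images be converted into reliable depth maps under radiometric and non-radiometric conditions?
RQ2: Can refined thermal input and depth priors improve ORB-SLAM3 localization in low-light or visually degraded scenarios?
RQ3: What are the temporal-consistency trade-offs between ConvGRU and reservoir computing?
RQ4: Does training on non-radiometric data generalize to real-world indoor UAV experiments?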

Key Findings

Model	AbsRel ↓	RMSE ↓	a1 (δ<1.25) ↑	a2 (δ<1.25²) ↑	a3 (δ<1.25³) ↑
VIVID++ indoor-dark (radiometric)
Shin (T)	0.232	0.740	0.618	0.907	0.987
Shin (MS)	0.166	0.566	0.768	0.967	0.994
Shin (Max.)	0.149	0.517	0.813	0.969	0.994
ZoeDepth	0.165	0.533	0.788	0.944	0.991
DepthAnything-V2	0.112	0.378	0.902	0.970	0.990
Ye et al.	0.145	0.499	0.827	0.969	0.994
MSDFNet	0.139	0.470	0.847	0.980	0.996
Eff-B0 noRB	0.139	0.497	0.839	0.945	0.984
Eff-B0+GRU noTRN	0.079	0.325	0.929	0.980	0.995
ResNet8+GRU	0.079	0.345	0.913	0.970	0.990
MobileNet+GRU	0.072	0.318	0.928	0.977	0.993
Eff-B0+GRU	0.063	0.298	0.940	0.980	0.993
Eff-B0+RC	0.069	0.313	0.931	0.976	0.993
Non-radiometric indoor set
Shin (Max.)	0.262	1.273	0.589	0.890	0.960
ZoeDepth	0.243	1.110	0.605	0.885	0.954
DepthAnything-V2	0.267	1.043	0.571	0.863	0.931
ResNet8+GRU	0.109	0.516	0.886	0.943	0.969
MobileNet+GRU	0.085	0.453	0.911	0.951	0.971
Eff-B0+GRU	0.079	0.424	0.920	0.955	0.971
Eff-B0+RC	0.076	0.439	0.929	0.965	0.981
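The table's columns are the standard monocular-depth metrics: AbsRel is the mean of |pred - gt| / gt, RMSE is the root-mean-square depth error, and a1 to a3 are the fractions of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25, 1.25², and 1.25³. A minimal NumPy sketch of these definitions follows. The validity mask (`gt > min_depth`) is an assumption, and the paper's exact protocol (depth caps, any median scaling) is not stated in this review.

```python
import numpy as np


def depth_metrics(pred, gt, min_depth=1e-3):
    """Standard depth-error metrics over pixels with valid ground truth."""
    mask = gt > min_depth  # hypothetical validity threshold
    p, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    # Symmetric ratio: over- and under-estimation by the same factor count equally.
    ratio = np.maximum(p / g, g / p)
    return {
        "AbsRel": float(abs_rel),
        "RMSE": float(rmse),
        "a1": float(np.mean(ratio < 1.25)),
        "a2": float(np.mean(ratio < 1.25 ** 2)),
        "a3": float(np.mean(ratio < 1.25 ** 3)),
    }
```

Lower AbsRel and RMSE are better, while higher a1 to a3 are better, which is how the arrows in the table header should be read.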

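On the VIVID++ indoor-dark data, the Eff-B0+GRU variant achieved AbsRel 0.063, RMSE 0.298, and a1 = 0.940, with the Eff-B0+RC variant close behind (AbsRel 0.069, a1 = 0.931).
The RC approach uses roughly 50k parameters and delivers competitive performance with a much smaller model than the ConvGRU-based variants.
On the non-radiometric indoor dataset, RC with EfficientNet-B0 reached AbsRel 0.076 and a1 = 0.929, indicating robustness to non-radiometric conditions.
Without radiometric preprocessing, RGB-trained depth models (ZoeDepth, DepthAnything-V2) perform poorly, with AbsRel of 0.243 and 0.267 on the non-radiometric set.
Thermal input refined by T-RefNet enables reliable ORB-SLAM3 tracking in dark and challenging environments, outperforming raw thermal images and RGB baselines in several scenarios.
During UAV corridor flights, the mean localization error stayed below 0.4 m, demonstrating the practical robustness of the thermal-only SLAM pipeline.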
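A mean trajectory error such as the sub-0.4 m figure is typically computed by aligning the estimated trajectory to ground truth and averaging the per-pose position error. The sketch below assumes timestamp-associated positions and a rigid (rotation plus translation, no scale) Kabsch alignment, which suits a pipeline that claims metric scale. The function names are hypothetical, and the paper's actual evaluation tooling and alignment choice are not given in this review.

```python
import numpy as np


def align_rigid(est, gt):
    """Least-squares rotation r and translation t such that r @ est_i + t ~= gt_i (Kabsch)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    h = (est - mu_e).T @ (gt - mu_g)  # 3x3 cross-covariance of centered positions
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so r is a proper rotation (det = +1), not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = mu_g - r @ mu_e
    return r, t


def mean_trajectory_error(est, gt):
    """Mean Euclidean position error (same units as the input, e.g. meters) after rigid alignment."""
    r, t = align_rigid(est, gt)
    aligned = est @ r.T + t
    return float(np.mean(np.linalg.norm(aligned - gt, axis=1)))
```

Omitting a scale factor from the alignment means any scale drift in the estimated trajectory shows up directly in the error, which is the stricter test for a system that claims metric-scale depth.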

This review was generated by AI and checked by a human editor.