QUICK REVIEW

[Paper Review] OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion

Ruihang Miao, Weizhou Liu|arXiv (Cornell University)|Feb 27, 2023

Advanced Vision and Imaging | Cited by 38

One-Line Summary

OccDepth is the first stereo RGB-inferred method for 3D semantic scene completion (SSC). It fuses depth-aware features with Stereo-SFA, adds an Occupancy Aware Depth (OAD) module trained with depth distillation, and reports state-of-the-art results among vision-only SSC methods.

ABSTRACT

3D Semantic Scene Completion (SSC) can provide dense geometric and semantic scene representations, which can be applied in the field of autonomous driving and robotic systems. It is challenging to estimate the complete geometry and semantics of a scene solely from visual images, and accurate depth information is crucial for restoring 3D geometry. In this paper, we propose the first stereo SSC method named OccDepth, which fully exploits implicit depth information from stereo images (or RGBD images) to help the recovery of 3D geometric structures. The Stereo Soft Feature Assignment (Stereo-SFA) module is proposed to better fuse 3D depth-aware features by implicitly learning the correlation between stereo images. In particular, when the input is an RGBD image, virtual stereo images can be generated from the original RGB image and depth map. Besides, the Occupancy Aware Depth (OAD) module is used to obtain geometry-aware 3D features by knowledge distillation using pre-trained depth models. In addition, a reformed TartanAir benchmark, named SemanticTartanAir, is provided in this paper for further testing our OccDepth method on the SSC task. Compared with the state-of-the-art RGB-inferred SSC method, extensive experiments on SemanticKITTI show that our OccDepth method achieves superior performance, improving mIoU by +4.82%, of which +2.49% mIoU comes from stereo images and +2.33% mIoU comes from our proposed depth-aware method. Our code and trained models are available at https://github.com/megvii-research/OccDepth.
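
The abstract mentions generating virtual stereo images from an RGB image and a depth map but does not say how. As a rough illustration only, the sketch below uses the standard rectified-stereo relation (disparity = focal length × baseline / depth) and a naive forward warp of one image row. The function names, the focal length and baseline parameters, and the hole handling are all assumptions for this example and are not taken from the OccDepth code.

```python
def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Standard rectified-stereo relation: disparity = focal * baseline / depth."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def synthesize_right_row(left_row, depth_row, focal_px, baseline_m, hole=None):
    """Forward-warp one row of the left image into a virtual right view.

    Hypothetical helper: a pixel at column x in the left view lands at
    column x - disparity in the right view. Unfilled columns keep `hole`.
    """
    width = len(left_row)
    right_row = [hole] * width
    best_disp = [-1.0] * width  # nearest surface (largest disparity) wins
    for x, (value, depth) in enumerate(zip(left_row, depth_row)):
        disp = depth_to_disparity(depth, focal_px, baseline_m)
        x_right = int(round(x - disp))
        if 0 <= x_right < width and disp > best_disp[x_right]:
            right_row[x_right] = value
            best_disp[x_right] = disp
    return right_row


# Usage: every pixel is 100 m away, so each one shifts left by one column.
row = synthesize_right_row(["a", "b", "c", "d"], [100.0] * 4,
                           focal_px=100.0, baseline_m=1.0)
```

A real pipeline would warp full images and fill the holes left by disocclusions; this sketch leaves them as `None` to keep the geometry visible.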

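Motivation and Objectives

Motivates improving 3D semantic scene completion (SSC) from cheaper vision-only input by exploiting the implicit depth in stereo images.
Introduces a stereo-based SSC pipeline that lifts 2D features into 3D occupancy space using depth-aware fusion.
Develops an occupancy-aware depth module and depth distillation to inject explicit depth priors into the 3D features.
Provides a new SemanticTartanAir benchmark for evaluating stereo-input SSC in indoor scenes.
Empirically shows improvements over RGB-based baselines and competitiveness with 2.5D/3D-input SSC methods.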

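Proposed Method

Stereo Soft Feature Assignment (Stereo-SFA) fuses 2D stereo features into 3D voxel space using the learned correlation between the left and right views.
The Occupancy Aware Depth (OAD) module predicts a depth distribution, converts it into an occupancy prior in voxel space through differentiable grid sampling, and uses that prior to refine the 3D features.
Depth distillation is applied during training with the stereo depth network LEAStereo: its output supervises the depth prediction so that F_D matches a dense, pseudo-ground-truth depth map.
Two-task loss design: the geometric (occupancy) loss and the semantic loss are separated, with a mono-based regularization term that stabilizes training.
Techniques to curb overfitting: pretraining of the 2D backbone, data augmentation, and a gradually decaying semantic loss weight.
Evaluation on SemanticKITTI, NYUv2, and SemanticTartanAir demonstrates the effectiveness of stereo-based SSC.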
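
This review does not give the Stereo-SFA fusion formula, so the following is only a minimal sketch under an assumed form: for one voxel, the features sampled from the left and right views are averaged and scaled by their cosine similarity, so that views which disagree contribute less. The names `cosine_similarity` and `soft_fuse`, the clamping at zero, and the weighting itself are assumptions for illustration. Projecting voxel centres into each image and sampling the 2D feature maps is omitted.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def soft_fuse(left_feat, right_feat):
    """Assumed soft fusion for one voxel: similarity-weighted mean of both views.

    A high left/right correlation suggests the voxel lies on a real surface,
    so its fused feature is kept; a low correlation suppresses it.
    """
    weight = max(0.0, cosine_similarity(left_feat, right_feat))
    return [weight * 0.5 * (l + r) for l, r in zip(left_feat, right_feat)]


# Usage: agreeing views keep the feature, orthogonal views suppress it.
agree = soft_fuse([1.0, 0.0], [1.0, 0.0])
disagree = soft_fuse([1.0, 0.0], [0.0, 1.0])
```

Compared with plain mean or concatenation fusion, a weighting of this kind lets the left/right agreement act as an implicit depth cue, which is the intuition the method description points to.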

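Experimental Results

Research Questions

RQ1: Can stereo (vision-only) input recover dense 3D geometry and semantics for SSC more effectively than RGB-only methods?
RQ2: How much does explicit depth, via OAD and depth distillation, improve 3D occupancy and semantic prediction in SSC?
RQ3: What does Stereo-SFA contribute to 3D feature lifting compared with simple fusion strategies?
RQ4: How well does OccDepth perform on indoor versus outdoor SSC benchmarks and on the new SemanticTartanAir-based dataset?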

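Key Findings

OccDepth achieves outstanding performance among vision-only SSC methods on the SemanticKITTI and SemanticTartanAir benchmarks.
It improves on the RGB-based SSC baseline by +4.82% mIoU on SemanticKITTI: +2.49% mIoU from stereo input and +2.33% mIoU from the depth-aware components.
Stereo-SFA yields notable gains over mean or concatenation fusion, particularly in 3D scene completion IoU.
OAD delivers a meaningful mIoU increase at minimal computational cost, and depth distillation further improves the depth guidance.
Using only stereo RGB (with optional depth at training time), OccDepth is competitive with 2.5D/3D-input SSC methods.
Qualitative results show better recovery of thin and distant objects and sharper geometric edges in both indoor and outdoor scenes.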
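
The findings are reported in mIoU, and the two reported contributions add up to the headline gain (2.49 + 2.33 = 4.82). For readers unfamiliar with the metric, here is a minimal sketch of mean intersection-over-union over flattened voxel labels. The function name, the `ignore_label` value, and the choice to skip classes that never occur are assumptions for this example; the official benchmark evaluation may differ in details such as how the empty class is treated.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean IoU over classes that appear in the prediction or the target.

    `pred` and `target` are flat sequences of integer class ids per voxel.
    Predictions are assumed to lie in [0, num_classes). Voxels whose target
    equals `ignore_label` are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Usage: per-class IoUs are 1.0, 0.5 and 0.5, so the mean is 2/3.
score = mean_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
```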


This review was generated by AI and checked by a human editor.