QUICK REVIEW

[Paper Review] DepthSSC: Monocular 3D Semantic Scene Completion via Depth-Spatial Alignment and Voxel Adaptation

Jiawei Yao, Jusheng Zhang|arXiv (Cornell University)|Nov 28, 2023

Advanced Vision and Imaging | Cited by 10

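One-Sentence Summary

DepthSSC introduces Spatially-Transformed Graph Fusion and Geometrically-aware Voxelization to align spatial and depth information and to adapt voxel resolution for monocular 3D semantic scene completion, achieving state-of-the-art results on SemanticKITTI.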

ABSTRACT

The task of 3D semantic scene completion using monocular cameras is gaining significant attention in the field of autonomous driving. This task aims to predict the occupancy status and semantic labels of each voxel in a 3D scene from partial image inputs. Despite numerous existing methods, many face challenges such as inaccurately predicting object shapes and misclassifying object boundaries. To address these issues, we propose DepthSSC, an advanced method for semantic scene completion using only monocular cameras. DepthSSC integrates the Spatial Transformation Graph Fusion (ST-GF) module with Geometric-Aware Voxelization (GAV), enabling dynamic adjustment of voxel resolution to accommodate the geometric complexity of 3D space. This ensures precise alignment between spatial and depth information, effectively mitigating issues such as object boundary distortion and incorrect depth perception found in previous methods. Evaluations on the SemanticKITTI and SSCBench-KITTI-360 datasets demonstrate that DepthSSC not only captures intricate 3D structural details effectively but also achieves state-of-the-art performance.

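Research Motivation and Objectives

Address the spatial-depth alignment problem in monocular 3D semantic scene completion (SSC).
Introduce a mechanism that aligns depth maps with the voxel-based scene representation.
Dynamically adjust voxel resolution according to geometric complexity, balancing detail preservation against efficiency.
Demonstrate improved quantitative performance on SemanticKITTI and related benchmarks.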

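Proposed Method

Proposes Spatially-Transformed Graph Fusion (ST-GF), which transforms voxel features and fuses them into a graph, using an STN (spatial transformer network) for precise voxel localization and a GCN (graph convolutional network) to propagate features back to the voxels.
Develops Geometrically-aware Voxelization (GAV), which assigns voxel resolution according to geometric complexity: higher resolution in complex regions, lower resolution elsewhere.
Integrates deformable self-attention and deformable cross-attention to project 2D features into the 3D voxel space and further refine the voxel features in 3D.
Combines multi-stage training objectives, including binary cross-entropy for occupancy, a spatial continuity loss, a semantic voxel grid loss, and a geometry-preserving loss based on the Hausdorff distance.
Uses VoxFormer as the baseline and adds ST-GF and GAV to form the DepthSSC architecture.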
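
The idea behind GAV (finer voxels where geometry is complex, coarser voxels elsewhere) can be illustrated with a toy sketch. This is not the paper's implementation: it assumes, purely for illustration, that geometric complexity can be approximated by the local standard deviation of a depth map, and the function names, the 0.2 m / 0.4 m voxel sizes, and the threshold are all hypothetical.

```python
import numpy as np

def local_complexity(depth, cell):
    """Per-cell depth standard deviation, used here as an ASSUMED proxy
    for geometric complexity (the paper's actual measure may differ)."""
    h, w = depth.shape
    gh, gw = h // cell, w // cell
    # Crop to a whole number of cells, then group pixels into (cell x cell) blocks.
    blocks = depth[:gh * cell, :gw * cell].reshape(gh, cell, gw, cell)
    return blocks.std(axis=(1, 3))

def assign_voxel_size(complexity, coarse=0.4, fine=0.2, threshold=0.5):
    """Choose the fine voxel size where complexity exceeds the threshold,
    otherwise the coarse one. All three values are illustrative."""
    return np.where(complexity > threshold, fine, coarse)

# Toy depth map: a flat surface at 10 m with one cluttered 8x8 patch.
rng = np.random.default_rng(0)
depth = np.full((16, 16), 10.0)
depth[:8, :8] += rng.normal(0.0, 2.0, size=(8, 8))

complexity = local_complexity(depth, cell=8)   # shape (2, 2)
voxel_size = assign_voxel_size(complexity)     # fine only in the cluttered cell
```

In this toy case only the cluttered top-left cell receives the fine voxel size; the three flat cells keep the coarse one, which is the detail-versus-efficiency trade-off the method description refers to.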

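Experimental Results

Research Questions

RQ1: In monocular SSC, does ST-GF improve the alignment between depth-derived information and the voxel-based scene representation?
RQ2: Does Geometrically-aware Voxelization improve reconstruction detail in geometrically complex regions without incurring prohibitive computational overhead?
RQ3: How does DepthSSC perform on SemanticKITTI and SSCBench-KITTI-360 compared with RGB-based monocular SSC methods?
RQ4: What do the ablation studies show about the contributions of ST-GF and GAV to overall performance?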

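Key Findings

On the SemanticKITTI test set (monocular RGB input), DepthSSC achieves an IoU of 44.58% and an mIoU of 13.11%.
On the SemanticKITTI validation set, DepthSSC achieves an IoU of 45.84% and an mIoU of 13.28%.
Ablation experiments show that ST-GF and GAV each provide consistent, additive gains over the VoxFormer baseline.
ST-GF improves the spatial alignment between depth maps and voxel queries, while GAV is better at capturing detail in geometrically complex regions.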
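
For readers unfamiliar with the two metrics reported above, the following sketch shows how they are commonly computed for SSC: IoU treats each voxel as occupied or empty (geometry only), while mIoU averages per-class IoU over the semantic classes. The function name `ssc_metrics`, the empty label `0`, and the ignore label `255` are assumptions made for this example. It scores a single grid, whereas benchmark tools typically accumulate counts over a whole dataset.

```python
import numpy as np

def ssc_metrics(pred, target, num_classes, empty=0, ignore=255):
    """Return (completion IoU, semantic mIoU) for one labeled voxel grid.

    `empty` and `ignore` are assumed label conventions for this sketch.
    """
    valid = target != ignore
    pred, target = pred[valid], target[valid]

    # Scene-completion IoU: occupied vs. empty, ignoring semantics.
    p_occ, t_occ = pred != empty, target != empty
    union = np.logical_or(p_occ, t_occ).sum()
    iou = np.logical_and(p_occ, t_occ).sum() / union if union else float("nan")

    # mIoU: mean of per-class IoU over non-empty classes that appear
    # in either the prediction or the ground truth.
    per_class = []
    for c in range(num_classes):
        if c == empty:
            continue
        inter = np.logical_and(pred == c, target == c).sum()
        union_c = np.logical_or(pred == c, target == c).sum()
        if union_c:
            per_class.append(inter / union_c)
    miou = float(np.mean(per_class)) if per_class else float("nan")
    return float(iou), miou

# Tiny flattened grid: the last voxel is unlabeled and therefore skipped.
pred = np.array([0, 1, 1, 2, 2, 0])
target = np.array([0, 1, 2, 2, 2, 255])
iou, miou = ssc_metrics(pred, target, num_classes=3)
```

In this example every occupied voxel is predicted as occupied, so the completion IoU is perfect, but one voxel gets the wrong class, so the mIoU is lower. The same gap, at a much larger scale, appears in the reported numbers (IoU in the mid-40s, mIoU around 13).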

This review was generated by AI and checked by human editors.