QUICK REVIEW

[Paper Review] SpaceSense-Bench: A Large-Scale Multi-Modal Benchmark for Spacecraft Perception and Pose Estimation

Aodi Wu, J. X. Zuo|arXiv (Cornell University)|Mar 10, 2026

Space Satellite Systems and Control|Cited by 0

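One-Sentence Summary

SpaceSense-Bench introduces a large-scale, multi-modal benchmark for spacecraft perception with 136 satellite models, synchronized RGB/depth/LiDAR data, dense 7-class part-level semantics, and 6-DoF ground-truth poses; it benchmarks five perception tasks and analyzes zero-shot generalization and data-scale effects.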

ABSTRACT

Autonomous space operations such as on-orbit servicing and active debris removal demand robust part-level semantic understanding and precise relative navigation of target spacecraft, yet collecting large-scale real data in orbit remains impractical due to cost and access constraints. Existing synthetic datasets, moreover, suffer from limited target diversity, single-modality sensing, and incomplete ground-truth annotations. We present SpaceSense-Bench, a large-scale multi-modal benchmark for spacecraft perception encompassing 136 satellite models with approximately 70 GB of data. Each frame provides time-synchronized 1024×1024 RGB images, millimeter-precision depth maps, and 256-beam LiDAR point clouds, together with dense 7-class part-level semantic labels at both the pixel and point level as well as accurate 6-DoF pose ground truth. The dataset is generated through a high-fidelity space simulation built in Unreal Engine 5 and a fully automated pipeline covering data acquisition, multi-stage quality control, and conversion to mainstream formats. We benchmark five representative tasks (object detection, 2D semantic segmentation, RGB-LiDAR fusion-based 3D point cloud segmentation, monocular depth estimation, and orientation estimation) and identify two key findings: (i) perceiving small-scale components (e.g., thrusters and omni-antennas) and generalizing to entirely unseen spacecraft in a zero-shot setting remain critical bottlenecks for current methods, and (ii) scaling up the number of training satellites yields substantial performance gains on novel targets, underscoring the value of large-scale, diverse datasets for space perception research. The dataset, code, and toolkit are publicly available at https://github.com/wuaodi/SpaceSense-Bench.

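Motivation and Objectives

Address the lack of diverse, multi-modal, densely annotated spacecraft datasets, enabling robust perception and pose estimation for autonomous space operations.
Provide a scalable simulation-based pipeline that generates photorealistic, time-synchronized sensor data across varied satellite geometries.
Enable evaluation on multiple perception tasks, including zero-shot generalization to unseen targets.
Quantify how dataset scale affects cross-target generalization, and identify the persistent bottleneck in recognizing small components.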

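Proposed Method

Build a large 3D asset library of 136 satellite models with a seven-class part taxonomy.
Construct a high-fidelity space scene in Unreal Engine 5, integrated with AirSim, for synchronized RGB, depth, and LiDAR sensing.
Automate data generation through trajectory planning (approach and orbit), with automatic extraction of ground truth (RGB, depth, LiDAR, 7-class masks, 6-DoF pose).
Convert outputs to mainstream formats (YOLO, MMSegmentation, SemanticKITTI) for direct use in detection, segmentation, and 3D perception tasks.
Systematically benchmark five tasks using multiple baseline methods and a zero-shot protocol.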
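As an illustration of the format-conversion step, the sketch below derives YOLO-style detection labels (class, x_center, y_center, width, height, all normalized to [0, 1]) from a per-pixel class mask. It is a minimal sketch, not the benchmark's own converter: the function names `mask_to_yolo_boxes` and `to_yolo_line`, the choice of one box per class per image, and the use of 0 as the background ID are all assumptions made here for illustration.

```python
from typing import Dict, List, Tuple

YoloBox = Tuple[int, float, float, float, float]


def mask_to_yolo_boxes(mask: List[List[int]], background: int = 0) -> List[YoloBox]:
    """Return one normalized (class, xc, yc, w, h) box per non-background class.

    Assumption: each class is summarized by a single tight box around all of
    its pixels; a real converter might split disconnected parts instead.
    """
    height = len(mask)
    width = len(mask[0])
    extents: Dict[int, List[int]] = {}  # class -> [xmin, ymin, xmax, ymax]
    for y, row in enumerate(mask):
        for x, cls in enumerate(row):
            if cls == background:
                continue
            if cls not in extents:
                extents[cls] = [x, y, x, y]
                continue
            e = extents[cls]
            e[0] = min(e[0], x)
            e[1] = min(e[1], y)
            e[2] = max(e[2], x)
            e[3] = max(e[3], y)

    boxes: List[YoloBox] = []
    for cls in sorted(extents):
        xmin, ymin, xmax, ymax = extents[cls]
        # Pixel extents are inclusive, so a box spans (max - min + 1) pixels.
        w = (xmax - xmin + 1) / width
        h = (ymax - ymin + 1) / height
        xc = (xmin + xmax + 1) / 2 / width
        yc = (ymin + ymax + 1) / 2 / height
        boxes.append((cls, xc, yc, w, h))
    return boxes


def to_yolo_line(box: YoloBox) -> str:
    """Format one box as a line of a YOLO label text file."""
    cls, xc, yc, w, h = box
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# Toy 4x4 mask with two hypothetical part classes (1 and 2).
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 0, 2],
]
toy_boxes = mask_to_yolo_boxes(toy_mask)
toy_lines = [to_yolo_line(b) for b in toy_boxes]
```

Each string in `toy_lines` corresponds to one row of a YOLO label file for the image. A single tight box per class is the simplest choice; parts that appear as several disconnected regions (for example, two solar panels) would likely need connected-component splitting first.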

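Experimental Results

Research Questions

RQ1: How well do current perception methods generalize to unseen spacecraft geometries in a zero-shot setting?
RQ2: How does greater training diversity (more satellite geometries) affect zero-shot generalization to novel targets?
RQ3: What do RGB, depth, and LiDAR each contribute to multi-modal perception under space-like conditions?
RQ4: What are the persistent bottlenecks in recognizing small spacecraft components (e.g., thrusters and omni-antennas), and how do they manifest across tasks?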

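Key Findings

Small components (e.g., omni-antennas and thrusters) score below 35% mean IoU even with strong models, highlighting small-object perception as a core challenge.
Per-class pixel distributions are markedly long-tailed: a few parts (solar_panel, main_body) dominate, while small parts remain difficult.
Zero-shot results from depth and orientation foundation models are strong on per-pixel and distance measures, yet depth and pose generalization across targets remains limited.
Scaling up the number of training satellites yields substantial zero-shot gains: up to 73% relative improvement in mIoU and up to 63% in mAcc, with returns not yet saturated.
PMFNet (RGB+LiDAR) reaches 42.4% mIoU on 3D point cloud segmentation, indicating that multi-modal fusion is effective.
Depth Anything V2 achieves an AbsRel of about 0.022–0.023 in zero-shot depth estimation, but its Spearman correlation is only moderate (≈0.55–0.60), indicating limited relative depth ordering in this setting.
Orient Anything's orientation estimates have a mean axis-angle error of about 12.75°, with most frames below 20°, but variance across geometries is large.
The dataset-scale study confirms that larger, more diverse libraries improve zero-shot generalization, and that further scaling may bring additional gains.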
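The metrics named in these findings can be sketched in a few lines of plain Python. The code below uses the standard textbook definitions of per-class IoU and mIoU, absolute relative depth error (AbsRel), Spearman rank correlation, and the rotation angle between two orientation matrices. Whether the paper's evaluation code matches these definitions exactly (for example, how absent classes, invalid depth pixels, or tied ranks are handled) is an assumption, and all function names and toy inputs here are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def per_class_iou(pred: Sequence[int], gt: Sequence[int],
                  num_classes: int) -> List[Optional[float]]:
    """IoU per class over flattened labels; None if a class appears in neither."""
    ious: List[Optional[float]] = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        ious.append(inter / union if union else None)
    return ious


def mean_iou(ious: Sequence[Optional[float]]) -> float:
    """mIoU averaged over classes that are present (assumed convention)."""
    present = [v for v in ious if v is not None]
    return sum(present) / len(present)


def abs_rel(pred_depth: Sequence[float], gt_depth: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred_depth, gt_depth) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def spearman_no_ties(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation; this closed form assumes no tied values."""
    def ranks(values: Sequence[float]) -> List[int]:
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, idx in enumerate(order):
            result[idx] = rank
        return result

    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


def rotation_error_deg(r_pred: Sequence[Sequence[float]],
                       r_gt: Sequence[Sequence[float]]) -> float:
    """Angle of the relative rotation R_pred^T R_gt, in degrees."""
    # trace(R_pred^T R_gt) equals the sum of element-wise products.
    trace = sum(r_pred[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_theta))


# Toy inputs (hypothetical values, not taken from the benchmark).
toy_ious = per_class_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
toy_miou = mean_iou(toy_ious)
toy_abs_rel = abs_rel([1.1, 2.0], [1.0, 2.0])
toy_rho = spearman_no_ties([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
toy_angle = rotation_error_deg(identity, rot_z_90)
```

The contrast in the depth finding follows from these definitions: AbsRel measures per-pixel scale accuracy, while Spearman correlation only checks whether nearer and farther pixels are ordered correctly, so a model can score well on one and moderately on the other.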

This review was generated by AI and reviewed by human editors.