QUICK REVIEW

[Paper Review] SpaceSense-Bench: A Large-Scale Multi-Modal Benchmark for Spacecraft Perception and Pose Estimation

Aodi Wu, J. X. Zuo|arXiv (Cornell University)|Mar 10, 2026

Space Satellite Systems and Control|Citations: 0

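One-Line Summary

SpaceSense-Bench is a large-scale multi-modal benchmark for spacecraft perception that provides 136 satellite models, synchronized RGB/depth/LiDAR data, dense 7-class part-level semantic labels, and 6-DoF ground-truth poses; it benchmarks five perception tasks and analyzes zero-shot generalization and the effect of dataset scale.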

ABSTRACT

Autonomous space operations such as on-orbit servicing and active debris removal demand robust part-level semantic understanding and precise relative navigation of target spacecraft, yet collecting large-scale real data in orbit remains impractical due to cost and access constraints. Existing synthetic datasets, moreover, suffer from limited target diversity, single-modality sensing, and incomplete ground-truth annotations. We present SpaceSense-Bench, a large-scale multi-modal benchmark for spacecraft perception encompassing 136 satellite models with approximately 70 GB of data. Each frame provides time-synchronized 1024×1024 RGB images, millimeter-precision depth maps, and 256-beam LiDAR point clouds, together with dense 7-class part-level semantic labels at both the pixel and point level as well as accurate 6-DoF pose ground truth. The dataset is generated through a high-fidelity space simulation built in Unreal Engine 5 and a fully automated pipeline covering data acquisition, multi-stage quality control, and conversion to mainstream formats. We benchmark five representative tasks (object detection, 2D semantic segmentation, RGB-LiDAR fusion-based 3D point cloud segmentation, monocular depth estimation, and orientation estimation) and identify two key findings: (i) perceiving small-scale components (e.g., thrusters and omni-antennas) and generalizing to entirely unseen spacecraft in a zero-shot setting remain critical bottlenecks for current methods, and (ii) scaling up the number of training satellites yields substantial performance gains on novel targets, underscoring the value of large-scale, diverse datasets for space perception research. The dataset, code, and toolkit are publicly available at https://github.com/wuaodi/SpaceSense-Bench.

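Motivation and Objectives

Address the lack of diverse, multi-modal, densely annotated spacecraft datasets for robust perception and pose estimation in autonomous space operations.
Provide a scalable simulation-based pipeline that generates near-photorealistic, time-synchronized sensor data across many satellite geometries.
Enable evaluation on multiple perception tasks, including zero-shot generalization to unseen targets.
Quantify how dataset scale affects cross-target generalization, and identify the persistent bottleneck in small-part perception.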

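Proposed Method

Build a large-scale 3D asset library of 136 satellite models annotated with a seven-class part taxonomy.
Construct a high-fidelity space scene in Unreal Engine 5 and integrate it with AirSim to synchronize RGB, depth, and LiDAR sensing.
Automate data generation using trajectory planning (including approach and orbiting trajectories) and automatic ground-truth extraction (RGB, depth, LiDAR, 7-class masks, 6-DoF pose).
Convert the outputs to mainstream formats (YOLO, MMSegmentation, SemanticKITTI) so they are ready to use for detection, segmentation, and 3D perception tasks.
Run a systematic benchmark of the five tasks using diverse baselines and a zero-shot protocol.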
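
The format-conversion step can be illustrated with a minimal sketch of how a part mask might be turned into a YOLO detection label. This is not the authors' toolkit code: the function names `mask_to_bbox` and `bbox_to_yolo`, the nested-list mask representation, and the class index used in the example are all assumptions made for illustration. The only fixed facts are the YOLO label layout (`class x_center y_center width height`, normalized to the image size) and the 1024×1024 default resolution taken from the abstract.

```python
def mask_to_bbox(mask, class_id):
    """Return the tight box (x_min, y_min, x_max, y_max) around class_id pixels.

    `mask` is a hypothetical 2D list of per-pixel class ids. The max
    coordinates are exclusive. Returns None if the class is absent.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == class_id:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def bbox_to_yolo(class_id, x_min, y_min, x_max, y_max, img_w=1024, img_h=1024):
    """Format a pixel-space box as a YOLO label line (normalized center and size)."""
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image and have positive area")
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Toy 4x4 mask; class id 1 is an arbitrary placeholder, not the dataset's mapping.
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
toy_box = mask_to_bbox(toy_mask, 1)
toy_label = bbox_to_yolo(1, *toy_box, img_w=4, img_h=4)
```

A box around all pixels of a class merges disconnected instances (for example, two separate solar panels) into one detection; a real converter would likely need per-instance masks or connected-component analysis first.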

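Experimental Results

Research Questions

RQ1: How well do current perception methods generalize to unseen spacecraft geometries in a zero-shot setting?
RQ2: How does increasing training diversity (more satellite geometries) affect zero-shot generalization to unseen targets?
RQ3: How do the RGB, depth, and LiDAR modalities contribute to multi-modal perception under space-like conditions?
RQ4: What are the persistent bottlenecks in perceiving small spacecraft components (e.g., thrusters, omni-antennas)?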
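
RQ1 and RQ2 both depend on splitting the data by satellite model rather than by frame, so that no test geometry is ever seen in training. The sketch below shows one way such a split and a set of nested training subsets for a scale study could be built. The review does not state the paper's actual split sizes or satellite identifiers, so the `sat_###` names, the held-out count of 20, and the subset sizes are hypothetical.

```python
import random


def split_by_satellite(satellite_ids, num_test, seed=0):
    """Split satellite ids into disjoint (train, test) lists.

    Splitting at the satellite level keeps every frame of a held-out
    satellite out of training, which is what makes evaluation zero-shot.
    """
    ids = sorted(set(satellite_ids))
    if not 0 < num_test < len(ids):
        raise ValueError("num_test must be between 1 and len(ids) - 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    return ids[num_test:], ids[:num_test]


def nested_training_subsets(train_ids, sizes):
    """Return {size: subset}, where each smaller subset is a prefix of the larger.

    Nesting means a larger run always includes the satellites of a smaller
    run, so differences in score reflect added satellites, not a reshuffle.
    """
    return {n: train_ids[:n] for n in sizes if n <= len(train_ids)}


# Hypothetical identifiers and sizes, for illustration only.
all_ids = [f"sat_{i:03d}" for i in range(136)]
train_ids, test_ids = split_by_satellite(all_ids, num_test=20)
subsets = nested_training_subsets(train_ids, sizes=[10, 50, 116])
```

Each subset would then be used to train a separate model that is evaluated on the same fixed `test_ids`, giving one point per subset size on a scale-versus-accuracy curve.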

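Key Findings

Small components (such as omni-antennas and thrusters) stay below 35% IoU even with strong models, highlighting small-object perception as a core challenge.
The per-class pixel distribution is markedly long-tailed: a few parts (solar_panel, main_body) dominate, and small parts remain difficult.
Zero-shot results with depth and pose foundation models show strong per-pixel/distance performance, but depth and pose generalization across targets is limited.
Increasing the number of training satellites substantially improves zero-shot mIoU (up to 73% relative improvement), with mAcc gains of up to 63% and non-saturating returns.
PMFNet (RGB+LiDAR) achieves 42.4% mIoU on 3D point cloud segmentation, demonstrating the effectiveness of multi-modal fusion.
Depth Anything V2 achieves an AbsRel of about 0.022–0.023 for zero-shot depth, but its Spearman correlation is modest (≈0.55–0.60), indicating limits on relative depth ordering in this setting.
For orientation estimation, Orient Anything reaches a Mean Axis Angular Error of about 12.75°, with more than half of the frames below 20°, but the variance across geometries is large.
The dataset-scale study confirms that a larger, more diverse library improves zero-shot generalization, and suggests that further scaling may continue to pay off.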
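
The metrics named in these findings can be written out in a few lines. The sketch below uses the standard definitions of per-class IoU, mIoU, AbsRel, and Spearman rank correlation; it is not the benchmark's evaluation code, and the flat-list inputs, the function names, and the no-ties assumption in `spearman` are simplifications for illustration.

```python
def per_class_iou(pred, target, num_classes):
    """IoU per class from flat label lists; None for a class absent from both."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious


def mean_iou(ious):
    """mIoU: average IoU over the classes that actually occur."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


def abs_rel(pred_depth, gt_depth):
    """AbsRel: mean of |pred - gt| / gt over pixels with positive ground truth."""
    pairs = [(p, g) for p, g in zip(pred_depth, gt_depth) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def _ranks(values):
    """Rank positions 0..n-1; assumes no tied values (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks


def spearman(a, b):
    """Spearman correlation as the Pearson correlation of the ranks (no ties)."""
    ra, rb = _ranks(a), _ranks(b)
    mean = (len(ra) - 1) / 2.0
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var = sum((x - mean) ** 2 for x in ra)  # identical for both when untied
    return cov / var


# Made-up depths (in meters): errors under 1% that still invert the ordering.
gt = [100.0, 100.5, 101.0]
pred = [101.0, 100.5, 100.0]
toy_abs_rel = abs_rel(pred, gt)
toy_spearman = spearman(pred, gt)
```

The toy depth example suggests one possible reading of the Depth Anything V2 result: when a target spans a narrow depth range relative to its distance, errors of about 1% keep AbsRel small yet can completely reorder the pixels, so a low AbsRel and a modest rank correlation are not contradictory.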

This review was generated by AI and checked by a human editor.