QUICK REVIEW

[Paper Review] SpaceSense-Bench: A Large-Scale Multi-Modal Benchmark for Spacecraft Perception and Pose Estimation

Aodi Wu, J. X. Zuo|arXiv (Cornell University)|2026. 03. 10.

Space Satellite Systems and Control | Citations: 0

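One-Line Summary

SpaceSense-Bench introduces a large-scale multi-modal spacecraft perception benchmark with 136 satellite models, synchronized RGB/depth/LiDAR data, dense 7-class part-level semantic labels, and 6-DoF ground-truth poses. It benchmarks five perception tasks and analyzes zero-shot generalization and the effect of dataset scale.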

ABSTRACT

Autonomous space operations such as on-orbit servicing and active debris removal demand robust part-level semantic understanding and precise relative navigation of target spacecraft, yet collecting large-scale real data in orbit remains impractical due to cost and access constraints. Existing synthetic datasets, moreover, suffer from limited target diversity, single-modality sensing, and incomplete ground-truth annotations. We present SpaceSense-Bench, a large-scale multi-modal benchmark for spacecraft perception encompassing 136 satellite models with approximately 70 GB of data. Each frame provides time-synchronized 1024×1024 RGB images, millimeter-precision depth maps, and 256-beam LiDAR point clouds, together with dense 7-class part-level semantic labels at both the pixel and point level as well as accurate 6-DoF pose ground truth. The dataset is generated through a high-fidelity space simulation built in Unreal Engine 5 and a fully automated pipeline covering data acquisition, multi-stage quality control, and conversion to mainstream formats. We benchmark five representative tasks (object detection, 2D semantic segmentation, RGB-LiDAR fusion-based 3D point cloud segmentation, monocular depth estimation, and orientation estimation) and identify two key findings: (i) perceiving small-scale components (e.g., thrusters and omni-antennas) and generalizing to entirely unseen spacecraft in a zero-shot setting remain critical bottlenecks for current methods, and (ii) scaling up the number of training satellites yields substantial performance gains on novel targets, underscoring the value of large-scale, diverse datasets for space perception research. The dataset, code, and toolkit are publicly available at https://github.com/wuaodi/SpaceSense-Bench.

Motivation and Objectives

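Address the lack of diverse, multi-modal, densely annotated spacecraft datasets for robust perception and pose estimation in autonomous space operations.
Provide a scalable simulation-based pipeline that generates photorealistic, time-synchronized sensor data across diverse satellite geometries.
Enable evaluation across multiple perception tasks, including zero-shot generalization.
Quantify how dataset scale affects generalization across targets, and identify small-part perception as a persistent bottleneck.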

Proposed Method

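Build a large 3D asset library of 136 satellite models together with a 7-class part taxonomy.
Construct a high-fidelity space scene in Unreal Engine 5 and integrate it with AirSim to provide synchronized RGB, depth, and LiDAR sensing.
Automate data generation through trajectory planning (for example, orbital approach and orbit-entry paths) and automatic ground-truth extraction (RGB, depth, LiDAR, 7-class masks, 6-DoF pose).
Convert the outputs to mainstream formats (YOLO, MMSegmentation, SemanticKITTI) so they can be used directly for detection, segmentation, and 3D perception tasks.
Run a systematic benchmark on five tasks using a range of baselines and a zero-shot protocol.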
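The format-conversion step can be illustrated with a small example. The following is a minimal sketch, not the authors' toolkit: it assumes a per-part binary mask is available as a nested list, and derives one YOLO detection label line (class index followed by the normalized box center, width, and height). The function name `mask_to_yolo_line` and the class index used in the example are hypothetical.

```python
from typing import Optional, Sequence


def mask_to_yolo_line(mask: Sequence[Sequence[int]], class_id: int) -> Optional[str]:
    """Turn a binary mask into a YOLO label line, or None if the mask is empty.

    The line has the form "class x_center y_center width height", with all
    four box values normalized by the image width and height.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0

    # Rows and columns that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None

    # Pixel-edge box: the max edge is exclusive, so a 1-pixel part has size 1.
    x_min, x_max = cols[0], cols[-1] + 1
    y_min, y_max = rows[0], rows[-1] + 1

    x_center = (x_min + x_max) / 2 / width
    y_center = (y_min + y_max) / 2 / height
    box_w = (x_max - x_min) / width
    box_h = (y_max - y_min) / height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"


# Hypothetical 8x8 mask with a part covering rows 2-3 and columns 4-7.
example_mask = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in range(4, 8):
        example_mask[y][x] = 1

print(mask_to_yolo_line(example_mask, class_id=3))
# -> 3 0.750000 0.375000 0.500000 0.250000
```

A real converter would also need the dataset's actual class-index mapping and a policy for parts that are split into several disconnected regions; neither is specified in this review.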

Experimental Results

Research Questions

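RQ1: How well do current perception methods generalize to unseen spacecraft geometries in a zero-shot setting?
RQ2: How does increasing training diversity with more satellite geometries affect zero-shot generalization to novel targets?
RQ3: How do the RGB, depth, and LiDAR modalities contribute to multi-modal perception under space-like conditions?
RQ4: What are the persistent bottlenecks, across tasks, in perceiving small parts (e.g., thrusters and omni-antennas)?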

Key Results

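Small components (e.g., omni-antennas and thrusters) stay below 35% IoU even with strong models, highlighting small-object perception as a core challenge.
The per-class pixel distribution is clearly long-tailed: a few parts (solar_panel, main_body) dominate, while small parts remain difficult.
Zero-shot results from depth and orientation foundation models show strong per-pixel distance accuracy, but depth and pose generalization across targets is limited.
Increasing the number of training satellites substantially improves zero-shot mIoU (up to a 73% relative improvement), with mAcc increasing up to 63%, and the gains have not yet saturated.
PMFNet (RGB+LiDAR) achieves 42.4% mIoU on 3D point cloud segmentation, suggesting that multi-modal fusion is effective.
Depth Anything V2 reaches an AbsRel of about 0.022–0.023 for zero-shot depth, but its Spearman correlation remains moderate (≈0.55–0.60), suggesting limits in relative depth ordering in this setting.
Orientation estimation with Orient Anything yields a Mean Axis Angular Error of about 12.75°; most frames fall below 20°, but there is considerable variance across geometries.
The dataset-scale study confirms that a larger, more diverse library improves zero-shot generalization and that further scaling can bring additional gains.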
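For readers less familiar with the metrics quoted above, the following is a minimal sketch of their textbook definitions: per-class IoU, mIoU, AbsRel for depth, and relative improvement. It is not the paper's evaluation code, which may differ in details such as ignored classes or per-image averaging. All function names and example values here are hypothetical.

```python
from typing import List, Optional, Sequence


def per_class_iou(pred: Sequence[int], gt: Sequence[int],
                  num_classes: int) -> List[Optional[float]]:
    """IoU per class over flattened label sequences; None if a class is absent."""
    ious: List[Optional[float]] = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        ious.append(inter / union if union else None)
    return ious


def mean_iou(ious: Sequence[Optional[float]]) -> float:
    """mIoU: average of the per-class IoUs, skipping classes that never occur."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


def abs_rel(pred_depth: Sequence[float], gt_depth: Sequence[float]) -> float:
    """Absolute relative depth error, averaged over pixels with valid depth."""
    pairs = [(p, g) for p, g in zip(pred_depth, gt_depth) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline` (0.5 means +50%)."""
    return (improved - baseline) / baseline


# Hypothetical toy labels for a 3-class problem (not data from the paper).
pred = [0, 0, 1, 1, 2, 2]
gt = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(pred, gt, num_classes=3)
print(ious, mean_iou(ious))
```

Because mIoU averages over classes rather than pixels, a rare class such as a small antenna weighs as much as a dominant one, which is why the long-tailed small parts pull the score down.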


This review was generated by AI and reviewed by a human editor.