QUICK REVIEW

[Paper Review] Optimizing Mission Planning for Multi-Debris Rendezvous Using Reinforcement Learning with Refueling and Adaptive Collision Avoidance

Agni Bandyopadhyay, Günther Waxenegger-Wilfing|arXiv (Cornell University)|Feb 4, 2026

Spacecraft Dynamics and Control | Citations: 0

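One-Sentence Summary

This paper proposes a reinforcement learning framework based on masked PPO for autonomous multi-debris rendezvous missions, integrating refueling and adaptive collision avoidance to optimize fuel use and mission efficiency.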

ABSTRACT

As the orbital environment around Earth becomes increasingly crowded with debris, active debris removal (ADR) missions face significant challenges in ensuring safe operations while minimizing the risk of in-orbit collisions. This study presents a reinforcement learning (RL) based framework to enhance adaptive collision avoidance in ADR missions, specifically for multi-debris removal using small satellites. Small satellites are increasingly adopted due to their flexibility, cost effectiveness, and maneuverability, making them well suited for dynamic missions such as ADR. Building on existing work in multi-debris rendezvous, the framework integrates refueling strategies, efficient mission planning, and adaptive collision avoidance to optimize spacecraft rendezvous operations. The proposed approach employs a masked Proximal Policy Optimization (PPO) algorithm, enabling the RL agent to dynamically adjust maneuvers in response to real-time orbital conditions. Key considerations include fuel efficiency, avoidance of active collision zones, and optimization of dynamic orbital parameters. The RL agent learns to determine efficient sequences for rendezvousing with multiple debris targets, optimizing fuel usage and mission time while incorporating necessary refueling stops. Simulated ADR scenarios derived from the Iridium 33 debris dataset are used for evaluation, covering diverse orbital configurations and debris distributions to demonstrate robustness and adaptability. Results show that the proposed RL framework reduces collision risk while improving mission efficiency compared to traditional heuristic approaches. This work provides a scalable solution for planning complex multi-debris ADR missions and is applicable to other multi-target rendezvous problems in autonomous space mission planning.

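Motivation and Objectives

Motivate ADR as a critical problem, given the crowding of LEO and the resulting collision risk.
Develop an autonomous planning framework that sequences debris visits while managing fuel and safety.
Incorporate adaptive collision zones and refueling decisions into the RL policy.
Evaluate performance against heuristic and hybrid baselines across diverse debris scenarios.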

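Proposed Method

Formulates ADR as a Markov decision process whose state encodes orbits, fuel, the visited-debris mask, and collision risk.
Uses a discrete, masked PPO policy to choose among debris rendezvous, refueling, and collision avoidance actions.
Introduces stochastic collision zones with a 33% probability, comprising 5 x 5 x 5 km cuboid hazard zones and elliptical detour maneuvers (CA Above / CA Below).
Applies invalid action masking, which restricts the policy to the actions that are feasible in each state.
Trains on randomized debris scenarios for a total of 10 million steps and evaluates against baselines on 100 test cases.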
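
The masking and collision-zone steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation. The `State` fields, the action layout (one action per debris target, then refuel, CA Above, CA Below), the `transfer_cost` input, and the feasibility rules are all hypothetical choices made for illustration. The sketch also reads the 33% figure as a per-transfer activation probability and assumes the 5 x 5 x 5 km box is centered on the hazard, neither of which the summary confirms.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class State:
    """Hypothetical, simplified planner state (not the paper's exact state)."""
    fuel: float               # remaining fuel, arbitrary units
    visited: List[bool]       # visited[i] is True once debris i was reached
    in_collision_zone: bool   # True when the current transfer crosses a hazard


def collision_zone_active(rng: random.Random, probability: float = 0.33) -> bool:
    """Sample whether a collision zone is active (assumed per-transfer draw)."""
    return rng.random() < probability


def inside_hazard_box(rel_pos_km: Sequence[float], side_km: float = 5.0) -> bool:
    """Check a relative position against a cube assumed centered on the hazard."""
    half = side_km / 2.0
    return all(abs(component) <= half for component in rel_pos_km)


def action_mask(state: State, transfer_cost: Sequence[float]) -> List[bool]:
    """Feasibility mask over [debris 0..N-1, refuel, CA Above, CA Below].

    Assumed rules: inside a collision zone only the two avoidance maneuvers
    are valid; otherwise unvisited debris within fuel reach and refueling.
    """
    n = len(state.visited)
    mask = [False] * (n + 3)
    if state.in_collision_zone:
        mask[n + 1] = True  # CA Above
        mask[n + 2] = True  # CA Below
        return mask
    for i in range(n):
        mask[i] = (not state.visited[i]) and transfer_cost[i] <= state.fuel
    mask[n] = True  # refueling stays available outside collision zones
    return mask


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Turn logits into probabilities, giving masked actions probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    top = max(masked)
    exps = [math.exp(l - top) for l in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    state = State(fuel=10.0, visited=[True, False, False], in_collision_zone=False)
    mask = action_mask(state, transfer_cost=[1.0, 5.0, 20.0])
    print(mask)
    print(masked_softmax([0.0] * len(mask), mask))
```

Setting masked logits to negative infinity before the softmax means an infeasible action can never be sampled, so the policy does not have to learn feasibility from penalties alone. In a PPO setup the same mask would be applied both when sampling actions and when computing log-probabilities for the update.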

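Experimental Results

Research Questions

RQ1: Can a masked-PPO-based RL agent learn robust debris visit sequences under dynamic collision risk and fuel constraints?
RQ2: How does integrating refueling affect mission duration, debris coverage, and safety, and how does it compare with heuristic methods?
RQ3: What effect does adaptive collision avoidance have on mission efficiency and safety across varied debris configurations?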

Key Findings

Evaluation Type	Average	Max	Min
RL all	30.4	31	29
RL + Greedy CA	29.5	31	28
Greedy + RL CA	21.6	23	21
Greedy + Greedy	19.3	23	17

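The RL-based framework reduces collision risk and improves mission efficiency relative to traditional heuristics.
The RL-RL mode, in which the policy handles both sequencing and collision avoidance, achieves the highest debris coverage.
The hybrid modes, which pair RL with a greedy method so that RL handles only one subtask, underperform fully RL-based planning.
Across all 100 randomized evaluation cases, RL-RL visits more debris on average than the hybrid configurations.
Collision avoidance via CA Above / CA Below maintains the required clearance while allowing the mission to proceed.
Training converges after roughly 8 million steps, after which reward and behavior stabilize.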
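
To put the table's averages in proportion, the short sketch below computes each configuration's relative gain over the Greedy + Greedy baseline. It assumes the Average column counts debris visited per mission, which the findings imply but the table does not state. The `relative_gain` helper is a hypothetical name introduced here.

```python
# Average values copied from the results table in this review.
averages = {
    "RL all": 30.4,
    "RL + Greedy CA": 29.5,
    "Greedy + RL CA": 21.6,
    "Greedy + Greedy": 19.3,
}


def relative_gain(value: float, baseline: float) -> float:
    """Percentage improvement of value over baseline."""
    return 100.0 * (value - baseline) / baseline


baseline = averages["Greedy + Greedy"]
for name, avg in averages.items():
    print(f"{name}: {relative_gain(avg, baseline):+.1f}% vs Greedy + Greedy")
```

By this calculation the fully RL configuration averages about 57.5% above the fully greedy baseline. Swapping only the collision avoidance component to greedy costs comparatively little, while swapping the sequencing component to greedy removes most of the gain.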


This review was generated by AI and checked by a human editor.