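[Paper Walkthrough] Optimizing Mission Planning for Multi-Debris Rendezvous Using Reinforcement Learning with Refueling and Adaptive Collision Avoidance
The paper proposes a reinforcement learning framework based on masked PPO for autonomous multi-debris rendezvous missions, integrating refueling and adaptive collision avoidance to optimize fuel use and mission efficiency.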
As the orbital environment around Earth becomes increasingly crowded with debris, active debris removal (ADR) missions face significant challenges in ensuring safe operations while minimizing the risk of in-orbit collisions. This study presents a reinforcement learning (RL) based framework to enhance adaptive collision avoidance in ADR missions, specifically for multi-debris removal using small satellites. Small satellites are increasingly adopted due to their flexibility, cost effectiveness, and maneuverability, making them well suited for dynamic missions such as ADR. Building on existing work in multi-debris rendezvous, the framework integrates refueling strategies, efficient mission planning, and adaptive collision avoidance to optimize spacecraft rendezvous operations. The proposed approach employs a masked Proximal Policy Optimization (PPO) algorithm, enabling the RL agent to dynamically adjust maneuvers in response to real-time orbital conditions. Key considerations include fuel efficiency, avoidance of active collision zones, and optimization of dynamic orbital parameters. The RL agent learns to determine efficient sequences for rendezvousing with multiple debris targets, optimizing fuel usage and mission time while incorporating necessary refueling stops. Simulated ADR scenarios derived from the Iridium 33 debris dataset are used for evaluation, covering diverse orbital configurations and debris distributions to demonstrate robustness and adaptability. Results show that the proposed RL framework reduces collision risk while improving mission efficiency compared to traditional heuristic approaches. This work provides a scalable solution for planning complex multi-debris ADR missions and is applicable to other multi-target rendezvous problems in autonomous space mission planning.
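Research Motivation and Objectives
- Motivate ADR as a critical problem, given congestion and collision risk in low Earth orbit (LEO);
- Develop an autonomous planning framework that sequences debris visits while managing fuel and safety;
- Incorporate adaptive collision zones and refueling decisions into the RL policy;
- Evaluate performance against heuristic and hybrid baselines across diverse debris scenarios.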
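Proposed Method
- Formulates ADR as a Markov Decision Process whose state includes orbits, fuel, a visited mask, and collision risk;
- Uses a discrete, masked PPO policy that chooses among debris rendezvous, refueling, and collision avoidance actions;
- Introduces collision zones that occur with 33% probability, modeled as 5x5x5 km cubic hazard regions, with CA Above/CA Below actions that fly an elliptical detour around them;
- Applies invalid action masking so that the policy is restricted to feasible actions in every state;
- Trains for 10 million steps on randomized debris scenarios, then evaluates against baselines on 100 test cases.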
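The invalid action masking step can be illustrated with a minimal sketch. The action layout, the feasibility rules, and the names `build_action_mask`, `masked_softmax`, and `sample_action` are all assumptions made for illustration; the summary does not give the paper's actual encoding or masking rules.

```python
import math
import random

# Hypothetical discrete action layout (an assumption, not the paper's encoding):
# indices 0..n-1 -> rendezvous with debris i, then refuel, CA Above, CA Below.


def build_action_mask(visited, fuel, fuel_cost, in_collision_zone):
    """Return a list of booleans; True marks an action assumed to be feasible."""
    n = len(visited)
    # Assumed rule: a debris target is feasible if it is unvisited, affordable
    # with the remaining fuel, and no collision zone is currently active.
    mask = [
        (not visited[i]) and fuel_cost[i] <= fuel and not in_collision_zone
        for i in range(n)
    ]
    mask.append(not in_collision_zone)  # refuel
    mask.append(in_collision_zone)      # CA Above (detour only when a zone is active)
    mask.append(in_collision_zone)      # CA Below
    return mask


def masked_softmax(logits, mask):
    """Softmax restricted to feasible actions; masked actions get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, mask, rng=random):
    """Sample an action index from the masked policy distribution."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


mask = build_action_mask(
    visited=[True, False, False],
    fuel=5.0,
    fuel_cost=[1.0, 2.0, 9.0],
    in_collision_zone=False,
)
action = sample_action([0.0] * len(mask), mask, random.Random(0))
```

With these inputs only debris 1 and the refuel action are feasible, so the sampled index is always 1 or 3. In a PPO implementation the same mask would also be applied when computing log-probabilities during the policy update, so that masked actions contribute no gradient.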
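Experimental Results
Research Questions
- RQ1: Can an RL agent based on masked PPO learn robust debris visitation sequences under dynamic collision risk and fuel constraints?
- RQ2: Compared with heuristic methods, how does integrated refueling affect mission duration, debris coverage, and safety?
- RQ3: How does adaptive collision avoidance affect mission efficiency and safety across varied debris configurations?
Key Findings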
| Evaluation Type | Average | Max | Min |
|---|---|---|---|
| RL all | 30.4 | 31 | 29 |
| RL + Greedy CA | 29.5 | 31 | 28 |
| Greedy + RL CA | 21.6 | 23 | 21 |
| Greedy + Greedy | 19.3 | 23 | 17 |
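- The RL-based framework outperforms traditional heuristic methods in both reducing collision risk and improving mission efficiency.
- The RL-RL mode, in which the policy handles both sequencing and collision avoidance, achieves the highest debris coverage.
- Hybrid modes (RL combined with greedy, so that RL handles only one subtask) underperform fully RL-based planning.
- In evaluation, RL-RL visits more debris on average across 100 randomized cases than the hybrid configurations.
- Collision avoidance through CA Above/CA Below detours maintains the required clearance while keeping the mission moving forward.
- Training converges after roughly 8 million steps, with stable reward and behavior.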
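To make the gaps between configurations easier to read, the averages in the findings table can be expressed relative to the fully greedy baseline. The sketch below only re-derives percentages from the table's Average column; it assumes that column reports debris visited per test case, which the column headers do not state explicitly, and `relative_gain` is a hypothetical helper.

```python
# Average column copied from the findings table.
AVERAGES = {
    "RL all": 30.4,
    "RL + Greedy CA": 29.5,
    "Greedy + RL CA": 21.6,
    "Greedy + Greedy": 19.3,
}


def relative_gain(averages, baseline_key):
    """Percentage gain of each configuration over the chosen baseline."""
    base = averages[baseline_key]
    return {
        name: round((avg - base) / base * 100, 1)
        for name, avg in averages.items()
    }


gains = relative_gain(AVERAGES, "Greedy + Greedy")
for name, pct in gains.items():
    print(f"{name}: {pct:+.1f}% vs Greedy + Greedy")
```

Under that assumption, the fully RL configuration averages about 57.5% higher than the fully greedy one, while replacing only the collision avoidance component with RL yields about 11.9%.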
This walkthrough was generated by AI and reviewed by a human editor.