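[Paper Summary] Energy-aware Goal Selection and Path Planning of UAV Systems via Reinforcement Learning
This paper proposes a reinforcement learning method for energy-aware goal selection and path planning in UAVs that dynamically balances object detection accuracy against energy efficiency under strong wind disturbances. By modeling drag-induced energy consumption and integrating it into the reward function, the agent outperforms the complete coverage algorithm, detecting 4 times as many goal objects in strong wind while minimizing path length.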
Visual exploration and smart data collection via autonomous vehicles are an attractive topic in various disciplines. Disturbances like wind significantly influence both the power consumption of the flying robots and the performance of the camera. We propose a reinforcement learning approach which combines the effects of the power consumption and the object detection modules to develop a policy for object detection in large areas with limited battery life. The learning model enables dynamic learning of the negative rewards of each action based on the drag forces that result from the motion of the flying robot with respect to the wind field. The algorithm is implemented in a near-real world simulation environment both for planar motion and for flight at different altitudes. The trained agent often performed a trade-off between detecting the objects with high accuracy and increasing the area coverage within its battery life. The developed exploration policy outperformed the complete coverage algorithm by minimizing the traveled path while finding the target objects. The performance of the algorithms under various wind fields was evaluated in planar and 3D motion. During an exploration task with sparsely distributed goals and within a UAV's battery life, the proposed architecture could detect more than twice the number of goal objects compared to the coverage path planning algorithm in a moderate wind field. In high wind intensities, the energy-aware algorithm could detect 4 times the number of goal objects when compared to its complete coverage counterpart.
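Motivation and Goals
- Address the challenge of limited battery life when a UAV performs visual exploration in strong wind.
- Develop a path planning policy that optimizes object detection accuracy and energy efficiency together.
- Adapt dynamically to wind-induced drag through real-time reward reshaping in reinforcement learning.
- Outperform the conventional complete coverage algorithm in both detection rate and path efficiency.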
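Proposed Method
- A deep reinforcement learning framework trains the UAV agent to balance energy consumption against object detection performance.
- The reward function introduces negative rewards based on the drag force computed from the UAV's velocity relative to the wind field.
- The algorithm is trained in a near-real-world simulation environment that supports both planar and 3D flight dynamics.
- The agent learns a policy that prioritizes locations with a high probability of containing targets while minimizing path length and energy consumption.
- The wind field is modeled as a dynamic environmental disturbance that affects both motion and energy consumption.
- The system is evaluated on exploration tasks for the trade-off among area coverage, detection accuracy, and battery life.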
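The drag-based penalty described in the list above can be sketched as follows. This summary does not give the paper's exact energy model or reward weights, so the sketch assumes the standard quadratic drag equation, F = 0.5 * rho * C_d * A * |v - w|^2, and approximates the energy cost of a step as drag force times distance traveled. All names and constants (`drag_force`, `step_reward`, `DRAG_COEFF`, `FRONTAL_AREA`, `detection_bonus`, `energy_weight`) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: constants and function names are assumptions,
# not values or APIs taken from the paper.
AIR_DENSITY = 1.225   # kg/m^3, sea-level air (assumed)
DRAG_COEFF = 1.0      # dimensionless drag coefficient (assumed)
FRONTAL_AREA = 0.1    # m^2, effective frontal area of the UAV (assumed)


def drag_force(uav_velocity, wind_velocity):
    """Magnitude of quadratic drag, using the UAV's velocity relative to the air.

    Both velocities are given in the ground frame as (x, y, z) in m/s.
    """
    relative = [v - w for v, w in zip(uav_velocity, wind_velocity)]
    speed_squared = sum(c * c for c in relative)
    return 0.5 * AIR_DENSITY * DRAG_COEFF * FRONTAL_AREA * speed_squared


def step_reward(uav_velocity, wind_velocity, distance, detected,
                detection_bonus=10.0, energy_weight=1.0):
    """Detection bonus minus a penalty proportional to the drag work of the step."""
    energy = drag_force(uav_velocity, wind_velocity) * distance  # joules
    bonus = detection_bonus if detected else 0.0
    return bonus - energy_weight * energy


if __name__ == "__main__":
    headwind = step_reward((5.0, 0.0, 0.0), (-5.0, 0.0, 0.0), 10.0, True)
    tailwind = step_reward((5.0, 0.0, 0.0), (5.0, 0.0, 0.0), 10.0, True)
    print(f"headwind reward: {headwind:.2f}, tailwind reward: {tailwind:.2f}")
```

Under these assumptions, the same move earns a lower reward into a headwind than with a tailwind, which is the kind of signal that would push an agent toward energy-cheap routes while still rewarding detections.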
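Experimental Results
Research Questions
- RQ1: How can a UAV efficiently detect sparsely distributed targets under different wind speeds while minimizing energy consumption?
- RQ2: Which reinforcement learning policy can dynamically balance object detection accuracy and path efficiency in an energy-constrained environment?
- RQ3: How does wind-induced drag affect UAV energy consumption and detection performance in real time?
- RQ4: Under high wind speeds, by how much does the proposed energy-aware policy outperform the complete coverage algorithm in object detection?
- RQ5: Can the agent learn to prioritize high-value targets within its battery limits while maintaining feasible flight paths?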
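Key Findings
- In a moderate wind field, the proposed energy-aware algorithm detected more than twice as many goal objects as the coverage path planning algorithm.
- In high wind intensities, the energy-aware method detected four times as many goal objects as the complete coverage method.
- The agent shortened the traveled path while still finding the target objects.
- By adjusting its actions according to the wind-induced drag, the algorithm balanced energy efficiency against detection performance.
- The reinforcement learning model was evaluated under various wind fields in both planar and 3D flight and outperformed the baseline in the reported scenarios.
- Integrating the drag-based energy penalty into the reward function improved the efficiency and adaptability of the exploration policy.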
This summary was generated by AI and reviewed by human editors.