QUICK REVIEW

[Paper Review] Online Deep Reinforcement Learning for Autonomous UAV Navigation and Exploration of Outdoor Environments

Bruna G. Maciel-Pearson, Letizia Marchegiani | arXiv (Cornell University) | Dec 11, 2019

Robotics and Sensor-Based Localization | References: 41 | Cited by: 24

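One-Sentence Summary

This paper proposes an Extended Double Deep Q-Network (EDDQN) for autonomous UAV navigation and exploration in outdoor environments, using a double state input composed of raw RGB images and a local position map. The method generalizes well to unseen terrain and severe weather, outperforms baseline DQN, DDQN, and DRQN models in step efficiency and cumulative reward, and runs in real time on a simulated UAV within a 30-minute flight-time limit.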

ABSTRACT

With the rapidly growing expansion in the use of UAVs, the ability to autonomously navigate in varying environments and weather conditions remains a highly desirable but as-of-yet unsolved challenge. In this work, we use Deep Reinforcement Learning to continuously improve the learning and understanding of a UAV agent while exploring a partially observable environment, which simulates the challenges faced in a real-life scenario. Our innovative approach uses a double state-input strategy that combines the acquired knowledge from the raw image and a map containing positional information. This positional data aids the network understanding of where the UAV has been and how far it is from the target position, while the feature map from the current scene highlights cluttered areas that are to be avoided. Our approach is extensively tested using variants of Deep Q-Network adapted to cope with double state input data. Further, we demonstrate that by altering the reward and the Q-value function, the agent is capable of consistently outperforming the adapted Deep Q-Network, Double Deep Q-Network and Deep Recurrent Q-Network. Our results demonstrate that our proposed Extended Double Deep Q-Network (EDDQN) approach is capable of navigating through multiple unseen environments and under severe weather conditions.

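Research Motivation and Objectives

Address the challenge of autonomous UAV navigation in unknown, dynamic, and harsh outdoor environments, particularly for search and rescue (SAR) missions.
Improve generalization across unseen domains such as forest, farmland, and grassland, without fine-tuning or domain-specific data.
Reduce computational load by replacing recurrent networks with a lightweight feed-forward architecture, enabling onboard deployment.
Improve navigation efficiency by combining visual perception with positional memory, enabling obstacle avoidance and shortest-path planning.
Enable continued performance improvement across multiple flight missions through continuous online learning, without offline fine-tuning.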

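Proposed Method

The EDDQN agent uses a double state input: a raw RGB image (84×84) from the UAV camera and a 100×100 local map encoding position history and obstacle locations.
The network architecture extends the Double Deep Q-Network (DDQN) with two streams that process the visual and map inputs separately and then fuse them at a shared Q-value head.
A novel reward-shaping function prioritizes exploration (visiting unexplored areas earns a higher reward) and penalizes redundant steps and collisions.
The Q-value function is optimized with double Q-learning, which reduces overestimation bias and improves policy stability during training.
Online training uses experience replay and target-network updates, supporting continual adaptation across multiple flight missions and environments.
The method does not depend on camera intrinsics or ground-truth labels, so it can be deployed on UAV platforms with different resolutions and payloads.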
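
To make the double Q-learning step concrete, here is a minimal sketch of the standard Double DQN target, in which the online network selects the next action and the target network evaluates it. It is not the paper's altered Q-value function, which this review does not spell out. The function name, the toy Q-values, and the discount factor are illustrative assumptions.

```python
def double_q_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Standard Double DQN targets (illustrative; not the EDDQN-specific variant).

    rewards        -- reward received for each transition in the batch
    dones          -- True where the episode ended on that transition
    q_online_next  -- online-network Q-values for each next state (one list per transition)
    q_target_next  -- target-network Q-values for the same next states
    """
    targets = []
    for reward, done, q_online, q_target in zip(
        rewards, dones, q_online_next, q_target_next
    ):
        if done:
            # Terminal transition: no bootstrapped future value.
            targets.append(reward)
            continue
        # The online network chooses the greedy action ...
        best_action = max(range(len(q_online)), key=q_online.__getitem__)
        # ... and the target network supplies the value of that action.
        targets.append(reward + gamma * q_target[best_action])
    return targets


# Toy batch of two transitions with three actions each (made-up numbers).
targets = double_q_targets(
    rewards=[0.1, -1.0],
    dones=[False, True],
    q_online_next=[[0.2, 0.9, 0.1], [0.5, 0.4, 0.3]],
    q_target_next=[[0.3, 0.6, 0.8], [0.7, 0.1, 0.2]],
    gamma=0.9,
)
print(targets)
```

In the first transition the online network prefers action 1, so the target uses the target network's 0.6 rather than its own maximum of 0.8. Decoupling selection from evaluation in this way is what reduces overestimation.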

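Experimental Results

Research Questions

RQ1: Can a deep reinforcement learning agent generalize to unseen outdoor environments (e.g., forest, farmland, grassland) without fine-tuning?
RQ2: Does combining a local map with raw visual input improve navigation performance and reduce step count in partially observable environments?
RQ3: Under severe weather conditions, does the proposed EDDQN outperform standard DQN, DDQN, and DRQN models in cumulative reward and path efficiency?
RQ4: Compared with recurrent models such as DRQN, to what extent does the dual-input architecture reduce computational load and thereby enable onboard deployment?
RQ5: Can the agent maintain high performance when dynamic elements such as moving animals are in the field of view?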

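Key Findings

In an unseen forest environment under heavy snow and dense fog, EDDQN averages 7.5 steps per episode, fewer than DRQN*100 (8.2 steps) and comparable to DQN* (7.35 steps).
In the grassland environment with moving animals, EDDQN achieves a 0% obstacle collision rate with a stable average mission time of 13.34 minutes, whereas DRQN*1000 fails entirely.
EDDQN's average cumulative reward in Test V is 0.5079, higher than the 0.2573 of DRQN*100, indicating that it favors exploration over revisiting.
The method reduces the input feature size from 28,224 (84×84×4) to 7,156 (84×84 + 100), lowering the computational load and supporting real-time onboard inference.
The model performs consistently across all eight test scenarios, covering different weather conditions and unseen domains, with no drop in performance.
On average, EDDQN completes every mission within 30 minutes, which fits the battery endurance limit of commercial UAVs.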
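
The input-size figures in the findings can be checked with a few lines of arithmetic. The sketch below assumes that the factor of 4 is a stack of four 84×84 frames, as is common for DQN-style baselines, and it takes the map term as the 100 values added in the reported sum. Both readings are assumptions, and the variable names are illustrative.

```python
IMAGE_SIDE = 84       # image width and height in pixels
STACKED_FRAMES = 4    # assumption: the "×4" is a four-frame stack
MAP_VALUES = 100      # the "+ 100" map term, taken as reported

stacked_input = IMAGE_SIDE * IMAGE_SIDE * STACKED_FRAMES   # baseline input size
dual_input = IMAGE_SIDE * IMAGE_SIDE + MAP_VALUES          # image plus map values
reduction = 1 - dual_input / stacked_input                 # fraction of inputs removed

print(stacked_input, dual_input, f"{reduction:.1%}")
```

Under these assumptions the dual input is roughly a quarter of the stacked-frame input, a reduction of about 75%.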


This review was generated by AI and reviewed by a human editor.