QUICK REVIEW

[Paper Review] Air Learning: An AI Research Platform for Algorithm-Hardware Benchmarking of Autonomous Aerial Robots

Srivatsan Krishnan, Behzad Boroujerdian|arXiv (Cornell University)|Jun 2, 2019

Reinforcement Learning in Robotics | References: 51 | Cited by: 31

One-Sentence Summary

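Air Learning is an open-source simulator and reinforcement learning platform for benchmarking deep reinforcement learning algorithms on resource-constrained unmanned aerial vehicles (UAVs). By incorporating hardware-in-the-loop latency modeling into training, it reduces the flight-time discrepancy between a high-end desktop system and an embedded system from 37.73% to 0.5%, so that trained policies can be deployed accurately on low-power platforms such as the Raspberry Pi.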

ABSTRACT

We introduce Air Learning, an open-source simulator, and a gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle avoidance tasks in three different environments and Deep Q Networks (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies' performance under various quality-of-flight (QoF) metrics, such as the energy consumed, endurance, and the average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that the trajectories on an embedded Ras-Pi are vastly different from those predicted on a high-end desktop system, resulting in up to 40% longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses the hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (onboard compute on the aerial robot). A randomly sampled latency from the latency distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (discrepancy in the flight time metric reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the onboard compute's choice affects the aerial robot's performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. All put together, Air Learning enables a broad class of deep RL research on UAVs. The source code is available at: this http URL.
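
The mitigation technique in the abstract (measure the policy's latency distribution on the target board, then add a randomly sampled latency as an artificial delay inside the training loop) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Air Learning's code: `measure_latencies`, `LatencyInjector`, and `run_episode` are hypothetical names, the latency distribution is taken to be the empirical set of measured samples, and the simulator is assumed to advance in real time, so sleeping for the sampled delay lets the world keep moving while the agent "thinks".

```python
import random
import time
from typing import Callable, List, Sequence, Tuple

Observation = Sequence[float]


def measure_latencies(policy: Callable[[Observation], int],
                      observations: List[Observation]) -> List[float]:
    """Time each policy inference in seconds. Intended to run on the target
    board (e.g. a Raspberry Pi) in a hardware-in-the-loop setup."""
    latencies = []
    for obs in observations:
        start = time.perf_counter()
        policy(obs)
        latencies.append(time.perf_counter() - start)
    return latencies


class LatencyInjector:
    """Hypothetical helper that replays latencies drawn from a measured distribution."""

    def __init__(self, latencies: List[float], seed: int = 0) -> None:
        if not latencies:
            raise ValueError("need at least one measured latency")
        self.latencies = list(latencies)
        self.rng = random.Random(seed)

    def sample(self) -> float:
        # Empirical distribution: pick one measured latency uniformly at random.
        return self.rng.choice(self.latencies)

    def delay(self, sleep: Callable[[float], None] = time.sleep) -> float:
        # Block for the sampled latency. Assumes a real-time simulator keeps
        # moving the drone while we wait, as it would on slow onboard compute.
        d = self.sample()
        sleep(d)
        return d


def run_episode(policy: Callable[[Observation], int],
                env_step: Callable[[int], Tuple[Observation, bool]],
                obs: Observation,
                injector: LatencyInjector,
                max_steps: int = 100,
                sleep: Callable[[float], None] = time.sleep) -> float:
    """Roll out one episode with an artificial delay after every action choice.
    Returns the total injected delay in seconds."""
    total_delay = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        total_delay += injector.delay(sleep)  # emulate target-platform inference time
        obs, done = env_step(action)
        if done:
            break
    return total_delay
```

Passing `sleep=lambda s: None` makes the loop run instantly for testing; during real training the default `time.sleep` is what produces the delay. Replaying measured samples avoids assuming a parametric (e.g. Gaussian) latency model; fitting a distribution to the measurements would be an equally reasonable variant.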

Research Motivation and Objectives

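Close the performance gap between training on a high-end desktop system and actual deployment on an embedded UAV platform.
Enable realistic benchmarking of deep reinforcement learning policies on resource-constrained aerial robots.
Quantify and mitigate discrepancies in quality-of-flight metrics such as trajectory length and energy consumption.
Study how sensor failures and hardware limitations affect learned policies.
Provide a reproducible, open-source platform for algorithm-hardware co-design of autonomous aerial robots.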

Proposed Method

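The platform uses domain randomization to expose the UAV agent to a diverse set of challenging environments during training.
It provides a Gym-compatible environment for training DQN and PPO agents on point-to-point obstacle avoidance tasks.
A hardware-in-the-loop setup measures the latency distribution of running the policy on the target embedded platform (e.g., a Raspberry Pi).
Artificial delays sampled from the measured latency distribution are injected into the training loop to emulate the constraints of real onboard compute.
Performance is evaluated with quality-of-flight (QoF) metrics, including energy consumed, endurance, and average trajectory length.
Reliability studies assess how robust the policies are to simulated sensor failures, which informs real-world deployment.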
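
Domain randomization, as listed above, amounts to drawing a fresh environment configuration at every episode reset. The sketch below shows the idea for a 2-D obstacle-avoidance arena; `ArenaConfig`, `sample_arena`, and the ranges used (arena side length 20 to 50 units, 2 to 10 obstacles) are illustrative assumptions, not Air Learning's actual randomization parameters.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ArenaConfig:
    """Hypothetical per-episode environment description."""
    arena_size: Tuple[float, float]
    num_obstacles: int
    goal: Tuple[float, float]
    obstacle_positions: List[Tuple[float, float]]


def sample_arena(rng: random.Random,
                 size_range: Tuple[float, float] = (20.0, 50.0),
                 obstacle_range: Tuple[int, int] = (2, 10)) -> ArenaConfig:
    """Draw one randomized arena; call this on every episode reset."""
    w = rng.uniform(*size_range)          # arena width
    h = rng.uniform(*size_range)          # arena height
    n = rng.randint(*obstacle_range)      # inclusive bounds
    obstacles = [(rng.uniform(0, w), rng.uniform(0, h)) for _ in range(n)]
    goal = (rng.uniform(0, w), rng.uniform(0, h))  # random point-to-point target
    return ArenaConfig((w, h), n, goal, obstacles)
```

Calling `sample_arena(rng)` inside the environment's reset step means the agent rarely sees the same layout twice, which pushes the policy to generalize rather than memorize one map. Seeding the `random.Random` instance keeps runs reproducible.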

Experimental Results

Research Questions

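RQ1: How does the performance of deep RL policies trained on a high-end system change when they are deployed on a low-power embedded platform such as a Raspberry Pi?
RQ2: To what extent does injecting artificial latency during training close the performance gap between desktop-trained policies and their deployment on embedded UAV hardware?
RQ3: How do hardware constraints such as compute latency affect key quality-of-flight metrics such as trajectory length and energy consumption?
RQ4: How do sensor failures affect the robustness of learned policies for autonomous UAV navigation?
RQ5: What role does domain randomization play in helping policies generalize across diverse, challenging UAV environments?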

Key Findings

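In one of the environments, trajectories on the embedded Raspberry Pi were up to 40% longer than those on the high-end desktop system, revealing a significant hardware performance gap.
Training with artificial delays drawn from the target platform's latency distribution reduced the hardware gap in the flight-time metric from 37.73% to 0.5%.
Incorporating hardware-in-the-loop latency modeling during training therefore substantially improves how well policies transfer to embedded systems.
Policies trained with domain randomization were more robust across a wide range of environmental variations.
The sensor-failure studies indicate that learned policies can maintain navigation performance under partial sensor degradation, supporting more reliable real-world deployment.
The platform enables accurate, reproducible benchmarking of RL algorithms for UAVs under realistic hardware and environmental constraints.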
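
The quality-of-flight metrics and the hardware-gap percentages quoted above can be computed from logged flight data roughly as follows. This is a hedged sketch: the helper names are hypothetical, energy is assumed to be the trapezoidal integral of logged power over time, and the gap is taken as the relative difference with the desktop run as the reference. The paper's exact definitions may differ.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def trajectory_length(points: Sequence[Point]) -> float:
    """Sum of straight-line distances between consecutive logged positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def flight_time(timestamps: Sequence[float]) -> float:
    """Elapsed time from the first to the last sample, in seconds."""
    return timestamps[-1] - timestamps[0] if timestamps else 0.0


def energy_joules(timestamps: Sequence[float], power_watts: Sequence[float]) -> float:
    """Trapezoidal integral of instantaneous power (W) over time (s)."""
    if len(timestamps) != len(power_watts):
        raise ValueError("timestamps and power samples must align")
    return sum((t1 - t0) * (p0 + p1) / 2.0
               for t0, t1, p0, p1 in zip(timestamps, timestamps[1:],
                                          power_watts, power_watts[1:]))


def relative_gap_percent(reference: float, measured: float) -> float:
    """Relative difference of `measured` with respect to `reference`, in percent."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(measured - reference) / reference * 100.0
```

For example, `trajectory_length([(0, 0, 0), (3, 4, 0)])` is 5.0, and a flight lasting 14 s on the embedded board versus 10 s on the desktop gives `relative_gap_percent(10.0, 14.0)` of about 40%. These numbers are illustrative only, not results from the paper.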

This review was generated by AI and reviewed by human editors.