QUICK REVIEW

[Paper Review] Learning Safe Multi-Agent Control with Decentralized Neural Barrier Certificates

Zengyi Qin, Kaiqing Zhang|arXiv (Cornell University)|Jan 14, 2021

Advanced Control Systems Optimization|References: 36|Cited by: 48

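ONE-SENTENCE SUMMARY

The paper proposes a decentralized joint-learning framework that trains multi-agent control policies together with neural control barrier functions (CBFs) as safety certificates, so that safety is maintained while the approach scales to large numbers of agents.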

ABSTRACT

We study the multi-agent safe control problem where agents should avoid collisions to static obstacles and collisions with each other while reaching their goals. Our core idea is to learn the multi-agent control policy jointly with learning the control barrier functions as safety certificates. We propose a novel joint-learning framework that can be implemented in a decentralized fashion, with generalization guarantees for certain function classes. Such a decentralized framework can adapt to an arbitrarily large number of agents. Building upon this framework, we further improve the scalability by incorporating neural network architectures that are invariant to the quantity and permutation of neighboring agents. In addition, we propose a new spontaneous policy refinement method to further enforce the certificate condition during testing. We provide extensive experiments to demonstrate that our method significantly outperforms other leading multi-agent control approaches in terms of maintaining safety and completing original tasks. Our approach also shows exceptional generalization capability in that the control policy can be trained with 8 agents in one scenario, while being used on other scenarios with up to 1024 agents in complex multi-agent environments and dynamics.

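MOTIVATION AND OBJECTIVES

Provide safe control with formal safety guarantees for large-scale multi-agent systems.
Develop a decentralized framework that jointly learns control policies and safety certificates (CBFs).
Scale to an arbitrary number of agents by relying on local observations and network architectures that are invariant to the number and ordering of neighbors.
Demonstrate generalization to unseen environments and agent counts.
Provide techniques that improve training efficiency and test-time safety.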

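PROPOSED METHOD

Define decentralized CBFs that depend only on each agent's local state and observations.
Jointly learn each agent's policy π_i and barrier function h_i from data, using a margin γ.
Use a loss function that enforces the three CBF conditions and adds a goal-reaching term.
Adopt a neural encoder that is invariant to the quantity and permutation of neighbors, so that a changing number of neighbors can be handled.
Train on on-policy data and collect data iteratively to align the training and test distributions.
Introduce spontaneous online policy refinement, which adjusts actions when the CBF conditions are violated.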
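
The loss described above can be illustrated with a minimal sketch. It assumes the three CBF conditions take the common form h >= 0 on safe states, h < 0 on dangerous states, and dh/dt + alpha*h >= 0 along trajectories, each enforced by a hinge penalty with margin gamma. The function name `cbf_hinge_loss`, the linear choice of alpha, the finite-difference estimate of dh/dt, and all default values are assumptions made for illustration, not details taken from the paper. The goal-reaching term is omitted.

```python
def cbf_hinge_loss(h_safe, h_danger, h_now, h_next, dt=0.05, gamma=0.01, alpha=1.0):
    """Average hinge penalties for three CBF-style conditions (illustrative only).

    h_safe        : barrier values at states that should be certified safe (want h >= gamma)
    h_danger      : barrier values at dangerous states (want h <= -gamma)
    h_now, h_next : barrier values at consecutive time steps along a trajectory,
                    used for a finite-difference estimate of dh/dt
    """
    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0

    # Condition 1: h should be positive (by a margin) on safe states.
    loss_safe = mean(max(0.0, gamma - h) for h in h_safe)
    # Condition 2: h should be negative (by a margin) on dangerous states.
    loss_danger = mean(max(0.0, gamma + h) for h in h_danger)
    # Condition 3: dh/dt + alpha * h should stay positive (by a margin).
    loss_deriv = mean(
        max(0.0, gamma - ((h1 - h0) / dt + alpha * h0))
        for h0, h1 in zip(h_now, h_next)
    )
    return loss_safe + loss_danger + loss_deriv
```

In a real training loop the barrier values would come from a differentiable network and the penalties would be minimized jointly with the policy; plain Python floats are used here only to keep the arithmetic visible.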

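EXPERIMENTAL RESULTS

Research Questions

RQ1: Can decentralized control barrier functions guarantee safety in multi-agent environments without a central controller?
RQ2: How should neural networks be designed to handle a variable number and arbitrary permutation of neighboring agents?
RQ3: Does joint learning of the policy and the CBF generalize to unseen scenarios and larger numbers of agents?
RQ4: Can online refinement at test time improve safety beyond what the learned certificate provides?
RQ5: Which practical training strategies ensure safety while still achieving goal-directed behavior?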
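
For RQ2, one common way to obtain invariance to both the number and the ordering of neighbors is to apply the same small network to every neighbor and then pool the results with an elementwise maximum. The sketch below shows that idea with a single shared linear layer and ReLU. The function `encode_neighbors` and the weight layout are hypothetical, and the paper's actual architecture may differ.

```python
def encode_neighbors(neighbors, weights):
    """Map a variable-length list of neighbor observations to a fixed-size vector.

    neighbors : list of observation vectors, one per neighbor (any length, any order)
    weights   : rows of a shared linear layer; each row yields one output feature
    """
    if not neighbors:
        # No neighbors in range: return an all-zero feature vector.
        return [0.0] * len(weights)
    # Shared layer + ReLU, applied identically to every neighbor.
    per_neighbor = [
        [max(0.0, sum(w * x for w, x in zip(row, obs))) for row in weights]
        for obs in neighbors
    ]
    # Elementwise max over neighbors: the result ignores order and count.
    return [max(column) for column in zip(*per_neighbor)]
```

Because the maximum is taken feature by feature, shuffling the neighbor list or adding a neighbor never changes the output size, which is what lets one set of weights be reused as the neighborhood grows.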

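Key Findings

When the decentralized conditions are satisfied, the joint-learning framework with decentralized CBFs yields safety guarantees.
The quantity- and permutation-invariant encoder lets the method scale to an arbitrary number of agents and a varying number of neighbors.
A policy trained with 8 agents generalizes to scenarios with up to 1024 agents in complex environments.
The method outperforms leading learning-based and planning-based approaches in both safety and task completion on 2D and 3D tasks.
Spontaneous online policy refinement further improves test-time safety by actively enforcing the CBF conditions.
Experiments show strong generalization to environments and agent counts beyond the training conditions.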
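
The test-time refinement idea can be sketched on a toy problem: when the proposed action would violate the CBF derivative condition, adjust the action by gradient descent on the amount of violation. Everything below is an assumption made for illustration, including the 1D single-integrator dynamics, the hand-written barrier `toy_barrier`, the finite-difference gradient, and the step sizes. The paper applies refinement to learned networks, not to this toy model.

```python
def cbf_violation(x, u, h, step, dt, alpha):
    """How far the condition (h(x') - h(x)) / dt + alpha * h(x) >= 0 is from holding."""
    x_next = step(x, u, dt)
    return max(0.0, -((h(x_next) - h(x)) / dt + alpha * h(x)))


def refine_action(x, u, h, step, dt=0.1, alpha=1.0, lr=0.1, iters=100, eps=1e-4):
    """Nudge u by gradient descent on the violation until it vanishes or iters run out."""
    for _ in range(iters):
        if cbf_violation(x, u, h, step, dt, alpha) == 0.0:
            break  # condition already satisfied, keep the action as is
        # Central finite difference of the violation with respect to the action.
        grad = (cbf_violation(x, u + eps, h, step, dt, alpha)
                - cbf_violation(x, u - eps, h, step, dt, alpha)) / (2 * eps)
        u -= lr * grad
    return u


# Toy setup (hypothetical): 1D single integrator, obstacle of radius 1 at the origin.
def toy_barrier(x):
    return x * x - 1.0  # positive outside the obstacle, negative inside


def toy_step(x, u, dt):
    return x + u * dt
```

Starting at x = 1.5 with an action that drives the agent toward the obstacle, for example `refine_action(1.5, -5.0, toy_barrier, toy_step)`, the returned action is less aggressive and satisfies the condition, while an action that already satisfies it is returned unchanged.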

This review was generated by AI and reviewed by a human editor.