QUICK REVIEW

[Paper Review] Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates

Guangchen Lan, Han Wang | arXiv (Cornell University) | Oct 9, 2023

Reinforcement Learning in Robotics | Cited by 10

One-Sentence Summary

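FedNPG-ADMM cuts the per-iteration communication cost of federated natural policy gradient from O(d^2) to O(d), where d is the number of model parameters, by using ADMM to estimate the global NPG direction, while keeping the same convergence rate as standard FedNPG.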

ABSTRACT

Federated reinforcement learning (FedRL) enables agents to collaboratively train a global policy without sharing their individual data. However, high communication overhead remains a critical bottleneck, particularly for natural policy gradient (NPG) methods, which are second-order. To address this issue, we propose the FedNPG-ADMM framework, which leverages the alternating direction method of multipliers (ADMM) to approximate global NPG directions efficiently. We theoretically demonstrate that using ADMM-based gradient updates reduces communication complexity from ${O}({d^{2}})$ to ${O}({d})$ at each iteration, where $d$ is the number of model parameters. Furthermore, we show that achieving an $ε$-error stationary convergence requires ${O}(\frac{1}{(1-γ)^{2}ε})$ iterations for discount factor $γ$, demonstrating that FedNPG-ADMM maintains the same convergence rate as the standard FedNPG. Through evaluation of the proposed algorithms in MuJoCo environments, we demonstrate that FedNPG-ADMM maintains the reward performance of standard FedNPG, and that its convergence rate improves when the number of federated agents increases.
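To see how the iteration bound scales, the values of γ and ε below are plugged into it. These values are illustrative choices, not settings from the paper, and the hidden constant in the O(·) is ignored:

```latex
\begin{aligned}
K &= \mathcal{O}\!\left(\frac{1}{(1-\gamma)^{2}\,\varepsilon}\right) \\
\gamma = 0.9,\ \varepsilon = 0.1 &:\quad \frac{1}{(0.1)^{2}\cdot 0.1} = \frac{1}{10^{-3}} = 10^{3} \\
\gamma = 0.99,\ \varepsilon = 0.1 &:\quad \frac{1}{(0.01)^{2}\cdot 0.1} = \frac{1}{10^{-5}} = 10^{5}
\end{aligned}
```

Moving γ from 0.9 to 0.99 raises the bound 100-fold, while halving ε only doubles it. The bound is therefore far more sensitive to the discount factor than to the target accuracy.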

Motivation and Goals

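Advance federated reinforcement learning for policy optimization under limited communication and privacy constraints.
Develop a second-order FedRL method that reduces communication overhead without sacrificing performance.
Propose an ADMM-based scheme for estimating the global NPG direction in a distributed setting.
Provide convergence guarantees showing the same convergence rate as standard FedNPG.
Demonstrate empirical performance and communication gains in MuJoCo environments.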

Proposed Method

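Formulate the global direction computation as a quadratic program whose solution is (Σ_i H_i)^{-1} Σ_i g_i, where H_i is agent i's local second-order (Fisher information) matrix and g_i its local policy gradient.
Recast the problem in a distributed ADMM framework in which agents exchange only low-dimensional (d-dimensional) variables y_i and gradients g_i.
Derive Algorithm 1 (FedNPG-ADMM): each agent updates y_i using (H_i + ρI)^{-1}, and the server averages the y_i into y and updates θ.
Prove that the per-iteration uplink communication complexity is O(d), compared with O(d^2) for standard FedNPG.
Prove convergence: FedNPG-ADMM reaches ε-error stationary convergence in O(1/((1-γ)^2 ε)) iterations, matching FedNPG.
Compare FedNPG-ADMM, FedNPG, and FedPPO in simulations on MuJoCo tasks to check the theoretical results against empirical performance.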
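The quadratic program and a consensus ADMM splitting of it can be written out as below. This is a sketch of standard scaled-form consensus ADMM. The dual variables u_i, the iteration index k, and the notation ȳ for the server average are assumptions for illustration, and the exact updates in the paper's Algorithm 1 may differ. Setting the gradient of the objective to zero gives the first line, which assumes Σ_i H_i is invertible.

```latex
\begin{aligned}
&\min_{y}\ \sum_{i=1}^{N}\Big(\tfrac{1}{2}\,y^{\top}H_i\,y - g_i^{\top}y\Big)
\;\Longrightarrow\;
\Big(\sum_{i=1}^{N}H_i\Big)\,y^{\star} = \sum_{i=1}^{N}g_i \\[4pt]
&y_i^{k+1} = (H_i+\rho I)^{-1}\big(g_i + \rho\,(\bar{y}^{k} - u_i^{k})\big) \\
&\bar{y}^{k+1} = \frac{1}{N}\sum_{i=1}^{N} y_i^{k+1} \\
&u_i^{k+1} = u_i^{k} + y_i^{k+1} - \bar{y}^{k+1}
\end{aligned}
```

The server step is a plain average because, with u_i^0 = 0, the duals keep summing to zero. The general consensus update (1/N) Σ_i (y_i^{k+1} + u_i^k) therefore reduces to the mean of the y_i. Each (H_i + ρI)^{-1} stays on its agent, so every round moves only d-dimensional vectors.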

Figure 1: An illustration of federated learning based on second-order methods with $N$ agents. (a) FedNPG via standard averaging. In the uplink, transmitting the matrix $\mathbf{H}_{i}$ brings $\mathcal{O}(d^{2})$ communication complexity. (b) FedNPG-ADMM in this paper, with only $\mathcal{O}(d)$ communication complexity.
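Panel (b) can be illustrated with a minimal NumPy sketch. It assumes standard scaled-form consensus ADMM with toy data, runs the loop for one fixed set of H_i and g_i, and checks the result against the centralized solve (Σ_i H_i)^{-1} Σ_i g_i from panel (a). The function name `consensus_admm_direction`, the penalty `rho`, the iteration count, and the random positive-definite matrices are all hypothetical. The paper's Algorithm 1 may interleave ADMM steps with policy updates differently.

```python
import numpy as np


def consensus_admm_direction(H_list, g_list, rho=1.0, num_iters=200):
    """Approximate (sum_i H_i)^{-1} sum_i g_i with scaled-form consensus ADMM.

    Agent i keeps its d x d matrix H_i and its d-dim vector g_i locally;
    only d-dimensional vectors cross the network in each round.
    """
    d = g_list[0].shape[0]
    eye = np.eye(d)
    # Each agent precomputes (H_i + rho I)^{-1} once and reuses it every round.
    local_inv = [np.linalg.inv(H + rho * eye) for H in H_list]
    y_bar = np.zeros(d)                # consensus direction held by the server
    u = [np.zeros(d) for _ in H_list]  # scaled duals, one per agent, kept locally
    for _ in range(num_iters):
        # Local step: y_i = (H_i + rho I)^{-1} (g_i + rho (y_bar - u_i)).
        y_local = [M @ (g + rho * (y_bar - ui))
                   for M, g, ui in zip(local_inv, g_list, u)]
        # Uplink + server step: average the d-dim y_i. The duals start at zero
        # and keep summing to zero, so the plain mean is the ADMM consensus update.
        y_bar = np.mean(y_local, axis=0)
        # Downlink y_bar, then each agent updates its own dual variable.
        u = [ui + yi - y_bar for ui, yi in zip(u, y_local)]
    return y_bar


# Hypothetical toy data: N agents with symmetric positive-definite H_i.
rng = np.random.default_rng(0)
d, n_agents = 6, 4
H_list, g_list = [], []
for _agent in range(n_agents):
    Q = np.linalg.qr(rng.standard_normal((d, d)))[0]  # random orthogonal basis
    H_list.append(Q @ np.diag(rng.uniform(0.5, 2.0, size=d)) @ Q.T)
    g_list.append(rng.standard_normal(d))

admm_dir = consensus_admm_direction(H_list, g_list, rho=1.0, num_iters=200)
# Centralized reference: what averaging the full H_i at the server would give.
exact_dir = np.linalg.solve(sum(H_list), sum(g_list))
print("max abs gap:", np.max(np.abs(admm_dir - exact_dir)))
```

Precomputing the inverse keeps the sketch short. For large d, a Cholesky factorization of H_i + ρI, or an iterative solver, would be the more practical choice.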

Experimental Results

Research Questions

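RQ1: Can ADMM-based gradient updates reduce the communication complexity of federated second-order policy gradient methods without compromising convergence?
RQ2: Does FedNPG-ADMM retain the ε-error stationary convergence rate and sample complexity of standard FedNPG?
RQ3: How do the convergence and reward of FedNPG-ADMM scale as the number of federated agents increases?
RQ4: What is the practical impact of reduced communication on reward performance and on robustness to partial agent participation?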

Key Findings

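ADMM-based updates reduce per-iteration uplink communication from O(d^2) to O(d).
FedNPG-ADMM converges at the same ε-error stationary rate as standard FedNPG, with iteration complexity K = O(1/((1-γ)^2 ε)).
MuJoCo experiments show that FedNPG-ADMM achieves reward comparable to FedNPG across tasks.
Increasing the number of federated agents speeds up the convergence of FedNPG-ADMM, consistent with the theoretical analysis.
On Swimmer-v4 and Humanoid-v4, FedNPG-ADMM delivers substantial communication savings (up to several orders of magnitude) while maintaining performance.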
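The size of the uplink saving can be checked with simple arithmetic. The sketch below counts the floats one agent uploads per iteration. It assumes FedNPG sends the full H_i plus g_i and that FedNPG-ADMM sends two d-dimensional vectors (y_i and g_i, following the method description). The function name and the values of d are hypothetical and are not taken from the paper.

```python
def uplink_floats_per_agent(d: int) -> dict[str, int]:
    """Floats one agent uploads per iteration under a simple, assumed accounting."""
    return {
        # FedNPG: the full d x d matrix H_i plus the d-dim gradient g_i.
        "FedNPG (full H_i)": d * d + d,
        # Exploiting the symmetry of H_i halves the matrix payload but keeps it O(d^2).
        "FedNPG (upper triangle of H_i)": d * (d + 1) // 2 + d,
        # FedNPG-ADMM: only d-dimensional vectors (assumed here: y_i and g_i).
        "FedNPG-ADMM (y_i and g_i)": 2 * d,
    }


for d in (100, 10_000):  # illustrative parameter counts, not values from the paper
    counts = uplink_floats_per_agent(d)
    ratio = counts["FedNPG (full H_i)"] / counts["FedNPG-ADMM (y_i and g_i)"]
    print(f"d={d}: {counts}, full-matrix / ADMM ratio = {ratio:.1f}")
```

Under this accounting, the full-matrix to ADMM ratio is (d^2 + d) / (2d) = (d + 1) / 2. The ratio grows linearly with the number of parameters, which is why savings of several orders of magnitude are plausible for policy networks with many parameters.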

This review was generated by AI and reviewed by human editors.