QUICK REVIEW

[Paper Review] Federated Deep Reinforcement Learning

Hankz Hankui Zhuo, Wenfeng Feng|arXiv (Cornell University)|Jan 24, 2019

Privacy-Preserving Technologies in Data | References: 26 | Cited by: 86

One-Line Summary

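FedRL lets two privacy-preserving agents federatively learn high-quality Q-networks through a shared MLP. The MLP takes the agents' Gaussian-noised Q-network outputs as input, which improves performance while protecting the privacy of both data and models.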

ABSTRACT

In deep reinforcement learning, building high-quality policies is challenging when the feature space of states is small and the training data is limited. Despite the success of previous transfer learning approaches in deep reinforcement learning, directly transferring data or models from an agent to another agent is often not allowed due to the privacy of data and/or models in many privacy-aware applications. In this paper, we propose a novel deep reinforcement learning framework to federatively build high-quality models for agents with consideration of their privacies, namely Federated deep Reinforcement Learning (FedRL). To protect the privacy of data and models, we exploit Gaussian differentials on the information shared with each other when updating their local models. In the experiment, we evaluate our FedRL framework in two diverse domains, Grid-world and Text2Action domains, by comparing to various baselines.

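Motivation and Objectives

Motivation: learning high-quality policies is hard when agents have different state feature spaces and their data must remain private.
Proposes FedRL, a federated RL framework that protects data and models with Gaussian differential privacy.
Uses a shared global value network so that two agents can jointly train their private Q-networks.
Evaluates FedRL on the Grid-World and Text2Action domains and compares it against baselines.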

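Proposed Method

Each agent maintains a local Q-network with parameters $\theta_\alpha$ or $\theta_\beta$, plus a shared global MLP with parameters $\theta_g$.
The outputs of the local Q-networks are perturbed with Gaussian noise, giving $\hat{Q}_\alpha$ and $\hat{Q}_\beta$.
The federated Q-network $Q_f$ is an MLP applied to the concatenated noisy outputs: $Q_f = \mathrm{MLP}([\hat{Q}_\alpha; \hat{Q}_\beta]; \theta_g)$.
Each agent updates its own Q-network and the shared MLP. It treats the other agent's noisy Q-network output as a fixed input.
Training minimizes the squared losses $L_\alpha^j$ and $L_\beta^j$ with respect to the target $Y^j$. For agent $\alpha$, $Y^j = r^j + \gamma \max_a Q_f^\alpha(s_\alpha^j, a, C_\beta; \theta_\alpha, \theta_g)$, where $C_\beta$ is $\beta$'s noisy Q-network output held fixed. The target for $\beta$ is defined analogously but without the reward term.
Privacy comes from adding Gaussian noise to the Q-network outputs rather than to the gradients, following the principles of differential privacy.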
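The steps above can be sketched in plain Python for one transition seen by agent $\alpha$. This is a minimal illustration under stated assumptions, not the authors' implementation:

- Each local Q-network is reduced to a single dense layer, and the shared MLP has one hidden ReLU layer.
- The sizes (`N_ACTIONS`, `HIDDEN`, `DIM_ALPHA`, `DIM_BETA`), the noise scale `SIGMA`, the discount `GAMMA`, and all helper names are hypothetical.
- The target is evaluated on the next state, following standard DQN practice. This is an assumption: the review writes the state index as $j$.
- Gradient updates are omitted.

```python
import math
import random

# Hypothetical sizes: the review does not specify network dimensions.
N_ACTIONS = 4
HIDDEN = 8


def init_layer(n_in, n_out, rng):
    """Random dense layer: an (n_out x n_in) weight matrix and a zero bias vector."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map W x + b over plain Python lists."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, t) for t in v]


def local_q(state, theta):
    """Stand-in local Q-network (one dense layer): private state -> one value per action."""
    return dense(state, theta)


def add_gaussian_noise(q_values, sigma, rng):
    """Perturb Q-network outputs (not gradients) with N(0, sigma^2) noise."""
    return [q + rng.gauss(0.0, sigma) for q in q_values]


def federated_q(q_hat_alpha, q_hat_beta, theta_g):
    """Q_f = MLP([Q_hat_alpha; Q_hat_beta]; theta_g) on the concatenated noisy outputs."""
    hidden_layer, output_layer = theta_g
    return dense(relu(dense(q_hat_alpha + q_hat_beta, hidden_layer)), output_layer)


def td_target(reward, gamma, q_f_values):
    """Y = r + gamma * max_a Q_f(...)."""
    return reward + gamma * max(q_f_values)


def squared_loss(target, q_f_values, action):
    """(Y - Q_f(s, a))^2 for the action actually taken."""
    return (target - q_f_values[action]) ** 2


# --- Demo for agent alpha on one hypothetical transition ---
rng = random.Random(0)
DIM_ALPHA, DIM_BETA = 3, 5  # hypothetical, deliberately different feature spaces
SIGMA, GAMMA = 0.1, 0.9     # hypothetical noise scale and discount factor
theta_alpha = init_layer(DIM_ALPHA, N_ACTIONS, rng)
theta_beta = init_layer(DIM_BETA, N_ACTIONS, rng)
theta_g = (init_layer(2 * N_ACTIONS, HIDDEN, rng), init_layer(HIDDEN, N_ACTIONS, rng))


def random_state(dim):
    return [rng.random() for _ in range(dim)]


def q_f_alpha(s_alpha, c_beta):
    """Alpha's view of Q_f: its own noisy output plus beta's noisy output C_beta as a constant."""
    q_hat_alpha = add_gaussian_noise(local_q(s_alpha, theta_alpha), SIGMA, rng)
    return federated_q(q_hat_alpha, c_beta, theta_g)


s_alpha, s_beta = random_state(DIM_ALPHA), random_state(DIM_BETA)
s_alpha_next, s_beta_next = random_state(DIM_ALPHA), random_state(DIM_BETA)
action, reward = 2, 1.0  # hypothetical transition

# In the protocol, beta would share only these noisy outputs, never s_beta or theta_beta.
c_beta = add_gaussian_noise(local_q(s_beta, theta_beta), SIGMA, rng)
c_beta_next = add_gaussian_noise(local_q(s_beta_next, theta_beta), SIGMA, rng)

q_f = q_f_alpha(s_alpha, c_beta)
# Assumption: the target uses the successor state, as in standard DQN.
target = td_target(reward, GAMMA, q_f_alpha(s_alpha_next, c_beta_next))
loss = squared_loss(target, q_f, action)
print("Q_f:", [round(v, 3) for v in q_f], "target:", round(target, 3), "loss:", round(loss, 3))
```

`c_beta` enters `federated_q` only as an input value. A gradient-based update of $\alpha$'s loss would therefore change only `theta_alpha` and `theta_g`, which matches the fixed-input treatment of the other agent's output described above.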

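Experimental Results

Research Questions

RQ1: Can federated learning improve policy quality for agents that have different state spaces and private data?
RQ2: How close can FedRL come to the performance obtained by pooling both agents' data centrally, without privacy constraints?
RQ3: How does Gaussian differential privacy affect learning performance in FedRL?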

Key Findings

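Grid-World results by grid size (SuccRate = success rate, AvgRwd = average reward, - = not reported)
Method	8x8 SuccRate	16x16 SuccRate	32x32 SuccRate	8x8 AvgRwd	16x16 AvgRwd	32x32 AvgRwd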
FCN-alpha	69.73%	48.04%	41.73%	-	-	-
DQN-alpha	88.27%	76.20%	71.41%	-112.084	-112.084	-285.946
FedRL-1	92.52%	79.83%	77.88%	-94.193	-94.193	-226.583
FedRL-2	95.06%	84.31%	82.02%	-84.139	-84.139	-189.756
FCN-full	72.16%	56.44%	50.15%	-38.114	-38.114	-52.72
DQN-full	93.69%	83.40%	79.73%	-38.114	-38.114	-52.72

FedRL-2 (with Gaussian privacy) consistently outperforms FedRL-1 and baseline DQN-alpha in Grid-World SuccRate across 8x8, 16x16, and 32x32 domains.
FedRL-2 slightly exceeds the SuccRate of DQN-full (centralized data) at every Grid-World size: 95.06% vs. 93.69% on 8x8, 84.31% vs. 83.40% on 16x16, and 82.02% vs. 79.73% on 32x32. This indicates effective privacy-preserving collaboration.
In Grid-World, FedRL-2 yields a higher AvgRwd than DQN-alpha and FedRL-1 at every size. It remains well below DQN-full, however, and the gap widens on the 32x32 grid (-189.756 vs. -52.72).
In Text2Action, FedRL-2 outperforms FCN-alpha and DQN-alpha in F1 and AvgRwd across WHS, WHG, and CT datasets, and is competitive with DQN-full.
FedRL demonstrates that federated learning with privacy-preserving sharing can yield high-quality policies without directly sharing data or models.
History length impacts FedRL performance: longer histories improve success rates, with FedRL-2 showing robust performance even with limited history.
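The Grid-World comparisons above can be checked against the table with a little arithmetic. This Python sketch copies the SuccRate and AvgRwd values verbatim from the table, omitting the FCN rows. It then computes per-size differences between methods. The names `SUCC_RATE`, `AVG_REWARD`, `gap`, and `beats_everywhere` are illustrative only.

```python
# Grid-World values copied verbatim from the table above (FCN rows omitted).
SIZES = ("8x8", "16x16", "32x32")
SUCC_RATE = {  # success rate, %
    "DQN-alpha": (88.27, 76.20, 71.41),
    "FedRL-1": (92.52, 79.83, 77.88),
    "FedRL-2": (95.06, 84.31, 82.02),
    "DQN-full": (93.69, 83.40, 79.73),
}
AVG_REWARD = {
    "DQN-alpha": (-112.084, -112.084, -285.946),
    "FedRL-1": (-94.193, -94.193, -226.583),
    "FedRL-2": (-84.139, -84.139, -189.756),
    "DQN-full": (-38.114, -38.114, -52.72),
}


def gap(table, a, b):
    """Per-size difference table[a] - table[b], rounded to the table's precision."""
    return {size: round(x - y, 3) for size, x, y in zip(SIZES, table[a], table[b])}


def beats_everywhere(table, a, b):
    """True if method a has a strictly larger value than method b at every grid size."""
    return all(x > y for x, y in zip(table[a], table[b]))


succ_vs_full = gap(SUCC_RATE, "FedRL-2", "DQN-full")
reward_vs_full = gap(AVG_REWARD, "FedRL-2", "DQN-full")
print("SuccRate gap, FedRL-2 minus DQN-full:", succ_vs_full)
print("AvgRwd gap, FedRL-2 minus DQN-full:", reward_vs_full)
for baseline in ("DQN-alpha", "FedRL-1"):
    both = (beats_everywhere(SUCC_RATE, "FedRL-2", baseline)
            and beats_everywhere(AVG_REWARD, "FedRL-2", baseline))
    print(f"FedRL-2 beats {baseline} on SuccRate and AvgRwd at every size: {both}")
```

Higher is better for both metrics, since AvgRwd values are negative and values closer to zero are better. A positive gap therefore favors FedRL-2 in both tables.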

This review was generated by AI and checked by a human editor.