QUICK REVIEW

[Paper Review] Federated Learning with Non-IID Data

Yue Zhao, Meng Li|arXiv (Cornell University)|Jun 2, 2018

Privacy-Preserving Technologies in Data|References: 24|Citations: 1,901

One-Sentence Summary

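This paper explains why FedAvg accuracy degrades when client data is non-IID by relating the loss to weight divergence, which it quantifies with the Earth Mover's Distance (EMD), and it proposes a data-sharing strategy based on a small globally shared dataset that recovers accuracy.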

ABSTRACT

Federated learning enables resource-constrained edge compute devices, such as mobile phones and IoT devices, to learn a shared model for prediction, while keeping the training data local. This decentralized approach to train models provides privacy, security, regulatory and economic benefits. In this work, we focus on the statistical challenge of federated learning when local data is non-IID. We first show that the accuracy of federated learning reduces significantly, by up to 55% for neural networks trained for highly skewed non-IID data, where each client device trains only on a single class of data. We further show that this accuracy reduction can be explained by the weight divergence, which can be quantified by the earth mover's distance (EMD) between the distribution over classes on each device and the population distribution. As a solution, we propose a strategy to improve training on non-IID data by creating a small subset of data which is globally shared between all the edge devices. Experiments show that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.

Motivation and Objectives

Quantify how non-IID data across clients reduces FedAvg accuracy compared to IID settings.
Explain weight divergence in FedAvg and bound it using earth mover’s distance (EMD) between client and population distributions.
Propose a data-sharing strategy with a small globally shared dataset to mitigate non-IID effects and evaluate its impact on accuracy.

Proposed Method

Use CNNs on MNIST, CIFAR-10, and a keyword spotting (KWS) dataset with FedAvg under IID, 2-class non-IID, and 1-class non-IID partitions.
Define weight divergence as the relative distance between FedAvg and centralized SGD weights.
Prove a bound on weight divergence that involves EMD between client distributions and the population distribution.
Empirically correlate weight divergence with EMD and test accuracy across datasets and non-IID settings.
Propose and evaluate a data-sharing strategy in which a small globally shared dataset G (uniform across classes) is available at initialization, a random portion of G is distributed to each client, and optionally a warm-up model is trained on G before distributed training.
Demonstrate accuracy improvements (up to ~30%) on CIFAR-10 with 5% globally shared data.
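
As a minimal sketch of the partitions and metrics listed above (not the authors' code), the following NumPy snippet builds IID and label-skewed client splits, measures how far a client's class distribution is from the population, and computes the relative distance between two weight vectors. The helper names (`partition_iid`, `partition_by_class`, `emd`, `weight_divergence`) are hypothetical. The sort-and-shard split is an assumed way to realize the 1-class and 2-class settings, and the EMD is assumed here to be the L1 distance between class distributions.

```python
import numpy as np


def partition_iid(labels, num_clients, rng):
    """Shuffle all sample indices and split them evenly across clients."""
    return np.array_split(rng.permutation(len(labels)), num_clients)


def partition_by_class(labels, num_clients, classes_per_client, rng):
    """Label-skew split (assumed scheme): sort by label, cut into shards,
    and hand each client `classes_per_client` randomly chosen shards.

    With balanced classes, each shard holds a single class, so a client
    sees at most `classes_per_client` distinct classes.
    """
    order = np.argsort(labels, kind="stable")
    num_shards = num_clients * classes_per_client
    shards = np.array_split(order, num_shards)
    shard_ids = rng.permutation(num_shards)
    parts = []
    for c in range(num_clients):
        mine = shard_ids[c * classes_per_client:(c + 1) * classes_per_client]
        parts.append(np.concatenate([shards[s] for s in mine]))
    return parts


def class_distribution(labels, num_classes):
    """Empirical probability of each class in `labels`."""
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()


def emd(client_labels, population_labels, num_classes):
    """Assumed EMD: sum over classes of |p_client(y=i) - p_population(y=i)|."""
    p_k = class_distribution(client_labels, num_classes)
    p = class_distribution(population_labels, num_classes)
    return float(np.abs(p_k - p).sum())


def weight_divergence(w_fedavg, w_sgd):
    """Relative distance ||w_fedavg - w_sgd|| / ||w_sgd|| on flattened weights."""
    return float(np.linalg.norm(w_fedavg - w_sgd) / np.linalg.norm(w_sgd))
```

Under this definition, with ten balanced classes a client that holds a single class sits at distance |1 - 0.1| + 9 × 0.1 = 1.8 from the uniform population, while an IID client stays close to 0.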

Experimental Results

Research Questions

RQ1: How does non-IID data distribution across clients affect FedAvg accuracy relative to IID data?
RQ2: Can weight divergence between FedAvg and centralized SGD be bounded by a function of EMD between client distributions and the population distribution?
RQ3: Does introducing a small globally shared dataset mitigate non-IID induced accuracy loss, and by how much?

Key Findings

FedAvg accuracy can drop significantly under highly skewed non-IID data: by up to 55% when each client trains on only a single class.
Weight divergence between FedAvg and centralized SGD grows with data skew; it can be bounded by a term involving EMD between client and population distributions.
Higher EMD is associated with larger weight divergence and lower test accuracy; stronger non-IID skew correlates with larger accuracy loss, with CIFAR-10 showing substantial drops.
A small globally shared dataset containing a uniform class distribution can substantially recover accuracy, e.g., up to ~30% improvement on CIFAR-10 with 5% shared data.
Data sharing involves a trade-off between the size of the globally shared dataset (beta) and the fraction of that data distributed to each client (alpha); even partial sharing yields meaningful gains.
Training a warm-up model on G gives federated training a higher starting accuracy and reduces the amount of centrally held data needed to achieve gains.
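
To make the roles of beta and alpha concrete, here is a minimal sketch (not the authors' implementation) of building a class-uniform shared set G and handing a random fraction of it to a client. The helper names (`build_shared_set`, `share_with_client`, `class_distance`) are hypothetical, beta and alpha are taken as fractions in [0, 1], G is assumed to be drawn from a separate labeled pool held centrally, and the skew measure is assumed to be the L1 distance between class distributions.

```python
import numpy as np


def class_distance(labels, reference_labels, num_classes):
    """Assumed skew measure: L1 distance between two class distributions."""
    p_k = np.bincount(labels, minlength=num_classes) / len(labels)
    p = np.bincount(reference_labels, minlength=num_classes) / len(reference_labels)
    return float(np.abs(p_k - p).sum())


def build_shared_set(pool_labels, total_client_size, beta, num_classes, rng):
    """Return pool indices for G with equal counts per class.

    |G| is about beta * total_client_size, rounded down to a multiple
    of num_classes so every class gets the same number of samples.
    """
    per_class = int(round(beta * total_client_size)) // num_classes
    picks = []
    for c in range(num_classes):
        candidates = np.flatnonzero(pool_labels == c)
        picks.append(rng.choice(candidates, size=per_class, replace=False))
    return np.concatenate(picks)


def share_with_client(shared_idx, alpha, rng):
    """Return a random alpha fraction of G's indices for one client."""
    n = int(round(alpha * len(shared_idx)))
    return rng.choice(shared_idx, size=n, replace=False)


# Example: a client with 500 private samples, all of class 0.
rng = np.random.default_rng(0)
pool_labels = np.repeat(np.arange(10), 100)   # central labeled pool (assumed)
g_idx = build_shared_set(pool_labels, total_client_size=5000,
                         beta=0.1, num_classes=10, rng=rng)
private_labels = np.zeros(500, dtype=int)
received = pool_labels[share_with_client(g_idx, alpha=0.5, rng=rng)]
augmented = np.concatenate([private_labels, received])
uniform = np.arange(10)                       # stands in for a uniform population
skew_before = class_distance(private_labels, uniform, 10)
skew_after = class_distance(augmented, uniform, 10)
```

In this toy setting, sharing all of G (alpha = 1) with the single-class client moves its class distribution to 0.55 for class 0 and 0.05 for each other class, so the distance falls from 1.8 to 0.45 + 9 × 0.05 = 0.9. Smaller alpha values give a smaller but still positive reduction.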

This review was generated by AI and checked by a human editor.