[Paper Review] Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification
The paper synthesizes non-identical client data distributions using a Dirichlet model to study FedAvg performance, showing degradation as distributions diverge and proposing server momentum (FedAvgM) to mitigate this gap.
Federated Learning enables visual models to be trained in a privacy-preserving way using real-world data from mobile devices. Given their distributed nature, the statistics of the data across these devices are likely to differ significantly. In this work, we look at the effect such non-identical data distributions have on visual classification via Federated Learning. We propose a way to synthesize datasets with a continuous range of identicalness and provide performance measures for the Federated Averaging algorithm. We show that performance degrades as distributions differ more, and propose a mitigation strategy via server momentum. Experiments on CIFAR-10 demonstrate improved classification performance over a range of non-identicalness, with classification accuracy improved from 30.1% to 76.9% in the most skewed settings.
Motivation & Objective
- Motivate and quantify how non-identical data distributions across clients affect federated visual classification.
- Develop a synthetic data generation method to span a continuous range of distribution identicalness using Dirichlet priors.
- Benchmark FedAvg under varying non-identicalness and hyperparameters on CIFAR-10.
- Propose and evaluate a mitigation via server-side momentum (FedAvgM) to improve convergence and accuracy.
Proposed method
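- Model each client's class distribution as a categorical distribution q over N classes, drawn as q ~ Dirichlet(alpha * p), where p is the prior class distribution and alpha > 0 is the concentration parameter.
- Vary alpha to create a continuous spectrum from identical client data (large alpha) to highly non-identical client data (alpha close to 0).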
- Use CIFAR-10 with 100 clients, 500 images per client, and a CNN similar to the one used by McMahan et al., trained with fixed weight decay and no learning-rate decay.
- Run FedAvg with local batch size B=64, local epochs E in {1,5}, and reporting fraction C in {0.05,0.1,0.2,0.4} for 10,000 communication rounds, with a hyperparameter search over the learning rate eta.
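- Explore server-side momentum (FedAvgM): instead of applying the averaged client update Delta w directly, the server accumulates v <- beta v + Delta w and updates w <- w - v; the Nesterov form of momentum is used.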
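The Dirichlet-based synthesis described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the uniform prior over 10 classes, and the specific alpha values are assumptions chosen for the example, and the sketch draws class labels only rather than assigning actual CIFAR-10 images from a finite pool.

```python
import numpy as np

def sample_client_class_distributions(num_clients, prior, alpha, rng):
    """Draw one categorical class distribution q ~ Dirichlet(alpha * prior) per client."""
    prior = np.asarray(prior, dtype=float)
    return rng.dirichlet(alpha * prior, size=num_clients)

def sample_client_labels(q, images_per_client, rng):
    """Draw class labels for a single client from its categorical distribution q."""
    return rng.choice(len(q), size=images_per_client, p=q)

rng = np.random.default_rng(0)
prior = np.full(10, 0.1)  # assumed uniform prior over 10 classes

# Large alpha: every client's distribution stays close to the prior.
q_identical = sample_client_class_distributions(100, prior, alpha=100.0, rng=rng)
# Small alpha: each client's mass concentrates on very few classes.
q_skewed = sample_client_class_distributions(100, prior, alpha=0.1, rng=rng)

# Labels for one skewed client with 500 images.
labels = sample_client_labels(q_skewed[0], images_per_client=500, rng=rng)

print("mean largest class share, alpha=100:", q_identical.max(axis=1).mean())
print("mean largest class share, alpha=0.1:", q_skewed.max(axis=1).mean())
```

Comparing the mean largest class share across the two settings gives a quick sanity check that smaller alpha produces more skewed clients.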
Experimental results
Research questions
- RQ1: How do non-identical data distributions across clients impact FedAvg performance on visual classification tasks?
- RQ2: Can a Dirichlet-based synthesis of client data distributions capture a continuous range of identicalness and reveal hyperparameter sensitivities?
- RQ3: Does incorporating server momentum (FedAvgM) mitigate performance degradation caused by data non-identicalness?
- RQ4: What are the hyperparameter sensitivities (learning rate, momentum, local epochs, participation fraction) under varying data skew?
- RQ5: How closely can FedAvgM approach centralized learning performance under non-IID conditions?
Key findings
- Classification accuracy degrades as Dirichlet concentration alpha decreases (more non-identical data).
- Increasing the reporting fraction C yields diminishing returns, especially under non-identical data, and the number of local epochs E between synchronizations has mixed effects.
- FedAvgM consistently improves test accuracy over FedAvg on non-identical data, often approaching centralized performance (e.g., near 86.0% in many cases).
- The optimal effective learning rate eta_eff = eta / (1 - beta) varies with C and E; the viable window is wider for larger C and narrower for smaller C.
- More local epochs (larger E) increase update variance, requiring a lower eta_eff to maintain stability; FedAvgM helps stabilize training under these conditions.
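To make the relation eta_eff = eta / (1 - beta) concrete, here is a small sketch of the server momentum update v <- beta v + Delta w; w <- w - v. It uses plain (non-Nesterov) momentum for simplicity, and the function names, the constant gradient g, and the values of eta and beta are assumptions made for illustration only. With a constant update Delta w = eta * g, the velocity converges to eta * g / (1 - beta), which is why momentum acts like a larger effective step size.

```python
import numpy as np

def fedavgm_server_step(w, v, delta_w, beta):
    """One server update with momentum: v <- beta*v + delta_w; w <- w - v."""
    v_new = beta * v + delta_w
    w_new = w - v_new
    return w_new, v_new

def effective_lr(eta, beta):
    """Effective learning rate eta / (1 - beta), valid for 0 <= beta < 1."""
    return eta / (1.0 - beta)

eta, beta = 0.01, 0.9            # assumed values for the example
g = np.array([1.0, -2.0, 0.5])   # hypothetical constant averaged gradient
w = np.zeros(3)
v = np.zeros(3)

for _ in range(500):
    # In FedAvg, delta_w would be the average of client updates;
    # here it is held constant to expose the steady-state behaviour.
    w, v = fedavgm_server_step(w, v, eta * g, beta)

print("steady-state step:", v)
print("eta_eff * g:      ", effective_lr(eta, beta) * g)
```

Setting beta = 0 recovers the plain FedAvg update w <- w - Delta w, since the velocity then equals the current update.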
This review was created by AI and reviewed by human editors.