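[Paper Summary] Stabilized Fine-Tuning with LoRA in Federated Learning: Mitigating the Side Effect of Client Size and Rank via the Scaling Factor
SFed-LoRA proposes a theoretically optimal federated scaling factor for high-rank LoRA that improves stability and convergence across different models and data distributions.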
Large Language Models (LLMs) are pivotal in natural language processing. Because full fine-tuning is impractical, Parameter-Efficient Fine-Tuning (PEFT) methods such as Low-Rank Adaptation (LoRA), which optimizes low-rank matrices A and B, have become popular. In distributed scenarios where privacy constraints necessitate Federated Learning (FL), however, integrating LoRA is often unstable. Specifically, we identify that aggregating updates from multiple clients introduces statistical variance that scales with the client count, causing gradient collapse when using high-rank adapters. Existing scaling factor candidates, such as the one used by Rank-Stabilized LoRA (rsLoRA), ignore the interaction introduced by the aggregation process. To bridge this gap, this paper introduces Stabilized Federated LoRA (SFed-LoRA), a framework that theoretically characterizes the interaction between adapter rank and federated aggregation. We derive an optimal scaling factor designed to mitigate the aggregation error accumulating across N clients. By correcting the scaling mismatch inherent in previous approaches, SFed-LoRA restores the efficacy of high-rank adaptation without altering the original model architecture or increasing inference latency. Extensive experiments across diverse tasks, model architectures, and heterogeneous data distributions validate our results. We demonstrate that SFed-LoRA prevents high-rank collapse and achieves significantly improved stability and faster convergence than state-of-the-art baselines for high-rank adaptation.
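A toy numerical check helps make the aggregation effect concrete. The Python sketch below is an illustration under a loudly stated assumption, not the paper's derivation (which uses an infinite-width trajectory analysis): each client's update to a shared adapter matrix is modeled as independent zero-mean Gaussian noise. Under that assumption, plain element-wise averaging shrinks the root-mean-square magnitude of the aggregated update by roughly 1/sqrt(N), and multiplying by sqrt(N) restores it. The helper names `rms`, `average_updates`, and `simulate` are hypothetical.

```python
import math
import random


def rms(vec):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(v * v for v in vec) / len(vec))


def average_updates(updates):
    """FedAvg-style element-wise mean of the client update vectors."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]


def simulate(num_clients, dim=4096, seed=0):
    """Toy model (ASSUMPTION): every client update is i.i.d. N(0, 1) noise.

    Returns (rms of one client's update, rms of the averaged update,
    rms of the averaged update after a sqrt(N) correction).
    """
    rng = random.Random(seed)
    updates = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_clients)]
    single = rms(updates[0])
    agg_rms = rms(average_updates(updates))
    rescaled_rms = agg_rms * math.sqrt(num_clients)  # sqrt(N) compensation
    return single, agg_rms, rescaled_rms


for n in (1, 4, 16, 64):
    single, agg, rescaled = simulate(n)
    print(f"N={n:3d}  single={single:.3f}  averaged={agg:.3f}  rescaled={rescaled:.3f}")
```

Real client updates share a common signal component, so they do not cancel as completely as pure noise; the toy model only shows why a factor that grows with N can offset aggregation-induced shrinkage.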
Research Motivation and Goals
- Identify and address the instability of LoRA in federated learning caused by aggregation variance.
- Derive a principled scaling factor that accounts for client count and adapter rank.
- Propose SFed-LoRA to stabilize high-rank adaptation without changing model architecture or inference latency.
- Provide theoretical grounding via infinite-width analysis and validate empirically across tasks and models.
Proposed Method
- Define and analyze the interaction between LoRA adapters and federated aggregation.
- Derive the federated-optimal scaling factor gamma_z = alpha * sqrt(N / r) using an infinite-width trajectory framework.
- Adopt FedSA-LoRA as the base framework, in which only A is aggregated while B stays local.
- Integrate gamma_z into local computation to stabilize training across ranks and client sizes.
- Experimentally compare SFed-LoRA with FedSA-LoRA, FedSA-rsLoRA, and RoLoRA across datasets and models.
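The method steps above can be sketched as a single communication round in NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the toy least-squares objective, the learning rate, the plain gradient-descent local update, and the helper names (`sfed_scale`, `lora_forward`, `local_step`, `fedsa_round`) are all hypothetical. Only the scaling rule gamma_z = alpha * sqrt(N / r) and the rule "aggregate A, keep B local" come from the summary.

```python
import numpy as np


def sfed_scale(alpha, num_clients, rank):
    """SFed-LoRA scaling factor gamma_z = alpha * sqrt(N / r), as stated in the summary."""
    return alpha * np.sqrt(num_clients / rank)


def lora_forward(W0, A, B, x, gamma):
    """Adapted layer output W0 x + gamma * B A x (W0 frozen, A and B trainable)."""
    return W0 @ x + gamma * (B @ (A @ x))


def local_step(W0, A, B, x, y, gamma, lr):
    """HYPOTHETICAL local update: one gradient step on 0.5 * ||W0 x + gamma B A x - y||^2."""
    residual = lora_forward(W0, A, B, x, gamma) - y  # shape (d_out, batch)
    grad_B = gamma * residual @ (A @ x).T            # dL/dB, shape (d_out, rank)
    grad_A = gamma * B.T @ residual @ x.T            # dL/dA, shape (rank, d_in)
    return A - lr * grad_A, B - lr * grad_B


def fedsa_round(W0, A_global, B_locals, data, gamma, lr=1e-4, local_steps=5):
    """FedSA-LoRA-style round: each client trains (A, B); the server averages only A."""
    new_As, new_Bs = [], []
    for B, (x, y) in zip(B_locals, data):
        A = A_global.copy()  # every client starts from the shared A
        for _ in range(local_steps):
            A, B = local_step(W0, A, B, x, y, gamma, lr)
        new_As.append(A)
        new_Bs.append(B)  # B never leaves the client
    A_next = np.mean(new_As, axis=0)  # only A is aggregated
    return A_next, new_Bs


rng = np.random.default_rng(0)
d_in, d_out, rank, num_clients, alpha = 32, 16, 8, 4, 16.0
gamma = sfed_scale(alpha, num_clients, rank)
W0 = rng.normal(size=(d_out, d_in))
A = rng.normal(scale=1.0 / np.sqrt(d_in), size=(rank, d_in))      # random init
B_locals = [np.zeros((d_out, rank)) for _ in range(num_clients)]  # zero init, as in LoRA
data = [(rng.normal(size=(d_in, 10)), rng.normal(size=(d_out, 10))) for _ in range(num_clients)]
A, B_locals = fedsa_round(W0, A, B_locals, data, gamma)
print(round(float(gamma), 4), A.shape, [B.shape for B in B_locals])
```

Because B is zero-initialized, the first local step changes only B (the gradient with respect to A is proportional to B). Since only A is averaged, each client's B keeps its own local state across rounds.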
Experimental Results
Research Questions
- RQ1: How does federated aggregation affect high-rank LoRA adapters in FL settings?
- RQ2: What scaling factor gamma optimally stabilizes LoRA under varying N and r?
- RQ3: Can high-rank adaptation be stabilized without increasing inference latency?
- RQ4: Do the stability gains of SFed-LoRA generalize across tasks, models, optimizers, and data distributions?
Key Findings
- SFed-LoRA prevents high-rank collapse and achieves faster convergence than baselines.
- The proposed gamma_z = alpha * sqrt(N / r) offsets aggregation variance and maintains stable gradient norms across ranks.
- SFed-LoRA outperforms standard LoRA, rsLoRA, and RoLoRA on stability and convergence in IID and heterogeneous settings.
- Experiments show robust performance across models (LLaMA 2, RoBERTa-large), tasks (GSM8K, GLUE), and varying N and r.
- Ablations confirm the optimality of the derived scaling law and the necessity of aggregating only A in FedSA-LoRA.
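To make the scaling law concrete, the sketch below tabulates three candidate factors for a few (N, r) settings. The standard LoRA factor alpha / r and the rsLoRA factor alpha / sqrt(r) are the commonly used definitions; gamma_z = alpha * sqrt(N / r) is taken from the summary. Rewriting it as gamma_z = sqrt(N) * (alpha / sqrt(r)) shows that, under this formula, the SFed-LoRA factor is the rsLoRA factor multiplied by sqrt(N), so the two coincide when N = 1. The function names are hypothetical.

```python
import math


def lora_scale(alpha, rank):
    """Standard LoRA scaling: alpha / r."""
    return alpha / rank


def rslora_scale(alpha, rank):
    """Rank-stabilized LoRA (rsLoRA) scaling: alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)


def sfed_scale(alpha, num_clients, rank):
    """SFed-LoRA scaling as given in the summary: alpha * sqrt(N / r)."""
    return alpha * math.sqrt(num_clients / rank)


def scaling_table(alpha, clients, ranks):
    """Return rows of (N, r, LoRA, rsLoRA, SFed-LoRA) scaling factors."""
    return [
        (n, r, lora_scale(alpha, r), rslora_scale(alpha, r), sfed_scale(alpha, n, r))
        for n in clients
        for r in ranks
    ]


print(f"{'N':>3} {'r':>4} {'LoRA':>8} {'rsLoRA':>8} {'SFed':>8}")
for n, r, base, rs, sf in scaling_table(alpha=16, clients=(1, 4, 16), ranks=(8, 64, 256)):
    print(f"{n:>3} {r:>4} {base:>8.4f} {rs:>8.4f} {sf:>8.4f}")
```

At fixed r, gamma_z grows with N, so the same adapter rank receives a larger effective scale as more clients participate; this is the compensation for aggregation variance that the findings above attribute to the derived factor.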
This summary was generated by AI and reviewed by human editors.