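[Paper Review] Stabilized Fine-Tuning with LoRA in Federated Learning: Mitigating the Side Effect of Client Size and Rank via the Scaling Factor
SFed-LoRA proposes a theoretically derived federated scaling factor, gamma_z = alpha * sqrt(N / r), where N is the number of clients and r is the adapter rank. The factor helps stabilize high-rank LoRA adapters in federated fine-tuning, improves convergence, and prevents gradient collapse across a range of client counts and ranks.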
Large Language Models (LLMs) are pivotal in natural language processing. The impracticality of full fine-tuning has prompted Parameter-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation (LoRA), optimizing low-rank matrices A and B. In distributed scenarios where privacy constraints necessitate Federated Learning (FL), however, the integration of LoRA is often unstable. Specifically, we identify that aggregating updates from multiple clients introduces statistical variance that scales with the client count, causing gradient collapse when using high-rank adapters. Existing scaling factor candidates, such as the one used by Rank-Stabilized LoRA, ignore the interaction caused by the aggregation process. To bridge this gap, this paper introduces Stabilized Federated LoRA (SFed-LoRA), a framework that theoretically characterizes the interaction between adapter rank and federated aggregation. We derive an optimal scaling factor designed to effectively mitigate the aggregation error accumulating across N clients. By correcting the scaling mismatch inherent in previous approaches, SFed-LoRA restores the efficacy of high-rank adaptation without altering the original model architecture or increasing inference latency. Extensive experiments in diverse tasks, model architectures, and heterogeneous data distributions are conducted to validate our results. We demonstrate that SFed-LoRA prevents high-rank collapse, and achieves significantly improved stability and faster convergence compared with state-of-the-art baselines for high-rank adaptation.
Research Motivation and Goals
- Explain and address the instability of LoRA in federated learning that is caused by aggregation variance.
- Derive a principled scaling factor that accounts for client count and adapter rank.
- Propose SFed-LoRA to stabilize high-rank adaptation without changing model architecture or inference latency.
- Provide theoretical grounding via infinite-width analysis and validate empirically across tasks and models.
Proposed Method
- Define and analyze the interaction between LoRA adapters and federated aggregation.
- Derive the federated-optimal scaling factor gamma_z = alpha * sqrt(N / r) using an infinite-width trajectory framework.
- Adopt FedSA-LoRA as the base framework where only A is aggregated while B stays local.
- Integrate gamma_z into local computation to stabilize training across ranks and client sizes.
- Experimentally compare SFed-LoRA with FedSA-LoRA, FedSA-rsLoRA, and RoLoRA across datasets and models.
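The scaling rule and the A-only aggregation listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the tensor shapes, and the choice to apply gamma_z as a plain multiplier on the low-rank product B A are assumptions made for the example. The alpha / r and alpha / sqrt(r) factors are the usual LoRA and rsLoRA conventions, included only for comparison.

```python
import math
import numpy as np

def lora_scale(alpha, r):
    # Conventional LoRA scaling: alpha / r.
    return alpha / r

def rslora_scale(alpha, r):
    # Rank-stabilized LoRA scaling: alpha / sqrt(r).
    return alpha / math.sqrt(r)

def sfed_lora_scale(alpha, r, num_clients):
    # Factor quoted in this review: gamma_z = alpha * sqrt(N / r).
    return alpha * math.sqrt(num_clients / r)

def lora_forward(x, W0, A, B, gamma):
    # Assumed shapes: x (batch, d_in), W0 (d_out, d_in), A (r, d_in), B (d_out, r).
    # Output: frozen base projection plus the scaled low-rank update.
    return x @ W0.T + gamma * (x @ A.T @ B.T)

def aggregate_A(client_As):
    # FedSA-LoRA-style step as described above: only A is averaged
    # across clients, while each client keeps its own B (hypothetical helper).
    return np.mean(np.stack(client_As), axis=0)

if __name__ == "__main__":
    alpha, r, n_clients = 1.0, 64, 16  # illustrative values, not from the paper
    print("LoRA     :", lora_scale(alpha, r))
    print("rsLoRA   :", rslora_scale(alpha, r))
    print("SFed-LoRA:", sfed_lora_scale(alpha, r, n_clients))
```

With the number of clients fixed, the SFed-LoRA factor shrinks with rank at the same 1 / sqrt(r) rate as rsLoRA, and additionally grows with sqrt(N), which is the client-count dependence that the review says earlier scaling factors ignore.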
Experimental Results
Research Questions
- RQ1: How does federated aggregation affect high-rank LoRA adapters in FL settings?
- RQ2: What scaling factor gamma optimally stabilizes LoRA under varying N and r?
- RQ3: Can high-rank adaptation be stabilized without increasing inference latency?
- RQ4: Do the stability gains of SFed-LoRA generalize across tasks, models, optimizers, and data distributions?
Key Results
- SFed-LoRA prevents high-rank collapse and achieves faster convergence than baselines.
- The proposed gamma_z = alpha * sqrt(N / r) offsets aggregation variance and maintains stable gradient norms across ranks.
- SFed-LoRA outperforms standard LoRA, rsLoRA, and RoLoRA on stability and convergence in IID and heterogeneous settings.
- Experiments show robust performance across LLaMA 2 and RoBERTa-large, GSM8K and GLUE tasks, and varying N and r.
- Ablations confirm the optimality of the derived scaling law and the necessity of aggregating only A in FedSA-LoRA.
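One way to build intuition for the sqrt(N) term is a toy simulation. It is not the paper's infinite-width derivation, and it rests on a strong simplifying assumption: each client contributes an independent, zero-mean, unit-variance update. Under that assumption, averaging N updates shrinks their standard deviation by about 1 / sqrt(N), and multiplying by sqrt(N) brings it back to roughly the single-client magnitude. The function name and matrix shape below are hypothetical.

```python
import numpy as np

def averaged_update_std(num_clients, shape=(64, 256), seed=0):
    # Toy assumption: one independent standard-normal "update" per client.
    rng = np.random.default_rng(seed)
    updates = rng.standard_normal((num_clients, *shape))
    # Standard deviation of the entries after plain averaging across clients.
    return updates.mean(axis=0).std()

if __name__ == "__main__":
    for n in (1, 4, 16):
        raw = averaged_update_std(n)
        rescaled = np.sqrt(n) * raw
        print(f"N={n:2d}  std after averaging={raw:.3f}  "
              f"after sqrt(N) rescale={rescaled:.3f}")
```

Real client updates are neither independent nor identically distributed, especially under heterogeneous data, so this only indicates the order of magnitude that a client-count-aware factor has to compensate for.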
This review was generated by AI and reviewed by a human editor.