QUICK REVIEW

[Paper Review] SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models

Sara Babakniya, Ahmed Roushdy Elkordy|arXiv (Cornell University)|Aug 12, 2023

Privacy-Preserving Technologies in Data|Cited by 9

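One-Sentence Summary

SLoRA introduces a two-stage federated PEFT method that uses sparse, data-driven fine-tuning to prime LoRA, reaching performance comparable to full fine-tuning while substantially reducing training time and communication. It achieves comparable accuracy at roughly 1% update density and cuts training time by up to 90%.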

ABSTRACT

Transfer learning via fine-tuning pre-trained transformer models has gained significant success in delivering state-of-the-art results across various NLP tasks. In the absence of centralized data, Federated Learning (FL) can benefit from distributed and private data of the FL edge clients for fine-tuning. However, due to the limited communication, computation, and storage capabilities of edge devices and the huge sizes of popular transformer models, efficient fine-tuning is crucial to make federated training feasible. This work explores the opportunities and challenges associated with applying parameter efficient fine-tuning (PEFT) methods in different FL settings for language tasks. Specifically, our investigation reveals that as the data across users becomes more diverse, the gap between fully fine-tuning the model and employing PEFT methods widens. To bridge this performance gap, we propose a method called SLoRA, which overcomes the key limitations of LoRA in high heterogeneous data scenarios through a novel data-driven initialization technique. Our experimental results demonstrate that SLoRA achieves performance comparable to full fine-tuning, with significant sparse updates with approximately $\sim 1\%$ density while reducing training time by up to $90\%$.

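Research Motivation and Objectives

Evaluate how parameter-efficient fine-tuning (PEFT) methods perform on NLP tasks in federated learning (FL) when client data is heterogeneous.
Identify the limitations of existing PEFT methods in non-IID FL settings and propose improvements.
Develop a data-driven initialization technique that closes the gap between PEFT and full fine-tuning in FL.
Propose and evaluate a two-stage Primed-LoRA method that reduces communication and computation costs while maintaining accuracy.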

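Proposed Method

Evaluate centralized PEFT baselines (Pfeiffer, LoRA, Houlsby, BitFit) in FL and quantify their performance under varying data heterogeneity.
Introduce Primed-LoRA (two stages): Stage 1 uses sparse fine-tuning (SFT) to prime a good initialization; Stage 2 applies LoRA based on an SVD decomposition of the Stage 1 update.
In Stage 1, perform sparse fine-tuning with a random mask generated by the server, which keeps the sparsity pattern independent of client data and the communication cost low.
In Stage 2, apply rank-r LoRA modules to the dense layers, initializing the A and B matrices from the SVD of the Stage 1 update.
Compare SLoRA against full fine-tuning (FFT), LoRA, and SFT across data heterogeneity settings (non-IID distributions) and update densities.
Analyze training time, communication cost, and accuracy for ALBERT and DistilBERT on the News Category and 20 Newsgroups datasets.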
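
The two stages described above can be sketched on a single weight matrix. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the layer shape, rank, client count, learning rate, and random stand-in gradients are all hypothetical, and the function names are invented for this example. The summary does not say how the singular values are divided between A and B, so the even square-root split below is an assumption; it is chosen so that B @ A equals the best rank-r approximation of the Stage 1 update.

```python
import numpy as np

def make_server_mask(shape, density, rng):
    """Random binary mask drawn once by the server, independent of client data."""
    return rng.random(shape) < density

def client_sparse_update(grad, mask, lr):
    """Sparse local update: only masked entries are nonzero (and need sending)."""
    return -lr * grad * mask

def lora_init_from_update(delta_w, rank):
    """Truncated SVD of a Stage 1 update -> LoRA factors B (d x r), A (r x k)."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])         # assumption: split singular values evenly
    b = u[:, :rank] * sqrt_s           # scale each left singular vector
    a = sqrt_s[:, None] * vt[:rank]    # scale each right singular vector
    return b, a

rng = np.random.default_rng(0)
d, k, rank = 64, 48, 8                 # hypothetical layer shape and LoRA rank
num_clients, rounds, lr = 4, 5, 0.1    # hypothetical FL settings

w0 = rng.standard_normal((d, k))       # stand-in for a pre-trained weight matrix
mask = make_server_mask(w0.shape, density=0.01, rng=rng)

# Stage 1: federated sparse fine-tuning with averaged client updates.
w = w0.copy()
for _ in range(rounds):
    updates = [
        client_sparse_update(rng.standard_normal((d, k)), mask, lr)  # fake gradients
        for _ in range(num_clients)
    ]
    w = w + np.mean(updates, axis=0)
delta_w = w - w0

# Stage 2: initialize LoRA so that B @ A approximates the Stage 1 update.
b, a = lora_init_from_update(delta_w, rank)
rel_err = np.linalg.norm(delta_w - b @ a) / np.linalg.norm(delta_w)
print(f"nonzero entries: {np.count_nonzero(delta_w)}, relative error: {rel_err:.3f}")
```

Because every client uses the same server-generated mask, the accumulated update is nonzero only at the masked positions, and the truncated SVD turns that sparse update into a dense low-rank starting point for LoRA training.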

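Experimental Results

Research Questions

RQ1: How do PEFT methods perform on NLP tasks in FL under heterogeneous client data distributions?
RQ2: Can a data-driven initialization strategy narrow the gap between PEFT and full fine-tuning in FL?
RQ3: In highly non-IID FL settings, does SLoRA reduce communication and computation costs while maintaining FFT-level accuracy?
RQ4: How does sparse update density affect accuracy, training time, and communication when fine-tuning language models in a federated setting?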

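Key Findings

As data heterogeneity increases, PEFT performance degrades relative to FFT.
In highly non-IID FL settings, LoRA struggles to match FFT performance and may converge more slowly.
SLoRA reaches FFT-level accuracy at an update density of about 1%, with training time reduced by up to 90%.
Stage 1 sparse fine-tuning with a server-generated mask gives the Stage 2 LoRA a data-efficient initialization.
Stage 2 LoRA, initialized from the SVD of the Stage 1 update, performs strongly while adding only about 1.3% of the original model's parameters.
SLoRA is more stable across random seeds and needs fewer communication rounds to reach comparable accuracy.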
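
To make the density and parameter-overhead figures above more concrete, the sketch below counts how many values a client would send per round for one weight matrix under each scheme. It relies on the standard LoRA parameter count r(d + k) for a d x k matrix; the dimensions, rank, and density are hypothetical and are not the paper's configuration, and the sparse count ignores any overhead for transmitting indices.

```python
def dense_values(d, k):
    """Full fine-tuning sends every entry of the d x k matrix."""
    return d * k

def sparse_values(d, k, density):
    """Sparse fine-tuning sends only the masked fraction of entries."""
    return round(density * d * k)

def lora_values(d, k, r):
    """LoRA sends B (d x r) and A (r x k), i.e. r * (d + k) values."""
    return r * (d + k)

d, k, r, density = 768, 768, 8, 0.01   # hypothetical layer shape, rank, density

dense = dense_values(d, k)
counts = {
    "dense": dense,
    "sparse": sparse_values(d, k, density),
    "lora": lora_values(d, k, r),
}
for name, n in counts.items():
    print(f"{name:>6}: {n:>7} values ({n / dense:.2%} of dense)")
```

These per-matrix ratios differ from the model-level figure of about 1.3% reported above, since that figure depends on which layers receive LoRA modules and on the rest of the model's parameters.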


This review was generated by AI and reviewed by a human editor.