QUICK REVIEW

[Paper Review] SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

Lin Zheng, Xuanjie Hu|arXiv (Cornell University)|Jul 1, 2024
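Natural Language Processing Techniques | Cited by 7
One-Sentence Summary

SplitLoRA combines split learning and federated learning with LoRA-based parameter-efficient fine-tuning to fine-tune large language models efficiently on decentralized private data, reaching comparable accuracy at lower computation and communication cost.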

ABSTRACT

The scalability of large language models (LLMs) in handling high-complexity models and large-scale datasets has led to tremendous successes in pivotal domains. While there is an urgent need to acquire more training data for LLMs, a concerning reality is the depletion of high-quality public datasets within a few years. In view of this, the federated learning (FL) LLM fine-tuning paradigm recently has been proposed to facilitate collaborative LLM fine-tuning on distributed private data, where multiple data owners collaboratively fine-tune a shared LLM without sharing raw data. However, the staggering model size of LLMs imposes heavy computing and communication burdens on clients, posing significant barriers to the democratization of the FL LLM fine-tuning paradigm. To address this issue, split learning (SL) has emerged as a promising solution by offloading the primary training workload to a server via model partitioning while exchanging activation/activation's gradients with smaller data sizes rather than the entire LLM. Unfortunately, research on the SL LLM fine-tuning paradigm is still in its nascent stage. To fill this gap, in this paper, we propose the first SL LLM fine-tuning framework, named SplitLoRA. SplitLoRA is built on the split federated learning (SFL) framework, amalgamating the advantages of parallel training from FL and model splitting from SL and thus greatly enhancing the training efficiency. It is worth noting that SplitLoRA is the inaugural open-source benchmark for SL LLM fine-tuning, providing a foundation for research efforts dedicated to advancing SL LLM fine-tuning. Extensive simulations validate that SplitLoRA achieves target accuracy in significantly less time than state-of-the-art LLM fine-tuning frameworks, demonstrating the superior training performance of SplitLoRA. The project page is available at https://fduinc.github.io/splitlora/.

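Motivation and Objectives

  • Address data scarcity and privacy concerns by fine-tuning LLMs collaboratively on distributed private data without sharing raw data.
  • Propose SplitLoRA, the first SL LLM fine-tuning framework, built on split federated learning and LoRA.
  • Show that SplitLoRA improves training efficiency and reduces client-side computation and communication while maintaining competitive accuracy.
  • Provide an open-source benchmark for SL LLM fine-tuning to support further research.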

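Proposed Method

  • Split the pre-trained LLM into a client-side sub-model and a server-side sub-model, and fine-tune them with split federated learning (SFL).
  • Attach LoRA adapters to both the client-side and server-side sub-models so that updates are parameter-efficient.
  • Train in two stages: split fine-tuning in every round (client-side forward pass, server-side forward/backward pass, exchange of activations and their gradients), plus periodic aggregation of the client-side LoRA adapters.
  • Every I rounds, aggregate the client-side LoRA adapters on the local aggregation server and send the aggregated adapters back to the clients.
  • Train the server-side sub-model on a central server while keeping client updates distributed, which reduces data transfer and memory load.
  • Evaluate on the E2E natural language generation task with GPT-2 S/M, comparing against CenLoRA and FedLoRA on BLEU, NIST, METEOR, ROUGE_L, and CIDEr.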
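The two stages described in this list can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation: each sub-model is reduced to a single frozen scalar weight with a rank-1 LoRA pair `[a, b]`, the target function `y = 2x`, the learning rate, and every function name (`client_forward`, `server_step`, `client_backward`, `aggregate`, `train`) are hypothetical, and the sample-size-weighted average is an assumed FedAvg-style rule, since the review only says the adapters are "aggregated". Clients are also visited one after another here, whereas SFL trains them in parallel.

```python
import random

def client_forward(x, w_c, lora_c):
    """Client-side forward pass: frozen weight w_c plus the LoRA update b * a."""
    a, b = lora_c
    return (w_c + b * a) * x          # activation sent to the server

def server_step(h, target, w_s, lora_s, lr):
    """Server-side forward/backward pass on a received activation h."""
    a, b = lora_s
    w_eff = w_s + b * a
    err = w_eff * h - target          # derivative of 0.5 * err**2 w.r.t. the output
    grad_h = err * w_eff              # activation gradient sent back to the client
    lora_s[0] = a - lr * err * b * h  # only the LoRA pair is updated, w_s stays frozen
    lora_s[1] = b - lr * err * a * h
    return 0.5 * err ** 2, grad_h

def client_backward(x, grad_h, lora_c, lr):
    """Client-side backward pass using the activation gradient from the server."""
    a, b = lora_c
    lora_c[0] = a - lr * grad_h * b * x
    lora_c[1] = b - lr * grad_h * a * x

def aggregate(adapters, sizes):
    """Sample-size-weighted average of client adapters (assumed FedAvg-style rule)."""
    total = sum(sizes)
    return [sum(n * ad[k] for ad, n in zip(adapters, sizes)) / total
            for k in range(2)]

def train(rounds=300, agg_interval=5, lr=0.1, seed=0):
    rng = random.Random(seed)
    w_c, w_s = 1.0, 1.0                       # frozen "pre-trained" weights
    # Three clients, each holding 8 private samples of the toy target y = 2x.
    data = [[(x, 2.0 * x) for x in (rng.uniform(-1, 1) for _ in range(8))]
            for _ in range(3)]
    clients = [[0.5, 0.0] for _ in data]      # [a, b]; b = 0 so the update starts at zero
    lora_s = [0.5, 0.0]
    losses = []
    for t in range(1, rounds + 1):
        round_loss = 0.0
        for lora_c, samples in zip(clients, data):
            x, target = rng.choice(samples)
            h = client_forward(x, w_c, lora_c)
            loss, grad_h = server_step(h, target, w_s, lora_s, lr)
            client_backward(x, grad_h, lora_c, lr)
            round_loss += loss
        losses.append(round_loss / len(clients))
        if t % agg_interval == 0:             # stage two: every I rounds
            avg = aggregate(clients, [len(s) for s in data])
            clients = [list(avg) for _ in clients]
    return losses

if __name__ == "__main__":
    history = train()
    print("first round loss:", history[0], "last round loss:", history[-1])
```

Only activations and activation gradients cross the client/server boundary in each step, and the raw samples never leave the client. The adapters travel only at the aggregation interval `agg_interval`, which plays the role of I in the list above.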

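Experimental Results

Research Questions

  • RQ1: Can SplitLoRA match the converged accuracy of centralized fine-tuning and full FL while reducing client-side computation and communication?
  • RQ2: How do the split architecture and LoRA-based PEFT affect convergence speed and resource efficiency when client resources are heterogeneous?
  • RQ3: How do the LoRA rank and the choice of cut layer affect performance, data transfer, and computation?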

Key Findings

| Model | Method | BLEU | NIST | METEOR | ROUGE_L | CIDEr |
|---|---|---|---|---|---|---|
| GPT2-S | CenLoRA (r=1) | 67.95 | 8.6973 | 0.4421 | 68.96 | 2.3412 |
| GPT2-S | CenLoRA (r=2) | 68.49 | 8.7481 | 0.4491 | 68.70 | 2.3952 |
| GPT2-S | CenLoRA (r=4) | 69.41 | 8.7824 | 0.4610 | 70.70 | 2.4713 |
| GPT2-S | CenLoRA (r=8) | 69.37 | 8.7735 | 0.4624 | 70.96 | 2.4572 |
| GPT2-S | SplitLoRA (r=1) | 67.18 | 8.6601 | 0.4416 | 67.71 | 2.3255 |
| GPT2-S | SplitLoRA (r=2) | 66.86 | 8.5667 | 0.4515 | 68.50 | 2.3358 |
| GPT2-S | SplitLoRA (r=4) | 68.79 | 8.7259 | 0.4572 | 69.84 | 2.4411 |
| GPT2-S | SplitLoRA (r=8) | 68.76 | 8.6931 | 0.4588 | 70.17 | 2.4165 |
| GPT2-S | FedLoRA (r=1) | 65.66 | 8.4123 | 0.4265 | 67.68 | 2.1921 |
| GPT2-S | FedLoRA (r=2) | 67.24 | 8.6055 | 0.4398 | 69.33 | 2.3025 |
| GPT2-S | FedLoRA (r=4) | 67.73 | 8.6148 | 0.4494 | 68.59 | 2.3817 |
| GPT2-S | FedLoRA (r=8) | 68.39 | 8.6745 | 0.4590 | 70.24 | 2.4450 |
| GPT2-M | CenLoRA (r=1) | 69.86 | 8.7679 | 0.4650 | 71.20 | 2.5028 |
| GPT2-M | CenLoRA (r=2) | 69.97 | 8.7787 | 0.4663 | 71.56 | 2.5029 |
| GPT2-M | CenLoRA (r=4) | 69.78 | 8.7820 | 0.4667 | 71.62 | 2.5301 |
| GPT2-M | CenLoRA (r=8) | 70.57 | 8.8557 | 0.4688 | 72.17 | 2.5405 |
| GPT2-M | SplitLoRA (r=1) | 70.26 | 8.8274 | 0.4664 | 71.73 | 2.5267 |
| GPT2-M | SplitLoRA (r=2) | 70.04 | 8.8031 | 0.4670 | 71.68 | 2.5233 |
| GPT2-M | SplitLoRA (r=4) | 70.09 | 8.8075 | 0.4667 | 71.60 | 2.5370 |
| GPT2-M | SplitLoRA (r=8) | 69.18 | 8.7189 | 0.4631 | 71.30 | 2.5156 |
| GPT2-M | FedLoRA (r=1) | 67.02 | 8.6467 | 0.4484 | 68.06 | 2.3431 |
| GPT2-M | FedLoRA (r=2) | 69.64 | 8.7727 | 0.4633 | 71.35 | 2.4900 |
| GPT2-M | FedLoRA (r=4) | 69.78 | 8.7836 | 0.4642 | 71.87 | 2.4819 |
| GPT2-M | FedLoRA (r=8) | 69.55 | 8.7358 | 0.4661 | 71.46 | 2.4980 |
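
The GPT2-M rows can be compared directly to see how close SplitLoRA stays to CenLoRA at each rank. The following is a minimal sketch that uses only the GPT2-M CenLoRA and SplitLoRA values reported in the table; the dictionary and function names are illustrative and not part of the paper.

```python
METRICS = ("BLEU", "NIST", "METEOR", "ROUGE_L", "CIDEr")

# GPT2-M rows from the table, keyed by LoRA rank r.
CENLORA_M = {
    1: (69.86, 8.7679, 0.4650, 71.20, 2.5028),
    2: (69.97, 8.7787, 0.4663, 71.56, 2.5029),
    4: (69.78, 8.7820, 0.4667, 71.62, 2.5301),
    8: (70.57, 8.8557, 0.4688, 72.17, 2.5405),
}
SPLITLORA_M = {
    1: (70.26, 8.8274, 0.4664, 71.73, 2.5267),
    2: (70.04, 8.8031, 0.4670, 71.68, 2.5233),
    4: (70.09, 8.8075, 0.4667, 71.60, 2.5370),
    8: (69.18, 8.7189, 0.4631, 71.30, 2.5156),
}

def metric_gaps(method, baseline):
    """Per-rank, per-metric difference (method - baseline), rounded to 4 decimals."""
    return {
        rank: {m: round(x - y, 4)
               for m, x, y in zip(METRICS, method[rank], baseline[rank])}
        for rank in method
    }

split_vs_cen = metric_gaps(SPLITLORA_M, CENLORA_M)

if __name__ == "__main__":
    for rank, row in split_vs_cen.items():
        print(f"r={rank}", row)
```

On BLEU, for example, SplitLoRA is ahead of CenLoRA by 0.40, 0.07, and 0.31 points at r=1, 2, and 4, and behind by 1.39 points at r=8.
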
  • On GPT-2 M, SplitLoRA reaches converged accuracy comparable to CenLoRA, with a gap below 0.04 in some settings.
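  • FedLoRA converges to a higher perplexity (worse performance) because of data heterogeneity, with reported PPL figures of about 0.08/0.11 (GPT2-S/GPT2-M) and 0.73/0.09 relative to SplitLoRA and CenLoRA.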
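  • Compared with CenLoRA/FedLoRA, SplitLoRA substantially reduces the number of trainable parameters on the client (GPT2-S: 0.008M–0.062M; GPT2-M: 0.011M–0.088M) and avoids fine-tuning the whole model on the client.
  • SplitLoRA converges faster than FedLoRA and CenLoRA: the training latency needed to reach convergence differs by roughly 1.7× and 4.7× on GPT2-S, and by 2.1× and 4.8× on GPT2-M.
  • The framework partitions the model so that client-side fine-tuning touches only part of it (one quarter of GPT-2 S, one eighth of GPT-2 M), which makes it feasible to run on consumer-grade GPUs.
  • The server-side sub-model of SplitLoRA is trained centrally, which improves robustness to data heterogeneity and shifts most of the workload to the central server.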
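
A rough back-of-the-envelope sketch can show how the client-side share, the LoRA rank, and the activation payload relate. It rests on several assumptions that this review does not confirm: the client share (one quarter, one eighth) is counted in transformer blocks, LoRA is attached to two hidden-by-hidden projections per block, and the batch size, sequence length, and 4-byte values are placeholder choices. The block counts and hidden sizes are the public GPT-2 small/medium configurations. All function names are hypothetical.

```python
from fractions import Fraction

# Public GPT-2 architecture facts: number of transformer blocks and hidden size.
GPT2 = {
    "GPT2-S": {"blocks": 12, "hidden": 768},
    "GPT2-M": {"blocks": 24, "hidden": 1024},
}
# Client-side share of the model as stated in the findings.
CLIENT_SHARE = {"GPT2-S": Fraction(1, 4), "GPT2-M": Fraction(1, 8)}

def client_blocks(model):
    """Blocks kept on the client, ASSUMING the share is counted in blocks."""
    n = GPT2[model]["blocks"] * CLIENT_SHARE[model]
    if n.denominator != 1:
        raise ValueError("share does not map to a whole number of blocks")
    return int(n)

def lora_params(hidden, rank, blocks, matrices_per_block=2):
    """Trainable LoRA parameters, ASSUMING each adapted matrix is hidden x hidden.

    Each adapted matrix adds A (rank x hidden) and B (hidden x rank).
    """
    return blocks * matrices_per_block * 2 * hidden * rank

def activation_bytes(batch, seq_len, hidden, bytes_per_value=4):
    """Size of one activation tensor sent across the cut layer."""
    return batch * seq_len * hidden * bytes_per_value

if __name__ == "__main__":
    for model, cfg in GPT2.items():
        blocks = client_blocks(model)
        for rank in (1, 8):
            print(model, "rank", rank, "client LoRA params:",
                  lora_params(cfg["hidden"], rank, blocks))
        # Placeholder batch size and sequence length, not taken from the paper.
        print(model, "activation bytes per step:",
              activation_bytes(8, 512, cfg["hidden"]))
```

Under these assumptions both models keep 3 blocks on the client, and the adapter count grows linearly with the rank: 9,216 to 73,728 parameters for GPT2-S and 12,288 to 98,304 for GPT2-M at r=1 to r=8. That is the same order of magnitude as the reported ranges but not an exact match, so the paper's adapter placement likely differs from the one assumed here. The activation payload depends on batch size, sequence length, and hidden size, not on the rank.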

This review was generated by AI and reviewed by human editors.