QUICK REVIEW

[Paper Review] SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

Lin Zheng, Xuanjie Hu | arXiv (Cornell University) | 2024-07-01

Natural Language Processing Techniques · Citations: 7

One-Line Summary

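SplitLoRA combines split learning and federated learning with LoRA-based parameter-efficient fine-tuning to fine-tune large language models (LLMs) efficiently on distributed private data, lowering computation and communication costs while reaching comparable accuracy.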

ABSTRACT

The scalability of large language models (LLMs) in handling high-complexity models and large-scale datasets has led to tremendous successes in pivotal domains. While there is an urgent need to acquire more training data for LLMs, a concerning reality is the depletion of high-quality public datasets within a few years. In view of this, the federated learning (FL) LLM fine-tuning paradigm recently has been proposed to facilitate collaborative LLM fine-tuning on distributed private data, where multiple data owners collaboratively fine-tune a shared LLM without sharing raw data. However, the staggering model size of LLMs imposes heavy computing and communication burdens on clients, posing significant barriers to the democratization of the FL LLM fine-tuning paradigm. To address this issue, split learning (SL) has emerged as a promising solution by offloading the primary training workload to a server via model partitioning while exchanging activation/activation's gradients with smaller data sizes rather than the entire LLM. Unfortunately, research on the SL LLM fine-tuning paradigm is still in its nascent stage. To fill this gap, in this paper, we propose the first SL LLM fine-tuning framework, named SplitLoRA. SplitLoRA is built on the split federated learning (SFL) framework, amalgamating the advantages of parallel training from FL and model splitting from SL and thus greatly enhancing the training efficiency. It is worth noting that SplitLoRA is the inaugural open-source benchmark for SL LLM fine-tuning, providing a foundation for research efforts dedicated to advancing SL LLM fine-tuning. Extensive simulations validate that SplitLoRA achieves target accuracy in significantly less time than state-of-the-art LLM fine-tuning frameworks, demonstrating the superior training performance of SplitLoRA. The project page is available at https://fduinc.github.io/splitlora/.

Motivation and Objectives

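Enable collaborative LLM fine-tuning on distributed private data without sharing raw data, addressing both data scarcity and privacy concerns.
Propose SplitLoRA, the first SL LLM fine-tuning framework, built on split federated learning and LoRA.
Show that SplitLoRA improves training efficiency and reduces client-side computation and communication while maintaining competitive accuracy.
Provide an open-source benchmark for SL LLM fine-tuning to encourage further research.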

Proposed Method

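Split the pre-trained LLM into a client-side submodel and a server-side submodel, and fine-tune it via Split Federated Learning (SFL).
Attach LoRA adapters to both the client-side and server-side submodels for parameter-efficient updates.
Run two stages per training round: split fine-tuning (client forward pass, server forward/backward pass, exchange of activations and activation gradients) and periodic aggregation of the client-side LoRA adapters.
Every I rounds, aggregate the client-side LoRA adapters at the local aggregation server and send the aggregated adapter back to the clients over the downlink.
Train with a central server while keeping client-side updates distributed, which reduces data transfer and memory load.
Evaluate on the E2E NLG task with GPT-2 S/M, comparing against CenLoRA and FedLoRA on metrics including BLEU, NIST, METEOR, ROUGE_L, and CIDEr.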
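
The round structure described above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: each submodel is reduced to a single frozen linear map with one LoRA pair, the loss is a squared error, the server adapter is updated after every client batch, and aggregation is a sample-weighted average. All names (`new_adapter`, `split_step`, `fed_avg`, `run`) and all sizes are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_H, D_OUT, RANK = 8, 6, 4, 2  # toy sizes, chosen only for illustration

def new_adapter(d_in, d_out, rank):
    # LoRA pair: `down` is random and `up` starts at zero, so the initial update is zero.
    return {"down": rng.normal(0, 0.1, (d_in, rank)), "up": np.zeros((rank, d_out))}

def effective(w_frozen, ad):
    # Frozen weight plus the low-rank update.
    return w_frozen + ad["down"] @ ad["up"]

def adapter_grads(inp, grad_out, ad):
    g_w = inp.T @ grad_out  # gradient w.r.t. the effective weight
    return {"down": g_w @ ad["up"].T, "up": ad["down"].T @ g_w}

def sgd(ad, grads, lr):
    for k in ad:
        ad[k] -= lr * grads[k]

def fed_avg(adapters, weights):
    # Weighted average of adapter tensors (weights are normalised to sum to 1).
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    return {k: sum(wi * ad[k] for wi, ad in zip(w, adapters)) for k in adapters[0]}

def split_step(x, t, w_c, ad_c, w_s, ad_s, lr):
    h = x @ effective(w_c, ad_c)          # client forward; activations go uplink
    w_s_eff = effective(w_s, ad_s)
    y = h @ w_s_eff                       # server forward
    g_y = 2.0 * (y - t) / len(x)          # gradient of sum-of-squares / batch size
    g_h = g_y @ w_s_eff.T                 # activation gradient; goes downlink
    sgd(ad_s, adapter_grads(h, g_y, ad_s), lr)   # server updates its adapter only
    sgd(ad_c, adapter_grads(x, g_h, ad_c), lr)   # client updates its adapter only
    return float(np.sum((y - t) ** 2) / len(x))

def run(num_clients=3, rounds=6, agg_interval=3, lr=0.05):
    w_c = rng.normal(0, 0.3, (D_IN, D_H))    # frozen client-side weight
    w_s = rng.normal(0, 0.3, (D_H, D_OUT))   # frozen server-side weight
    frozen_copy = (w_c.copy(), w_s.copy())
    data = [(rng.normal(size=(16, D_IN)), rng.normal(size=(16, D_OUT)))
            for _ in range(num_clients)]
    ad_s = new_adapter(D_H, D_OUT, RANK)
    init = new_adapter(D_IN, D_H, RANK)
    client_ads = [{k: v.copy() for k, v in init.items()} for _ in range(num_clients)]
    losses = []
    for rnd in range(1, rounds + 1):
        for (x, t), ad_c in zip(data, client_ads):
            losses.append(split_step(x, t, w_c, ad_c, w_s, ad_s, lr))
        if rnd % agg_interval == 0:       # every I rounds: aggregate and broadcast
            avg = fed_avg(client_ads, [len(x) for x, _ in data])
            client_ads = [{k: v.copy() for k, v in avg.items()}
                          for _ in range(num_clients)]
    return client_ads, ad_s, losses, (w_c, w_s), frozen_copy

if __name__ == "__main__":
    client_ads, ad_s, losses, _, _ = run()
    print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Only the adapter tensors ever change or travel between clients and the aggregation step; the frozen weights stay put, which is what keeps the exchanged payload small in this sketch.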

Experimental Results

Research Questions

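RQ1: Does SplitLoRA match the converged accuracy of centralized fine-tuning and full FL while reducing client-side computation and communication?
RQ2: How do the split architecture and LoRA-based PEFT affect convergence speed and resource efficiency under heterogeneous client resources?
RQ3: How do the LoRA rank and the choice of cut layer affect performance and the amount of data and computation transferred?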

Key Results

Model	Method	BLEU	NIST	METEOR	ROUGE_L	CIDEr
GPT2-S	CenLoRA (r=1)	67.95	8.6973	0.4421	68.96	2.3412
GPT2-S	CenLoRA (r=2)	68.49	8.7481	0.4491	68.70	2.3952
GPT2-S	CenLoRA (r=4)	69.41	8.7824	0.4610	70.70	2.4713
GPT2-S	CenLoRA (r=8)	69.37	8.7735	0.4624	70.96	2.4572
GPT2-S	SplitLoRA (r=1)	67.18	8.6601	0.4416	67.71	2.3255
GPT2-S	SplitLoRA (r=2)	66.86	8.5667	0.4515	68.50	2.3358
GPT2-S	SplitLoRA (r=4)	68.79	8.7259	0.4572	69.84	2.4411
GPT2-S	SplitLoRA (r=8)	68.76	8.6931	0.4588	70.17	2.4165
GPT2-S	FedLoRA (r=1)	65.66	8.4123	0.4265	67.68	2.1921
GPT2-S	FedLoRA (r=2)	67.24	8.6055	0.4398	69.33	2.3025
GPT2-S	FedLoRA (r=4)	67.73	8.6148	0.4494	68.59	2.3817
GPT2-S	FedLoRA (r=8)	68.39	8.6745	0.4590	70.24	2.4450
GPT2-M	CenLoRA (r=1)	69.86	8.7679	0.4650	71.20	2.5028
GPT2-M	CenLoRA (r=2)	69.97	8.7787	0.4663	71.56	2.5029
GPT2-M	CenLoRA (r=4)	69.78	8.7820	0.4667	71.62	2.5301
GPT2-M	CenLoRA (r=8)	70.57	8.8557	0.4688	72.17	2.5405
GPT2-M	SplitLoRA (r=1)	70.26	8.8274	0.4664	71.73	2.5267
GPT2-M	SplitLoRA (r=2)	70.04	8.8031	0.4670	71.68	2.5233
GPT2-M	SplitLoRA (r=4)	70.09	8.8075	0.4667	71.60	2.5370
GPT2-M	SplitLoRA (r=8)	69.18	8.7189	0.4631	71.30	2.5156
GPT2-M	FedLoRA (r=1)	67.02	8.6467	0.4484	68.06	2.3431
GPT2-M	FedLoRA (r=2)	69.64	8.7727	0.4633	71.35	2.4900
GPT2-M	FedLoRA (r=4)	69.78	8.7836	0.4642	71.87	2.4819
GPT2-M	FedLoRA (r=8)	69.55	8.7358	0.4661	71.46	2.4980
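
To make the table easier to compare across methods, the short script below averages BLEU over the four ranks and computes the per-rank BLEU gap against CenLoRA. The numbers are copied from the table; the helper names (`mean_bleu`, `gap_vs_cenlora`) and the choice to average over ranks are conveniences of this sketch, not an analysis from the paper.

```python
from statistics import mean

# BLEU scores copied from the table above, ordered by LoRA rank r = 1, 2, 4, 8.
RANKS = (1, 2, 4, 8)
BLEU = {
    ("GPT2-S", "CenLoRA"): (67.95, 68.49, 69.41, 69.37),
    ("GPT2-S", "SplitLoRA"): (67.18, 66.86, 68.79, 68.76),
    ("GPT2-S", "FedLoRA"): (65.66, 67.24, 67.73, 68.39),
    ("GPT2-M", "CenLoRA"): (69.86, 69.97, 69.78, 70.57),
    ("GPT2-M", "SplitLoRA"): (70.26, 70.04, 70.09, 69.18),
    ("GPT2-M", "FedLoRA"): (67.02, 69.64, 69.78, 69.55),
}

def mean_bleu(model, method):
    """Average BLEU over the four ranks for one model/method pair."""
    return mean(BLEU[(model, method)])

def gap_vs_cenlora(model, method):
    """Per-rank BLEU difference (method minus CenLoRA), rounded to 2 decimals."""
    base = BLEU[(model, "CenLoRA")]
    scores = BLEU[(model, method)]
    return {r: round(s - b, 2) for r, s, b in zip(RANKS, scores, base)}

if __name__ == "__main__":
    for model, method in BLEU:
        print(f"{model} {method}: mean BLEU {mean_bleu(model, method):.2f}")
    print("GPT2-M SplitLoRA vs CenLoRA:", gap_vs_cenlora("GPT2-M", "SplitLoRA"))
```

On these averages, CenLoRA ranks first, SplitLoRA second, and FedLoRA third for both model sizes, although individual ranks deviate (for example, SplitLoRA is ahead of CenLoRA on GPT2-M at r=1, 2, and 4 but behind at r=8).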

SplitLoRA reaches converged accuracy comparable to CenLoRA, particularly on GPT-2 M; in certain settings the accuracy gap is below 0.04.
FedLoRA performs worse, showing higher language-model perplexity (PPL) because of data heterogeneity; the reported PPL gaps are approximately 0.08/0.11 (GPT2-S/GPT2-M) and 0.73/0.09 relative to SplitLoRA and CenLoRA.
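SplitLoRA substantially reduces the number of trainable parameters on the client side (GPT2-S: 0.008M–0.062M; GPT2-M: 0.011M–0.088M), so that, compared with CenLoRA and FedLoRA, clients fine-tune only the minimum necessary part of the model.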
SplitLoRA converges faster than FedLoRA and CenLoRA: for GPT2-S it reaches convergence with roughly 1.7× and 4.7× lower training latency, and for GPT2-M with roughly 2.1× and 4.8× lower latency, respectively.
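Because the model is split, client-side fine-tuning covers only part of the model (in this setup, one quarter of GPT-2 S and one eighth of GPT-2 M), which makes it feasible on consumer GPUs.
The server-side submodel in SplitLoRA is trained centrally, which improves robustness to data heterogeneity and shifts most of the workload to the central server.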
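
The client-side parameter ranges quoted above grow roughly eightfold from their low end to their high end, which is consistent with LoRA's trainable parameter count scaling linearly in the rank (presumably r = 1 through r = 8). A minimal sketch of that arithmetic follows; it assumes one LoRA pair per adapted weight matrix and uses 768, the GPT-2 small hidden size, purely as an example dimension. It does not attempt to reproduce the paper's exact totals, since which matrices are adapted on the client is not stated here.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters of one LoRA pair: a (d_in x rank) and a (rank x d_out) matrix."""
    return rank * (d_in + d_out)

def full_params(d_in, d_out):
    """Parameters of the dense weight matrix the LoRA pair stands in for."""
    return d_in * d_out

HIDDEN = 768  # example dimension (GPT-2 small hidden size); adapted matrices are an assumption

if __name__ == "__main__":
    for rank in (1, 2, 4, 8):
        n = lora_params(HIDDEN, HIDDEN, rank)
        share = n / full_params(HIDDEN, HIDDEN)
        print(f"r={rank}: {n} trainable parameters ({share:.2%} of the dense matrix)")
```

Doubling the rank doubles the adapter size, so the rank is a direct knob on both client memory and the size of the adapters exchanged during aggregation.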


This review was generated by AI and reviewed by a human editor.