QUICK REVIEW

[Paper Review] AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

Qingru Zhang, Minshuo Chen|arXiv (Cornell University)|2023. 03. 18.

Topic Modeling · Citations: 32

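One-Line Summary

AdaLoRA adaptively allocates the parameter budget across weight matrices during LoRA-style fine-tuning. It parameterizes each update in an SVD-like form and uses an importance-guided rank scheduler, with gains that are largest at very low budgets.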

ABSTRACT

Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream tasks are present. Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e.g., low-rank increments. These methods often evenly distribute the budget of incremental updates across all pre-trained weight matrices, and overlook the varying importance of different weight parameters. As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. In particular, AdaLoRA parameterizes the incremental updates in the form of singular value decomposition. Such a novel approach allows us to effectively prune the singular values of unimportant updates, which is essentially to reduce their parameter budget but circumvent intensive exact SVD computations. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA manifests notable improvement over baselines, especially in the low budget settings. Our code is publicly available at https://github.com/QingruZhang/AdaLoRA .

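Motivation and Objectives

Motivates reducing the memory and compute demands of fine-tuning large pre-trained language models across many downstream tasks.
Proposes a budget-adaptive fine-tuning method that allocates more parameters to important modules.
Develops an SVD-based formulation of incremental updates that avoids expensive exact SVD computation.
Introduces an importance-aware ranking mechanism that prunes singular values while preserving the ability to recover them.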

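Proposed Method

Parameterizes the weight update as W = W(0) + P Λ Q, where Λ is a diagonal matrix holding the singular values and P and Q hold the left and right singular vectors.
Regularizes P and Q to encourage orthogonality and stabilize training.
Defines an importance score S, based on each singular value and the mean magnitude of the corresponding column of P and row of Q, to guide the pruning of Λ.
Applies a global budget scheduler that starts training with a higher budget and gradually shrinks it to the target budget.
Prunes Λ iteratively at a chosen interval using the computed importance scores, while keeping the ability to recover pruned components.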
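The method steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer shapes, the random initialization, and the function names (`adapted_weight`, `orthogonality_penalty`, `triplet_importance`, `allocate_budget`) are hypothetical, and the importance score uses plain magnitudes as described in this review, which may be a simplification of the score used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r = 8, 6, 4  # hypothetical layer shape and initial rank per matrix


def adapted_weight(W0, P, lam, Q):
    """Return W = W0 + P diag(lam) Q without building the diagonal matrix."""
    return W0 + (P * lam) @ Q


def orthogonality_penalty(P, Q):
    """Penalty ||P^T P - I||_F^2 + ||Q Q^T - I||_F^2 that pushes P, Q toward orthogonality."""
    eye = np.eye(P.shape[1])
    return (np.linalg.norm(P.T @ P - eye, "fro") ** 2
            + np.linalg.norm(Q @ Q.T - eye, "fro") ** 2)


def triplet_importance(P, lam, Q):
    """Score triplet i by |lam_i| + mean|P[:, i]| + mean|Q[i, :]| (magnitude-based assumption)."""
    return np.abs(lam) + np.abs(P).mean(axis=0) + np.abs(Q).mean(axis=1)


def allocate_budget(score_list, budget):
    """Keep the `budget` highest-scoring triplets across ALL matrices; return one mask per matrix."""
    flat = np.concatenate(score_list)
    keep = np.zeros(flat.size, dtype=bool)
    keep[np.argsort(flat)[::-1][:budget]] = True
    boundaries = np.cumsum([s.size for s in score_list])[:-1]
    return np.split(keep, boundaries)


# Three hypothetical weight matrices, each with its own (P, lam, Q) triplets.
layers = [
    {
        "W0": rng.normal(size=(d1, d2)),            # frozen pre-trained weight
        "P": rng.normal(scale=0.1, size=(d1, r)),   # left singular vectors
        "lam": rng.normal(scale=0.1, size=r),       # singular values (nonzero here for the demo)
        "Q": rng.normal(scale=0.1, size=(r, d2)),   # right singular vectors
    }
    for _ in range(3)
]

scores = [triplet_importance(l["P"], l["lam"], l["Q"]) for l in layers]
masks = allocate_budget(scores, budget=6)

# Pruning only zeroes singular values; P and Q are left untouched so a
# pruned triplet can become active again at a later allocation step.
for layer, mask in zip(layers, masks):
    layer["lam"] = np.where(mask, layer["lam"], 0.0)

ranks = [int(m.sum()) for m in masks]
print("rank per matrix after allocation:", ranks)
```

Because the top-scoring triplets are selected across all matrices at once, the ranks in `ranks` can differ from matrix to matrix, which is the adaptive allocation the review describes.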

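Experimental Results

Research Questions

RQ1: Can adaptive, importance-guided rank allocation improve parameter-efficient fine-tuning performance over uniform low-rank updates?
RQ2: Does the SVD-based parameterization enable cheaper or more stable pruning than direct SVD or structured pruning in a LoRA-like setting?
RQ3: How does AdaLoRA compare with baselines on NLP tasks (NLU, QA, NLG) at different budget levels?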

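Key Results

AdaLoRA outperforms baselines on the GLUE, SQuAD, and NLG benchmarks, especially at low budgets.
With less than 0.1% trainable parameters, AdaLoRA achieves a 1.2% F1 improvement on SQuAD2.0 over state-of-the-art approaches.
AdaLoRA maintains or improves performance as the budget shrinks, showing robustness under tight parameter budgets.
Pruning singular values in AdaLoRA leaves the singular vectors intact rather than zeroing them, which preserves recoverability and helps stability.
The global budget scheduler helps stabilize training and improves performance by tightening the parameter budget gradually.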
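A gradually tightening budget of the kind described above could be sketched as follows. The review only states that the budget starts high and shrinks to the target, so the cubic decay shape, the warm-up and final phases, and every name and number here (`budget_schedule`, `warmup_steps`, `final_steps`, the step counts and budgets) are assumptions for illustration.

```python
def budget_schedule(step, total_steps, init_budget, target_budget,
                    warmup_steps, final_steps):
    """Return the number of singular values allowed at `step`.

    Holds `init_budget` during warm-up, decays with an assumed cubic curve,
    then holds `target_budget` for the last `final_steps` steps.
    """
    if step < warmup_steps:
        return init_budget
    if step >= total_steps - final_steps:
        return target_budget
    progress = (step - warmup_steps) / (total_steps - warmup_steps - final_steps)
    return int(target_budget + (init_budget - target_budget) * (1.0 - progress) ** 3)


# Hypothetical run: 1000 steps, budget shrinking from 864 to 576 singular values.
schedule = [
    budget_schedule(t, total_steps=1000, init_budget=864, target_budget=576,
                    warmup_steps=100, final_steps=200)
    for t in range(1000)
]
print(schedule[0], schedule[500], schedule[-1])
```

A cubic curve cuts the budget quickly at first and slowly near the end, leaving more steps for the model to adapt once the allocation is close to its final form. Other monotone decay shapes would fit the description in this review equally well.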


This review was generated by AI and reviewed by a human editor.