QUICK REVIEW

[Paper Review] LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

Yixiao Li, Yifan Yu|arXiv (Cornell University)|Oct 12, 2023

Topic Modeling | Cited by 18

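One-Sentence Summary

LoftQ jointly quantizes a large language model and learns a low-rank LoRA initialization that minimizes the quantization discrepancy, which yields better downstream fine-tuning results than QLoRA at low bit widths.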

ABSTRACT

Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model. In such cases it is common to observe a consistent gap in the performance on downstream tasks between full fine-tuning and quantization plus LoRA fine-tuning approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning. Such an initialization alleviates the discrepancy between the quantized and full-precision model and significantly improves generalization in downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and natural language generation tasks. Experiments show that our method is highly effective and outperforms existing quantization methods, especially in the challenging 2-bit and 2/4-bit mixed precision regimes. The code is available on https://github.com/yxli2123/LoftQ.

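Research Motivation and Goals

Enable efficient deployment of large language models under limited resources through quantization and LoRA fine-tuning.
Close the performance gap between full fine-tuning and quantization plus LoRA by aligning the quantized weights with the LoRA adapters.
Propose a method that minimizes the discrepancy between the original high-precision weights and their LoftQ representation (the quantized weights plus the low-rank term), thereby improving downstream generalization.
Provide a quantization framework that is compatible with multiple quantization schemes, validated on NLU, QA, summarization, and generation tasks.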

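Proposed Method

Introduces LoftQ, which approximates the high-precision weights by alternating between quantization and low-rank approximation.
Optimizes Q, A, and B by minimizing ||W - Q - AB^T||_F in order to initialize the LoRA adapters.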
Uses an alternating loop: quantize the residual (W - AB^T) to obtain Q, then apply a rank-r SVD to the quantization residual (W - Q) to update A and B.
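Supports different quantizers for q_N(·): NF4, NF2, and Uniform.
After T rounds, stores Q_T, performs dequantization in the forward pass through a lookup table, and initializes the LoRA adapters with A_T and B_T.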
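
The alternating loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `uniform_quantize` and `loftq_init` are hypothetical, the quantizer is a simple per-matrix uniform grid standing in for q_N(·) (NF4 and NF2 are not reproduced here), and the singular values are split evenly between A and B as one reasonable way to form the rank-r factors.

```python
import numpy as np


def uniform_quantize(x, bits):
    """Round x onto 2**bits evenly spaced levels; return (codes, lookup_table).

    Assumption: a plain per-matrix uniform grid, used only as a stand-in quantizer.
    """
    levels = 2 ** bits
    lo, hi = float(x.min()), float(x.max())
    if hi == lo:  # constant matrix: avoid a zero step size
        hi = lo + 1.0
    table = np.linspace(lo, hi, levels)  # dequantization lookup table
    step = (hi - lo) / (levels - 1)
    codes = np.clip(np.rint((x - lo) / step), 0, levels - 1).astype(np.uint8)
    return codes, table


def loftq_init(W, rank, bits=2, steps=5):
    """Alternate quantization and rank-r SVD so that W is close to Q + A @ B.T."""
    A = np.zeros((W.shape[0], rank))
    B = np.zeros((W.shape[1], rank))
    for _ in range(steps):
        # Step 1: quantize the part of W that the low-rank term does not explain.
        codes, table = uniform_quantize(W - A @ B.T, bits)
        Q = table[codes]  # dequantize by table lookup
        # Step 2: best rank-r fit of the quantization residual W - Q.
        U, S, Vt = np.linalg.svd(W - Q, full_matrices=False)
        root = np.sqrt(S[:rank])
        A = U[:, :rank] * root
        B = Vt[:rank].T * root
    return codes, table, A, B


# Usage on a random matrix (illustrative data only).
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
codes, table, A, B = loftq_init(W, rank=4, bits=2, steps=5)
Q = table[codes]
print(np.linalg.norm(W - Q), np.linalg.norm(W - Q - A @ B.T))
```

The second printed norm cannot exceed the first, because the truncated SVD is the best rank-r approximation of W - Q in Frobenius norm. Only the integer codes and the small lookup table need to be stored for the backbone, while A and B serve as the starting point for the LoRA adapters.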

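Experimental Results

Research Questions

RQ1: Can LoftQ reduce the initialization discrepancy between the quantized backbone and the full-precision weights, thereby improving LoRA fine-tuning?
RQ2: How does LoftQ compare with QLoRA under 2-bit and 4-bit quantization on encoder-only, encoder-decoder, and decoder-only models?
RQ3: Is LoftQ robust across NLU, QA, summarization, and generation tasks, and does it also perform well in low-bit and mixed-precision settings?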

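Key Findings

LoftQ significantly outperforms QLoRA across the tested models, quantization schemes, ranks, and tasks.
Under 2-bit quantization, LoftQ converges and achieves significant gains on several tasks where QLoRA fails to converge (e.g., CoLA).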
In the DeBERTaV3-base experiments with 2-bit Uniform and NF2 quantization and with NF4, LoftQ achieves higher MNLI accuracy and performs on par with or better than QLoRA on SQuADv1.1.
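For BART-large, 4-bit LoftQ often surpasses full-precision LoRA on XSum, and approaches or exceeds it on CNN/DailyMail across ranks.
With LLAMA-2-7b/13b on WikiText-2 and GSM8K, LoftQ achieves better perplexity and GSM8K accuracy, including in cases where QLoRA fails to converge at 2 bits.
Overall, LoftQ delivers strong performance in low-bit settings and provides a robust initialization for LoRA fine-tuning.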

This review was generated by AI and checked by a human editor.