QUICK REVIEW

[Paper Review] LoRA Dropout as a Sparsity Regularizer for Overfitting Control

Lin Yang, Xinyu Ma | arXiv (Cornell University) | Apr 15, 2024

Sparse and Compressive Sensing Techniques | Cited by 7

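One-Sentence Summary

This paper proposes LoRA Dropout, a sparsity-based regularization method for LoRA-based parameter-efficient fine-tuning. It comes with a theoretical analysis and a test-time ensemble, and aims to improve generalization and calibration across a range of NLP tasks.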

ABSTRACT

Parameter-efficient fine-tuning methods, represented by LoRA, play an essential role in adapting large-scale pre-trained models to downstream tasks. However, fine-tuning LoRA-series models also faces the risk of overfitting on the training dataset, and yet there's still a lack of theoretical guidance and practical mechanism to control overfitting on LoRA-based PEFT methods. In this paper, we propose a LoRA Dropout mechanism for the LoRA-based methods by introducing random noises to the learnable low-rank matrices and increasing parameter sparsity. We then demonstrate the theoretical mechanism of our LoRA Dropout mechanism from the perspective of sparsity regularization by providing a generalization error bound under this framework. Theoretical results show that appropriate sparsity would help tighten the gap between empirical and generalization risks and thereby control overfitting. Furthermore, based on the LoRA Dropout framework, we introduce a test-time ensemble strategy and provide theoretical evidence demonstrating that the ensemble method can further compress the error bound, and lead to better performance during inference time. Extensive experiments on various NLP tasks provide practical validations of the effectiveness of our LoRA Dropout framework in improving model accuracy and calibration.

Research Motivation and Objectives

Motivate and address overfitting in LoRA- and AdaLoRA-based fine-tuning of large pre-trained language models.
Develop a dropout mechanism that induces sparsity in learnable LoRA matrices without reducing expressive power.
Provide theoretical generalization guarantees under sparsity regularization for LoRA Dropout.
Propose a test-time ensemble strategy to further improve inference performance and generalization.
Empirically validate the framework on diverse NLP tasks, including GLUE, SQuAD, and instruction tuning.

Proposed Method

Apply dropout along the input and output dimensions of the LoRA low-rank matrices, masking the corresponding rows/columns with Bernoulli draws governed by the dropout rate p.
Express the forward pass as h = W0x + BAx, and its dropout-applied version as h = W0x + B̂Âx, where Â and B̂ are the masked versions of A and B.
Train with an objective that averages the loss over N dropout instances: L(x) = (1/N) Σ_{r=1}^{N} ℓ(x; θ0 + Δθ(m_r)), where m_r is the r-th sampled mask.
Adapt the dropout framework to AdaLoRA by masking its P and Q matrices while leaving the diagonal matrix Λ unmasked.
Develop a sparsity-regularized optimization view that leads to a generalization error bound (Theorem 4.4).
Introduce a test-time ensemble by averaging outputs across dropout-instantiated models, yielding a tighter error bound (Theorem 4.5).
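
The masking, averaged objective, and test-time ensemble described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: p is treated as the drop probability, the input dimension is taken to mean the columns of A and the output dimension the rows of B, no rescaling of kept entries is applied, and a squared-error loss stands in for the task loss ℓ. All function names are hypothetical.

```python
import random


def sample_mask(size, p, rng):
    """Return a 0/1 mask; each entry is dropped with probability p (assumed)."""
    return [0.0 if rng.random() < p else 1.0 for _ in range(size)]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def lora_dropout_forward(W0, A, B, x, mask_in, mask_out):
    """Compute h = W0 x + B_hat A_hat x for one pair of sampled masks."""
    # Masking columns of A (input dimension) is the same as masking x.
    x_masked = [x_i * m_i for x_i, m_i in zip(x, mask_in)]
    low_rank = matvec(A, x_masked)   # length r; the rank dimension is untouched
    delta = matvec(B, low_rank)      # length d_out
    # Masking rows of B (output dimension) is the same as masking the update.
    delta = [d_i * m_i for d_i, m_i in zip(delta, mask_out)]
    base = matvec(W0, x)             # the frozen pre-trained path is never masked
    return [b_i + d_i for b_i, d_i in zip(base, delta)]


def averaged_loss(W0, A, B, x, target, p, n_instances, rng):
    """L(x) = (1/N) * sum over N dropout instances of the per-instance loss."""
    total = 0.0
    for _ in range(n_instances):
        m_in = sample_mask(len(x), p, rng)
        m_out = sample_mask(len(W0), p, rng)
        h = lora_dropout_forward(W0, A, B, x, m_in, m_out)
        # Squared error is a stand-in for the task loss (assumption).
        total += sum((h_i - t_i) ** 2 for h_i, t_i in zip(h, target))
    return total / n_instances


def ensemble_predict(W0, A, B, x, p, n_instances, rng):
    """Average the outputs of n_instances dropout-instantiated models."""
    outputs = []
    for _ in range(n_instances):
        m_in = sample_mask(len(x), p, rng)
        m_out = sample_mask(len(W0), p, rng)
        outputs.append(lora_dropout_forward(W0, A, B, x, m_in, m_out))
    return [sum(column) / n_instances for column in zip(*outputs)]
```

Passing a seeded `random.Random` instance as `rng` makes the sampled masks reproducible. With p = 0 every mask is all ones and the forward pass reduces to plain LoRA, h = W0x + BAx.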

Experimental Results

Research Questions

RQ1: Can LoRA-based fine-tuning be regularized via dropout to balance empirical risk and model complexity?
RQ2: Does sparsity regularization induced by LoRA Dropout improve generalization and calibration for PEFT methods?
RQ3: Can a test-time dropout ensemble further reduce generalization error and improve performance?
RQ4: How does LoRA Dropout interact with AdaLoRA and other LoRA variants in practice across NLP tasks?

Key Findings

LoRA Dropout provides a sparsity-regularized fine-tuning objective that can reduce overfitting in LoRA-based methods.
Theoretical analysis yields a generalization error bound showing the trade-off between empirical risk and model complexity controlled by dropout rate p and sparsity strength λ.
A test-time ensemble of dropout-instantiated models further tightens the error bound and improves test-time generalization.
Empirical results across GLUE tasks, SQuAD, and instruction tuning show LoRA Dropout consistently improves accuracy and calibration over baseline LoRA and AdaLoRA methods.
LoRA Dropout reduces expected calibration error (ECE) on several GLUE tasks, indicating better-calibrated predictions.
Across SQuAD v1.1/v2.0 and instruction tuning, LoRA Dropout attains higher EM/F1 and MMLU 0-shot accuracy, respectively, than non-dropout baselines.
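
For readers unfamiliar with the calibration metric cited in these findings, expected calibration error (ECE) is commonly computed by binning predictions by confidence and taking the weighted average gap between accuracy and mean confidence in each bin. The sketch below follows that common definition. The bin count of 10, the left-closed bin edges, and the function name are assumptions, since the summary does not state the exact evaluation protocol used in the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over confidence bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction matched the label.
    n_bins: number of equal-width bins (10 is an assumed default).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1.0 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A lower value means the model's stated confidence tracks its actual accuracy more closely. For instance, four predictions made at confidence 0.9 of which only two are correct give an ECE of about 0.4.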

This review was generated by AI and checked by a human editor.