[Paper Walkthrough] Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
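FourierFT fine-tunes large foundation models by updating only a sparse set of spectral coefficients, from which the weight-change matrix is recovered via the inverse discrete Fourier transform. With far fewer trainable parameters than LoRA, it achieves comparable or better performance on NLP and CV tasks.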
Low-rank adaptation (LoRA) has recently gained much interest in fine-tuning foundation models. It effectively reduces the number of trainable parameters by incorporating low-rank matrices $A$ and $B$ to represent the weight change, i.e., $\Delta W = BA$. Despite LoRA's progress, it faces storage challenges when handling extensive customization adaptations or larger base models. In this work, we aim to further compress trainable parameters by enjoying the powerful expressiveness of the Fourier transform. Specifically, we introduce FourierFT, which treats $\Delta W$ as a matrix in the spatial domain and learns only a small fraction of its spectral coefficients. With the trained spectral coefficients, we implement the inverse discrete Fourier transform to recover $\Delta W$. Empirically, our FourierFT method shows comparable or better performance with fewer parameters than LoRA on various tasks, including natural language understanding, natural language generation, instruction tuning, and image classification. For example, when performing instruction tuning on the LLaMA2-7B model, FourierFT surpasses LoRA with only 0.064M trainable parameters, compared to LoRA's 33.5M. Our code is released at https://github.com/Chaos96/fourierft.
Motivation and Objectives
- Motivate reduced storage for fine-tuning large foundation models (LFMs) amid growing customization needs.
- Introduce FourierFT, which represents each layer's weight change through a small set of learned spectral coefficients.
- Demonstrate that spectral-based parameterization achieves competitive accuracy with substantially fewer trainable parameters than LoRA across NLP and CV benchmarks.
Proposed Method
- Treats the weight change $\Delta W$ as a matrix in the spatial domain.
- Learns a small set of spectral coefficients per layer; their positions in the spectral matrix are selected at random once and shared across layers.
- Builds a spectral matrix by placing the trainable coefficients $c$ at the positions given by the fixed entry matrix $E$ (zeros elsewhere), then recovers $\Delta W$ by applying the inverse discrete Fourier transform and taking the real part.
- Adds $\Delta W$ to the pre-trained weight $W_0$ in the forward pass; after training, $\Delta W$ can be merged into $W_0$, so inference incurs no extra latency.
- Uses a scaling factor $\alpha$ to modulate the influence of the recovered $\Delta W$.
- Provides PyTorch-style pseudocode for the FourierFT module and discusses a frequency-bias initialization option for spectral entries.
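The steps listed above can be sketched in a few lines. This is a minimal illustration, not the authors' released module: it uses NumPy instead of PyTorch, has no training loop, samples entries uniformly at random (no frequency bias), and the function names `sample_entries` and `fourierft_delta_w`, the toy sizes, and the value of `alpha` are all assumptions made for the example.

```python
import numpy as np


def sample_entries(d1, d2, n, seed=0):
    """Pick n distinct (row, col) positions in a d1 x d2 spectral matrix.

    Hypothetical helper: uniform sampling without replacement, fixed by a
    seed so the same positions can be reused for every adapted layer.
    """
    rng = np.random.default_rng(seed)
    flat = rng.choice(d1 * d2, size=n, replace=False)
    return np.unravel_index(flat, (d1, d2))  # tuple (rows, cols)


def fourierft_delta_w(coeffs, entries, shape, alpha):
    """Recover Delta W from n spectral coefficients.

    coeffs  : the n trainable values c (plain floats in this sketch)
    entries : fixed positions E, as returned by sample_entries
    alpha   : scaling factor applied to the recovered update
    """
    spectrum = np.zeros(shape, dtype=np.float64)
    spectrum[entries] = coeffs  # coefficients at E, zeros elsewhere
    # Inverse 2D DFT, keeping only the real part as the weight change.
    return alpha * np.fft.ifft2(spectrum).real


# Toy configuration (assumed values, chosen only to keep the example small).
d1, d2, n, alpha = 8, 8, 4, 2.0
entries = sample_entries(d1, d2, n)
coeffs = np.random.default_rng(1).standard_normal(n)
delta_w = fourierft_delta_w(coeffs, entries, (d1, d2), alpha)

# Forward pass with the update kept separate versus merged into W0.
w0 = np.random.default_rng(2).standard_normal((d1, d2))
x = np.random.default_rng(3).standard_normal(d2)
h_unmerged = w0 @ x + delta_w @ x
h_merged = (w0 + delta_w) @ x
```

In this sketch only the `n` values in `coeffs` would be trained and stored per layer; the positions in `entries` are regenerated from the seed. Because `h_unmerged` and `h_merged` agree up to floating-point error, the update can be folded into the base weight once training is done.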
Experimental Results
Research Questions
- RQ1: Can extremely sparse spectral coefficients suffice to capture beneficial weight updates for fine-tuning LFMs?
- RQ2: How does FourierFT compare to LoRA and full fine-tuning in terms of performance and parameter efficiency across NLP and CV tasks?
- RQ3: What is the impact of spectral entry selection bias (frequency bias) on performance?
- RQ4: How does FourierFT scale with model size and layer depth relative to traditional PEFT methods?
Key Findings
- FourierFT achieves comparable or better performance than LoRA with significantly fewer trainable parameters across NLP and CV tasks.
- On instruction tuning with LLaMA2-7B, FourierFT outperforms LoRA with only 0.064M trainable parameters versus LoRA's 33.5M.
- FourierFT attains scores similar to full fine-tuning with only 0.128M parameters in some setups.
- In image classification with ViT backbones, FourierFT reaches competitive accuracy with 12.4% (ViT Base) and 9.2% (ViT Large) of LoRA's parameter count, and can exceed LoRA when allotted a larger share of LoRA's parameter budget.
- Across GLUE, E2E, instruction tuning, and CV benchmarks, FourierFT consistently uses far fewer trainable parameters while delivering on-par or superior results.
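The 0.064M versus 33.5M comparison for LLaMA2-7B can be reproduced with simple arithmetic. The sketch below assumes that only the query and value projections in each of the model's 32 transformer layers are adapted, that LoRA uses rank 64, and that FourierFT trains 1000 coefficients per adapted matrix. These settings are assumptions made here because they match the quoted totals, and the helper names are hypothetical.

```python
def lora_params(d_in, d_out, rank, n_matrices):
    """LoRA trains A (rank x d_in) and B (d_out x rank) per adapted matrix."""
    return rank * (d_in + d_out) * n_matrices


def fourierft_params(n_coeffs, n_matrices):
    """FourierFT trains n_coeffs spectral coefficients per adapted matrix.

    The count does not depend on the weight matrix dimensions.
    """
    return n_coeffs * n_matrices


hidden = 4096            # LLaMA2-7B hidden size
n_matrices = 32 * 2      # 32 layers x (query, value): assumed adaptation targets

lora_total = lora_params(hidden, hidden, rank=64, n_matrices=n_matrices)
fourierft_total = fourierft_params(n_coeffs=1000, n_matrices=n_matrices)

print(f"LoRA:      {lora_total:,} trainable parameters")
print(f"FourierFT: {fourierft_total:,} trainable parameters")
print(f"Ratio:     {lora_total / fourierft_total:.0f}x")
```

Under these assumptions the LoRA count grows with both the rank and the hidden size, while the FourierFT count depends only on the number of coefficients and the number of adapted matrices.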
This walkthrough was generated by AI and reviewed by human editors.