QUICK REVIEW

[Paper Review] Train Less, Infer Faster: Efficient Model Finetuning and Compression via Structured Sparsity

Jonathan Svirsky, Yehonathan Refael|arXiv (Cornell University)|Feb 9, 2026

Topic Modeling | Cited by 0

One-Sentence Summary

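This paper proposes FineGates, a finetuning method for language models that achieves structured sparsity by learning binary row/column gates. With up to 40% of the base parameters disabled, it speeds up inference while keeping accuracy loss minimal.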

ABSTRACT

Fully finetuning foundation language models (LMs) with billions of parameters is often impractical due to high computational costs, memory requirements, and the risk of overfitting. Although methods like low-rank adapters help address these challenges by adding small trainable modules to the frozen LM, they also increase memory usage and do not reduce inference latency. We uncover an intriguing phenomenon: sparsifying specific model rows and columns enables efficient task adaptation without requiring weight tuning. We propose a scheme for effective finetuning via sparsification using training stochastic gates, which requires minimal trainable parameters, reduces inference time, and removes 20–40% of model parameters without significant accuracy loss. Empirical results show it outperforms recent finetuning baselines in efficiency and performance. Additionally, we provide theoretical guarantees for the convergence of this stochastic gating process, and show that our method admits a simpler and better-conditioned optimization landscape compared to LoRA. Our results highlight sparsity as a compelling mechanism for task-specific adaptation in LMs.
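To see how row and column gating combine into a 20–40% parameter reduction, consider a single m×n weight matrix where a fraction p_r of its rows and a fraction p_c of its columns are gated off. The symbols p_r and p_c and the example values are introduced here for illustration only and do not come from the paper.

```latex
\begin{aligned}
\text{kept entries} &= (1-p_r)\,m \cdot (1-p_c)\,n \\
\text{removed fraction} &= 1-(1-p_r)(1-p_c) = p_r + p_c - p_r\,p_c \\
p_r = 0.20,\; p_c = 0.25 \;&\Rightarrow\; 1 - 0.80 \times 0.75 = 0.40
\end{aligned}
```

Because the two cuts compound, removing 40% of a matrix only requires closing roughly a fifth to a quarter of each dimension. The fraction at the level of the whole model also depends on which layers carry gates.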

Research Motivation and Goals

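Motivate efficient finetuning of foundation language models that avoids updating every weight.
Propose a sparsification-based method that learns binary gates to disable rows/columns of weight matrices.
Deliver inference speedup and model compression while preserving task performance as much as possible.
Provide theoretical convergence guarantees and compare the optimization landscape with that of LoRA.
Report empirical results on downstream tasks and in pretraining settings to demonstrate the method's applicability.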

Proposed Method

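Introduce FineGates: learnable row and column gating vectors that sparsify the base model's weights in a structured way.
Represent the gates stochastically, approximating binary gates with a Gaussian-based relaxation and the reparameterization trick.
Optimize an objective that combines the task loss with a structured sparsity regularizer (an ℓ0-based term plus a sparsity-target term) to steer the model toward a desired sparsity level.
Apply the gates to every adapted layer of the Transformer backbone by multiplying W by Diag(ω_r) and Diag(ω_c), i.e. W̃ = Diag(ω_r) · W · Diag(ω_c).
Give a theoretical analysis showing a simpler, better-conditioned optimization landscape than LoRA's, together with convergence guarantees for the gate optimization.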
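The gating steps above can be sketched in plain Python. This sketch assumes an STG-style relaxation (stochastic gates from the feature-selection literature): z = clamp(μ + σε, 0, 1) with ε ~ N(0, 1), and an ℓ0 surrogate Σ Φ(μ/σ). The paper's exact parameterization may differ, and its sparsity-target term is omitted here. All function names, gate parameters, and the loss values are hypothetical.

```python
import math
import random


def stochastic_gate(mu, sigma=0.5, training=True, rng=random):
    """Relaxed binary gate (STG style): z = clamp(mu + sigma * eps, 0, 1), eps ~ N(0, 1).

    In an autograd framework, gradients reach mu through this reparameterization.
    At inference the noise is dropped, so z = clamp(mu, 0, 1).
    """
    eps = rng.gauss(0.0, 1.0) if training else 0.0
    return min(1.0, max(0.0, mu + sigma * eps))


def expected_open_gates(mus, sigma=0.5):
    """Differentiable l0 surrogate: sum_i P(z_i > 0) = sum_i Phi(mu_i / sigma)."""
    def phi(x):  # standard normal CDF
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return sum(phi(m / sigma) for m in mus)


def gate_weight(W, z_r, z_c):
    """Row/column gating W_tilde = Diag(z_r) @ W @ Diag(z_c), written elementwise."""
    return [[z_r[i] * W[i][j] * z_c[j] for j in range(len(W[0]))]
            for i in range(len(W))]


# Usage with hypothetical gate parameters (not values from the paper).
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
mu_r = [0.9, -0.4]        # one parameter per output row
mu_c = [0.7, 1.2, -1.0]   # one parameter per input column
z_r = [stochastic_gate(m, training=False) for m in mu_r]  # [0.9, 0.0]
z_c = [stochastic_gate(m, training=False) for m in mu_c]  # [0.7, 1.0, 0.0]
W_tilde = gate_weight(W, z_r, z_c)  # row 1 and column 2 (0-indexed) become zero

task_loss = 0.42  # placeholder for the downstream task loss
lam = 0.01        # hypothetical regularization strength
loss = task_loss + lam * (expected_open_gates(mu_r) + expected_open_gates(mu_c))
```

A gate that closes to exactly 0 zeroes out a whole row or column. Nonzero gate values can simply be folded back into W after training, so only the closed gates change the matrix shape.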

Figure 1: CPU inference time reduction (%) and number of removed parameters on the MRPC validation set while finetuning our method on the Llama3.2-1B backbone. See Section 6.6 for details.
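Figure 1 reports both the inference-time reduction and the number of removed parameters. Both follow from the fact that a closed row or column gate lets the corresponding slice of W be dropped outright. The sketch below is a minimal pure-Python version. It assumes y = W x with rows as outputs and columns as inputs, and a layer without bias. The helper names and the toy matrix are hypothetical. It checks that the smaller, sliced matrix reproduces the gated dense output.

```python
def matvec(W, x):
    """Plain y = W x, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def compress(W, z_r, z_c):
    """Drop rows/columns whose gate is exactly 0; fold surviving gate values into W."""
    keep_r = [i for i, z in enumerate(z_r) if z != 0.0]
    keep_c = [j for j, z in enumerate(z_c) if z != 0.0]
    W_small = [[z_r[i] * W[i][j] * z_c[j] for j in keep_c] for i in keep_r]
    removed = len(W) * len(W[0]) - len(keep_r) * len(keep_c)
    return W_small, keep_r, keep_c, removed


def compressed_forward(W_small, keep_r, keep_c, x, out_dim):
    """Run the smaller matmul on the kept inputs, then scatter outputs; pruned rows stay 0."""
    y_small = matvec(W_small, [x[j] for j in keep_c])
    y = [0.0] * out_dim
    for k, i in enumerate(keep_r):
        y[i] = y_small[k]
    return y


# Hypothetical 3 x 4 layer with one closed row gate and one closed column gate.
W = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
z_r = [1.0, 0.0, 1.0]
z_c = [1.0, 1.0, 0.0, 1.0]
x = [1.0, 2.0, 3.0, 4.0]

W_gated = [[z_r[i] * W[i][j] * z_c[j] for j in range(4)] for i in range(3)]
y_dense = matvec(W_gated, x)

W_small, keep_r, keep_c, removed = compress(W, z_r, z_c)
y_fast = compressed_forward(W_small, keep_r, keep_c, x, out_dim=3)
print(y_dense, y_fast, removed)  # [21.0, 0.0, 77.0] [21.0, 0.0, 77.0] 6
```

In a real network the pruned output dimensions would also be removed from the next layer's input, rather than scattered back as zeros. The wall-clock gain from the smaller matmul depends on the hardware and kernels used.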

Experimental Results

Research Questions

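RQ1: Can structured sparsity via gating match or exceed full finetuning and LoRA-based methods while using fewer trainable parameters?
RQ2: How do the learned gates affect inference speed and model size across different backbones and tasks?
RQ3: Compared with existing PEFT methods, do the proposed gates come with convergence guarantees and a favorable optimization landscape?
RQ4: Can FineGates prune effectively during pretraining and in data-limited settings without hurting accuracy?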

Key Findings

Method	Trainable Params	CoLA	STS-B	MRPC	RTE	SST-2	MNLI	QNLI	QQP	Avg.
RoBERTa-Base Full Finetune	125M	63.6	90.9	90.2	80.5	92.8	81.4	87.7	85.2	86.5
RoBERTa-Base GaLore	125M	60.3	90.7	92.2	79.4	94.0	87.0	92.2	91.1	85.9
LoRA(r=4)	0.7M	64.0	90.9	89.7	83.4	94.4	87.6	92.7	91.0	86.6
BitFit	0.11M	61.8	90.8	92.0	77.8	93.7	85.2	91.3	84.5	84.6
VeRA	0.04M	65.6	90.7	89.5	78.7	94.6	-	91.8	-	85.2
RoCoFT 1-Row	0.08M	60.2	90.7	87.7	76.6	94.1	85.2	90.7	88.5	84.2
VeLoRA	0.16M	64.6	90.8	91.3	78.0	94.4	86.3	92.1	89.9	85.9
FineGates	0.17M	65.7	91.0	90.2	83.4	94.7	85.8	92.3	89.2	86.6
RoBERTa-Large Full Finetune	355M	68.0	92.3	90.9	86.6	96.4	90.2	94.7	92.2	88.9
LoRA(r=4)	1.8M	71.0	92.3	90.7	89.5	96.4	90.4	94.8	91.7	89.3
LoRA-XS	0.06K	68.5	92.2	91.2	89.5	96.3	-	94.3	-	88.7
VeRA	0.06M	68.0	91.7	90.9	85.9	96.1	-	94.4	-	87.8
RoCoFT 1-Row	0.22M	65.7	91.8	90.0	85.3	96.6	90.7	94.2	90.2	88.1
VeLoRA	0.16M	68.0	91.7	90.9	85.9	96.1	-	94.4	-	87.8
FineGates	0.4M	71.4	92.3	91.2	90.2	96.0	89.1	94.1	89.4	89.2
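The "Trainable Params" column reflects how little each scheme trains per weight matrix. For an m×n matrix, row and column gates add m + n scalars. A rank-r LoRA adapter adds r(m + n), and full finetuning updates all mn weights. The sketch below is a back-of-the-envelope count for a single 768×768 projection, using RoBERTa-Base's hidden size. The function name is hypothetical, and biases and classifier heads are ignored.

```python
def trainable_params_per_matrix(m, n, method, rank=None):
    """Trainable scalars for one m x n weight matrix under three finetuning schemes."""
    if method == "full":
        return m * n              # every weight is updated
    if method == "lora":
        return rank * (m + n)     # B is m x r, A is r x n
    if method == "row_col_gates":
        return m + n              # one gate per row plus one per column
    raise ValueError(f"unknown method: {method}")


d = 768  # RoBERTa-Base hidden size; e.g. a query projection is d x d
counts = {
    "full": trainable_params_per_matrix(d, d, "full"),
    "lora_r4": trainable_params_per_matrix(d, d, "lora", rank=4),
    "row_col_gates": trainable_params_per_matrix(d, d, "row_col_gates"),
}
print(counts)  # {'full': 589824, 'lora_r4': 6144, 'row_col_gates': 1536}
```

Per matrix, the gates cost the same as a rank-1 LoRA adapter. The totals in the table may also depend on which layers each method adapts and on what else is counted, so this per-matrix count is not meant to reproduce them.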

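On GLUE with RoBERTa-Base and RoBERTa-Large backbones, FineGates performs close to or better than full finetuning and other efficient finetuning baselines (average 86.6 vs. 86.5 for full finetuning on Base; 89.2 vs. 88.9 on Large, close to LoRA's 89.3).
Up to 40% of the base model's parameters in the attention layers can be disabled with only a small accuracy loss on multiple tasks.
On CPU with a 1B Llama backbone (Llama3.2-1B in Figure 1), inference speedups reach up to about 25%, at a modest accuracy cost.
Compared with LoRA, FineGates has a simpler optimization landscape and convergence guarantees under standard smoothness and Polyak–Łojasiewicz (PL) assumptions.
In pretraining and larger-scale experiments, FineGates achieves meaningful compression (up to about 40%) with competitive perplexity reductions on the pruned models.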
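For reference, here is the standard setting behind guarantees of this kind. An L-smooth objective f that satisfies the Polyak–Łojasiewicz (PL) inequality with constant μ converges linearly under gradient descent with step size 1/L. This is the generic textbook statement, not the paper's theorem. The paper's constants and its treatment of the stochastic gates are not reproduced here.

```latex
\begin{aligned}
&\text{$L$-smooth:} && \|\nabla f(x)-\nabla f(y)\| \le L\,\|x-y\| \\
&\text{PL:} && \tfrac{1}{2}\,\|\nabla f(x)\|^{2} \ge \mu\,\bigl(f(x)-f^{\star}\bigr) \\
&\text{GD, step } 1/L: && f(x_k)-f^{\star} \le \Bigl(1-\tfrac{\mu}{L}\Bigr)^{k}\bigl(f(x_0)-f^{\star}\bigr)
\end{aligned}
```

Here κ = L/μ plays the role of a condition number. A better-conditioned landscape means a smaller κ, which gives a faster contraction factor 1 − 1/κ per step.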

Figure 2: Overview of FineGates: Our method introduces structured sparsity in LM finetuning by training lightweight row and column gating vectors (ω_c, ω_r). These gates selectively retain the most informative weight dimensions, enabling efficient adaptation without modify…

This review was generated by AI and checked by human editors.