QUICK REVIEW

[Paper Review] Train Less, Infer Faster: Efficient Model Finetuning and Compression via Structured Sparsity

Jonathan Svirsky, Yehonathan Refael|arXiv (Cornell University)|Feb 9, 2026

Topic: Topic Modeling | Citations: 0

One-Line Summary

The paper introduces FineGates, a structured sparsification method that finetunes language models by learning binary row/column gates to deactivate up to 40% of base parameters, improving inference speed with minimal accuracy loss.

ABSTRACT

Fully finetuning foundation language models (LMs) with billions of parameters is often impractical due to high computational costs, memory requirements, and the risk of overfitting. Although methods like low-rank adapters help address these challenges by adding small trainable modules to the frozen LM, they also increase memory usage and do not reduce inference latency. We uncover an intriguing phenomenon: sparsifying specific model rows and columns enables efficient task adaptation without requiring weight tuning. We propose a scheme for effective finetuning via sparsification using training stochastic gates, which requires minimal trainable parameters, reduces inference time, and removes 20-40% of model parameters without significant accuracy loss. Empirical results show it outperforms recent finetuning baselines in efficiency and performance. Additionally, we provide theoretical guarantees for the convergence of this stochastic gating process, and show that our method admits a simpler and better-conditioned optimization landscape compared to LoRA. Our results highlight sparsity as a compelling mechanism for task-specific adaptation in LMs.

Motivation and Objectives

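Motivates the need for a more efficient way to finetune foundation language models without updating all of their weights.
Proposes a sparsification-based approach that learns binary gates to deactivate rows and columns of the weight matrices.
Achieves inference-time speedup and model compression while maintaining task performance.
Provides theoretical convergence guarantees and compares the optimization landscape with that of LoRA.
Demonstrates applicability to both pretraining scenarios and downstream tasks through empirical results.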

Proposed Method

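Introduces FineGates: learnable row-gate and column-gate vectors that structurally sparsify the base model's weights.
Represents each binary gate with a stochastic gate, approximated through a Gaussian-based relaxation and the reparameterization trick so that it can be trained by gradient descent.
Optimizes an objective that combines the task loss with a structured sparsity regularizer (an ℓ0-based term with sparsity targets) to push the model toward a target sparsity level.
Applies the gates to every adapted layer of a Transformer-style model by multiplying each weight matrix W on the left by Diag(ω_r) and on the right by Diag(ω_c), i.e., W' = Diag(ω_r) W Diag(ω_c).
Provides a theoretical analysis showing that gate optimization has convergence guarantees and a simpler, better-conditioned optimization landscape than LoRA.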
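The gating mechanism described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. It assumes STG-style stochastic gates (gate = clamp(μ + σε, 0, 1) with ε ~ N(0, 1)), uses the probability that a gate is open, Φ(μ/σ), as the differentiable ℓ0 surrogate, and substitutes a hypothetical squared-gap penalty toward a target keep ratio for the paper's exact sparsity-target term. All names (sample_gates, gate_weight, sparsity_penalty, target_keep, lam) are illustrative. No gradients are computed here; in an autodiff framework the same reparameterized sample would let gradients flow into μ.

```python
import math
import random

# Minimal sketch, NOT the authors' code. Assumption: STG-style gates,
# z = clamp(mu + sigma * eps, 0, 1) with eps ~ N(0, 1).


def sample_gates(mu, sigma, training, rng):
    """Reparameterized gate sample; at inference eps = 0, i.e. clamp(mu, 0, 1)."""
    gates = []
    for m in mu:
        eps = rng.gauss(0.0, 1.0) if training else 0.0
        gates.append(min(1.0, max(0.0, m + sigma * eps)))
    return gates


def gate_weight(W, omega_r, omega_c):
    """Compute Diag(omega_r) @ W @ Diag(omega_c) elementwise (W is a list of rows)."""
    return [[omega_r[i] * W[i][j] * omega_c[j] for j in range(len(W[0]))]
            for i in range(len(W))]


def normal_cdf(x):
    """Standard normal CDF Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_l0(mu, sigma):
    """Expected number of open gates: sum_i P(mu_i + sigma * eps > 0) = sum_i Phi(mu_i / sigma)."""
    return sum(normal_cdf(m / sigma) for m in mu)


def sparsity_penalty(gate_groups, sigma, target_keep):
    """Hypothetical target term: squared gap between the expected fraction of
    open gates and a target keep ratio (the paper's exact form may differ)."""
    total = sum(len(mu) for mu in gate_groups)
    keep = sum(expected_l0(mu, sigma) for mu in gate_groups) / total
    return (keep - target_keep) ** 2


# Toy usage: a 2x3 weight matrix with one row gate and one column gate closed.
rng = random.Random(0)
sigma = 0.5
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mu_r = [0.5, -0.5]
mu_c = [0.5, 0.5, -1.0]

# Training: noisy gates; inference: deterministic clamp(mu).
W_train = gate_weight(W, sample_gates(mu_r, sigma, True, rng),
                      sample_gates(mu_c, sigma, True, rng))
W_infer = gate_weight(W, sample_gates(mu_r, sigma, False, rng),
                      sample_gates(mu_c, sigma, False, rng))

task_loss = 0.7  # placeholder standing in for a real task loss
lam = 1.0        # hypothetical regularization weight
objective = task_loss + lam * sparsity_penalty([mu_r, mu_c], sigma, target_keep=0.7)
```

Here W_infer evaluates to [[0.25, 0.5, 0.0], [0.0, 0.0, 0.0]]: the closed row gate zeroes the whole second row and the closed column gate zeroes the third column, while the base weights W themselves are never modified.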

Figure 1: CPU inference time reduction (%) and number of removed parameters on the MRPC validation set while finetuning our method on the Llama3.2-1B backbone. See Section 6.6 for details.

Experimental Results

Research Questions

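RQ1: Can structured sparsity via gating match or exceed the finetuning performance of full finetuning and LoRA-based methods while using fewer trainable parameters?
RQ2: How do the learned gates affect inference speed and model size across different backbones and tasks?
RQ3: Do the proposed gates provide convergence guarantees and a favorable optimization landscape, and how do they compare with existing PEFT methods in this respect?
RQ4: Does FineGates achieve effective pruning during pretraining and in data-limited settings without sacrificing accuracy?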

Key Findings

Backbone	Method	TP	CoLA	STS-B	MRPC	RTE	SST2	MNLI	QNLI	QQP	Avg.
RoBERTa-Base	Full Finetune	125M	63.6	90.9	90.2	80.5	92.8	81.4	87.7	85.2	86.5
RoBERTa-Base	GaLore	125M	60.3	90.7	92.2	79.4	94.0	87.0	92.2	91.1	85.9
RoBERTa-Base	LoRA (r=4)	0.7M	64.0	90.9	89.7	83.4	94.4	87.6	92.7	91.0	86.6
RoBERTa-Base	BitFit	0.11M	61.8	90.8	92.0	77.8	93.7	85.2	91.3	84.5	84.6
RoBERTa-Base	VeRA	0.04M	65.6	90.7	89.5	78.7	94.6	-	91.8	-	85.2
RoBERTa-Base	RoCoFT 1-Row	0.08M	60.2	90.7	87.7	76.6	94.1	85.2	90.7	88.5	84.2
RoBERTa-Base	VeLoRA	0.16M	64.6	90.8	91.3	78.0	94.4	86.3	92.1	89.9	85.9
RoBERTa-Base	FineGates	0.17M	65.7	91.0	90.2	83.4	94.7	85.8	92.3	89.2	86.6
RoBERTa-Large	Full Finetune	355M	68.0	92.3	90.9	86.6	96.4	90.2	94.7	92.2	88.9
RoBERTa-Large	LoRA (r=4)	1.8M	71.0	92.3	90.7	89.5	96.4	90.4	94.8	91.7	89.3
RoBERTa-Large	LoRA-XS	0.06K	68.5	92.2	91.2	89.5	96.3	-	94.3	-	88.7
RoBERTa-Large	VeRA	0.06M	68.0	91.7	90.9	85.9	96.1	-	94.4	-	87.8
RoBERTa-Large	RoCoFT 1-Row	0.22M	65.7	91.8	90.0	85.3	96.6	90.7	94.2	90.2	88.1
RoBERTa-Large	VeLoRA	0.16M	68.0	91.7	90.9	85.9	96.1	-	94.4	-	87.8
RoBERTa-Large	FineGates	0.4M	71.4	92.3	91.2	90.2	96.0	89.1	94.1	89.4	89.2
TP = number of trainable parameters; "-" = not reported.

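On GLUE with RoBERTa backbones, FineGates matches or outperforms full finetuning and other efficient-finetuning baselines. On RoBERTa-Large it reaches the highest CoLA (71.4) and RTE (90.2) scores in the table with 0.4M trainable parameters.
On several tasks, deactivating up to 40% of the parameters in the attention mechanism causes only a minimal loss of accuracy.
With a Llama3.2-1B backbone, CPU inference becomes up to 25% faster at a reasonable accuracy trade-off.
Compared with LoRA, FineGates has a simpler optimization landscape and comes with convergence guarantees under standard smoothness and Polyak-Łojasiewicz (PL) assumptions.
In pretraining and larger-scale experiments, FineGates achieves meaningful compression (up to 40%) while the sparsified models keep competitive perplexity.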
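Why row/column (structured) sparsity yields real inference savings, rather than just zeros inside a full-size matrix, can be shown with a small sketch: once a row or column gate is exactly zero, that row or column can be physically dropped, leaving a smaller dense matrix that produces identical outputs. This is a hypothetical illustration; the helper names (compress, forward_full, forward_compact) are not from the paper, the remaining gate values are simply folded into the weights, and only gates that are exactly 0 are dropped. How the authors threshold gates at inference is not stated in this review.

```python
# Hypothetical sketch: why row/column sparsity gives a smaller *dense* layer.
# Only gates that are exactly 0 are dropped; remaining gate values are folded into W.


def matvec(W, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def compress(W, omega_r, omega_c):
    """Fold gates into W and keep only rows/columns whose gate is nonzero."""
    rows = [i for i, g in enumerate(omega_r) if g != 0.0]
    cols = [j for j, g in enumerate(omega_c) if g != 0.0]
    W_small = [[omega_r[i] * W[i][j] * omega_c[j] for j in cols] for i in rows]
    return W_small, rows, cols


def forward_full(W, omega_r, omega_c, x):
    """Reference: y = Diag(omega_r) W Diag(omega_c) x using the full-size matrix."""
    W_gated = [[omega_r[i] * W[i][j] * omega_c[j] for j in range(len(W[0]))]
               for i in range(len(W))]
    return matvec(W_gated, x)


def forward_compact(W_small, rows, cols, out_dim, x):
    """Same output from the compact matrix; outputs of closed rows are exactly 0."""
    y_small = matvec(W_small, [x[j] for j in cols])
    y = [0.0] * out_dim
    for i, v in zip(rows, y_small):
        y[i] = v
    return y


W = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
omega_r = [1.0, 0.0, 1.0]        # row 1 closed
omega_c = [1.0, 1.0, 0.0, 1.0]   # column 2 closed
x = [1.0, 1.0, 1.0, 1.0]

W_small, rows, cols = compress(W, omega_r, omega_c)
y_full = forward_full(W, omega_r, omega_c, x)
y_compact = forward_compact(W_small, rows, cols, len(W), x)
removed = 1 - (len(rows) * len(cols)) / (len(W) * len(W[0]))
print(y_full, y_compact, removed)  # [7.0, 0.0, 31.0] [7.0, 0.0, 31.0] 0.5
```

In this toy case 6 of the 12 weights survive, so the compact product needs 6 multiply-adds instead of 12. More generally, keeping a fraction p of rows and q of columns leaves p·q of a matrix's entries (for example, 0.8 × 0.8 = 0.64). Dropping a closed output row pays off fully only when the next layer also skips the corresponding input.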

Figure 2: Overview of FineGates: Our method introduces structured sparsity in LM finetuning by training lightweight row and column gating vectors (ω_c, ω_r). These gates selectively retain the most informative weight dimensions, enabling efficient adaptation without modify

This review was generated by AI and checked by a human editor.