QUICK REVIEW

[Paper Review] Train Less, Infer Faster: Efficient Model Finetuning and Compression via Structured Sparsity

Jonathan Svirsky, Yehonathan Refael|arXiv (Cornell University)|2026. 02. 09.

Topic Modeling · Citations: 0

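One-Line Summary

The paper proposes FineGates, a structured-sparsification method for finetuning language models that learns binary row/column gates to deactivate up to 40% of the base parameters, improving inference speed with minimal accuracy loss.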

ABSTRACT

Fully finetuning foundation language models (LMs) with billions of parameters is often impractical due to high computational costs, memory requirements, and the risk of overfitting. Although methods like low-rank adapters help address these challenges by adding small trainable modules to the frozen LM, they also increase memory usage and do not reduce inference latency. We uncover an intriguing phenomenon: sparsifying specific model rows and columns enables efficient task adaptation without requiring weight tuning. We propose a scheme for effective finetuning via sparsification using training stochastic gates, which requires minimal trainable parameters, reduces inference time, and removes 20-40% of model parameters without significant accuracy loss. Empirical results show it outperforms recent finetuning baselines in efficiency and performance. Additionally, we provide theoretical guarantees for the convergence of this stochastic gating process, and show that our method admits a simpler and better-conditioned optimization landscape compared to LoRA. Our results highlight sparsity as a compelling mechanism for task-specific adaptation in LMs.

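Motivation and Objectives

Motivate the need for more efficient finetuning of foundation language models without updating all of their weights.
Propose a sparsification-based approach that learns binary gates to deactivate rows and columns of the weight matrices.
Achieve inference speedup and model compression while preserving task performance.
Provide theoretical convergence guarantees and compare the optimization landscape with that of LoRA.
Show empirically that the approach also applies in pretraining scenarios and low-data regimes.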

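Proposed Method

Introduce FineGates: learnable row and column gate vectors that structurally sparsify the base model weights.
Represent the gates stochastically, using a Gaussian-based relaxation and the reparameterization trick to approximate binary gates.
Optimize an objective that combines the task loss with a structured sparsity regularizer (an l0-based term with a sparsity target) that encourages targeted sparsification.
Apply the gates to every targeted layer of a Transformer-based model by multiplying W with Diag(omega_r) and Diag(omega_c).
Provide a theoretical analysis showing a simpler and better-conditioned optimization landscape than LoRA, together with convergence guarantees for the gate optimization.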
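
The gating steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the review does not give the paper's exact formulas, so the clipped-Gaussian gate, the noise scale `sigma=0.5`, the squared-deviation penalty, the `target_open` parameter, and all function names are assumptions made for the example. It also assumes W has shape (d_out, d_in), with omega_r acting on rows and omega_c on columns.

```python
import math
import numpy as np

def sample_gates(mu, sigma=0.5, rng=None, train=True):
    """Relaxed gate in [0, 1]: clip(mu + sigma * eps, 0, 1), eps ~ N(0, 1).

    Assumed form of the Gaussian relaxation; at inference the noise is dropped.
    """
    if train:
        rng = rng or np.random.default_rng()
        noise = sigma * rng.standard_normal(mu.shape)
    else:
        noise = 0.0
    return np.clip(mu + noise, 0.0, 1.0)

def expected_open_fraction(mu, sigma=0.5):
    """Mean of P(mu + sigma * eps > 0) = Phi(mu / sigma), a smooth l0 surrogate."""
    cdf = [0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0)))) for m in mu]
    return float(np.mean(cdf))

def sparsity_penalty(mu_r, mu_c, target_open=0.7, sigma=0.5):
    """Hypothetical regularizer: squared gap between the expected fraction
    of open gates and a target fraction."""
    open_frac = 0.5 * (expected_open_fraction(mu_r, sigma)
                       + expected_open_fraction(mu_c, sigma))
    return (open_frac - target_open) ** 2

def gated_linear(x, W, omega_r, omega_c):
    """Compute x @ (Diag(omega_r) W Diag(omega_c)).T without building Diag."""
    W_gated = omega_r[:, None] * W * omega_c[None, :]
    return x @ W_gated.T
```

In a training loop under these assumptions, only `mu_r` and `mu_c` would receive gradients, W would stay frozen, and the total loss would be the task loss plus a weighted `sparsity_penalty`.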

Figure 1: CPU inference time reduction (%) and number of removed parameters on the MRPC validation set while finetuning our method on the Llama3.2-1B backbone. See Section 6.6 for details.

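Experimental Results

Research Questions

RQ1: Can finetuning through structured-sparsity gates match or exceed full finetuning and LoRA-based methods while reducing the number of parameters?
RQ2: How do the learned gates affect inference speed and model size across different backbones and tasks?
RQ3: Do the proposed gates offer convergence guarantees and a more favorable optimization landscape than existing PEFT methods?
RQ4: Does FineGates enable effective pruning during pretraining and in limited-data regimes without hurting accuracy?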

Main Results

Method	TP	CoLA	STS-B	MRPC	RTE	SST2	MNLI	QNLI	QQP	Avg.
RoBERTa-Base Full Finetune	125M	63.6	90.9	90.2	80.5	92.8	81.4	87.7	85.2	86.5
RoBERTa-Base Galore	125M	60.3	90.7	92.2	79.4	94.0	87.0	92.2	91.1	85.9
LoRA(r=4)	0.7M	64.0	90.9	89.7	83.4	94.4	87.6	92.7	91.0	86.6
BitFit	0.11M	61.8	90.8	92.0	77.8	93.7	85.2	91.3	84.5	84.6
VeRA	0.04M	65.6	90.7	89.5	78.7	94.6	-	91.8	-	85.2
RoCoFT 1-Row	0.08M	60.2	90.7	87.7	76.6	94.1	85.2	90.7	88.5	84.2
VeLoRA	0.16M	64.6	90.8	91.3	78.0	94.4	86.3	92.1	89.9	85.9
FineGates	0.17M	65.7	91.0	90.2	83.4	94.7	85.8	92.3	89.2	86.6
RoBERTa-Large Full Finetune	355M	68.0	92.3	90.9	86.6	96.4	90.2	94.7	92.2	88.9
LoRA(r=4)	1.8M	71.0	92.3	90.7	89.5	96.4	90.4	94.8	91.7	89.3
LoRA-XS	0.06K	68.5	92.2	91.2	89.5	96.3	-	94.3	-	88.7
VeRA	0.06M	68.0	91.7	90.9	85.9	96.1	-	94.4	-	87.8
RoCoFT 1-Row	0.22M	65.7	91.8	90.0	85.3	96.6	90.7	94.2	90.2	88.1
VeLoRA	0.16M	68.0	91.7	90.9	85.9	96.1	-	94.4	-	87.8
FineGates	0.4M	71.4	92.3	91.2	90.2	96.0	89.1	94.1	89.4	89.2
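
As a rough sanity check on the FineGates rows of the TP (trainable parameters) column, the number of gate entries can be counted by hand. The sketch below assumes one row gate and one column gate per weight matrix, applied to the four attention projections and the two feed-forward matrices in every layer; this gate placement is an assumption for illustration, not something stated in the review. The layer counts and widths are the standard RoBERTa-Base and RoBERTa-Large dimensions.

```python
def gate_params_per_layer(hidden, ffn):
    """Gate entries in one Transformer layer under the assumed placement."""
    # Q, K, V, O projections: each is hidden x hidden, so hidden + hidden gates.
    attention = 4 * (hidden + hidden)
    # Two feed-forward matrices (hidden x ffn and ffn x hidden).
    feed_forward = 2 * (hidden + ffn)
    return attention + feed_forward

def total_gate_params(num_layers, hidden, ffn):
    return num_layers * gate_params_per_layer(hidden, ffn)

roberta_base = total_gate_params(num_layers=12, hidden=768, ffn=3072)
roberta_large = total_gate_params(num_layers=24, hidden=1024, ffn=4096)
print(roberta_base, roberta_large)  # 165888 442368
```

Under these assumptions the counts come to about 0.17M and 0.44M, which is in the same range as the 0.17M and 0.4M listed for FineGates, though the paper may configure its gates differently.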

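On GLUE with RoBERTa backbones, FineGates performs comparably to or better than full finetuning and other efficient finetuning baselines, showing gains on RoBERTa-Large.
Across several tasks, up to 40% of the base-model parameters in the attention layers can be deactivated with only a small loss in accuracy.
With the Llama3.2-1B backbone, FineGates reduces CPU inference time by up to 25% at a modest cost in accuracy.
Compared with LoRA, FineGates offers a simpler optimization landscape and convergence guarantees under standard smoothness and PL assumptions.
In pretraining and larger-scale experiments, FineGates achieves meaningful compression (up to 40%), with mixed perplexity results for the sparsified models.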
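
The link between closed gates and faster inference can be illustrated with a small NumPy sketch: rows and columns whose gate is exactly zero can be removed physically, leaving a smaller dense matrix that produces the same surviving outputs. This is only an illustration of the general idea; the function name `prune_gated_weight` and the (d_out, d_in) weight layout are assumptions, and the paper's actual export procedure is not described in the review.

```python
import numpy as np

def prune_gated_weight(W, omega_r, omega_c):
    """Drop rows/columns with zero gates and fold remaining gates into W."""
    keep_r = np.flatnonzero(omega_r > 0)  # surviving output units
    keep_c = np.flatnonzero(omega_c > 0)  # surviving input features
    W_small = (omega_r[keep_r][:, None]
               * W[np.ix_(keep_r, keep_c)]
               * omega_c[keep_c][None, :])
    return W_small, keep_r, keep_c

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 5))
x = rng.standard_normal((3, 5))
omega_r = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
omega_c = np.array([1.0, 1.0, 0.0, 1.0, 0.0])

W_small, keep_r, keep_c = prune_gated_weight(W, omega_r, omega_c)
y_full = x @ (omega_r[:, None] * W * omega_c[None, :]).T
y_small = x[:, keep_c] @ W_small.T  # smaller matmul, same surviving outputs

removed = 1.0 - W_small.size / W.size
print(W_small.shape, np.allclose(y_full[:, keep_r], y_small), removed)
```

The outputs at pruned row positions are zero in the gated model, so a downstream layer can drop the matching input columns as well, which is where the compounding size and latency savings would come from.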

Figure 2: Overview of FineGates: Our method introduces structured sparsity in LM finetuning by training lightweight row and column gating vectors ( $\bm{\omega}_{c},\bm{\omega}_{r}$ ). These gates selectively retain the most informative weight dimensions, enabling efficient adaptation without modify


This review was generated by AI and reviewed by a human editor.