[Paper Review] Sparsity-Aware Evolution for Model Merging
The paper introduces a sparsity-aware evolutionary framework (SAE) for model merging that integrates pruning and sparsity-driven signals into the fitness function, improving reliability and modularity of merged LLMs. It demonstrates consistent gains over strong baselines on GSM8K and MMLU-ProX across multiple architectural scales.
We propose a sparsity-aware evolutionary (SAE) framework for model merging that involves iterative pruning-merging cycles to act as a novel mutation operator. We incorporate the sparsity constraints into the score function, which steers the evolutionary process to favor more sparse models, in addition to other conventional performance scores. Interestingly, the by-product of competition for sparsity introduces an extra local attraction and interplay into the evolutionary process: if one competitor has more zero elements, the other competitor's non-zero elements will occupy those positions, even though the less sparse competitor loses to the more sparse competitor in other positions. The proposed pipeline is evaluated on a variety of large-scale LLM benchmarks. Experiments demonstrate that our approach can improve model merging reliability across multiple benchmarks, and is easy to incorporate due to its simplicity and being orthogonal to most existing approaches.
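The competition and attraction effect described in the abstract can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes, purely for illustration, that each parent's mixing weight is proportional to its fraction of zero entries, and the function names `sparsity` and `sparsity_weighted_merge` are hypothetical.

```python
def sparsity(weights):
    """Fraction of exactly-zero entries in a flat weight list."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def sparsity_weighted_merge(a, b):
    """Merge two equally sized flat weight lists.

    Assumption (not from the paper): each parent's mixing weight is
    proportional to its sparsity, so the sparser parent dominates
    wherever both parents are non-zero.
    """
    s_a, s_b = sparsity(a), sparsity(b)
    total = s_a + s_b
    # Fall back to a plain average if neither parent has any zeros.
    w_a = s_a / total if total > 0 else 0.5
    w_b = 1.0 - w_a
    merged = [w_a * x + w_b * y for x, y in zip(a, b)]
    return merged, w_a, w_b


parent_a = [0.0, 0.0, 0.0, 2.0]  # sparser parent: 3 of 4 entries are zero
parent_b = [1.0, 0.0, 4.0, 4.0]  # denser parent: 1 of 4 entries is zero
merged, w_a, w_b = sparsity_weighted_merge(parent_a, parent_b)
print(w_a, w_b)  # 0.75 0.25
print(merged)    # [0.25, 0.0, 1.0, 2.5]
```

At positions 0 and 2, where `parent_a` is zero, only `parent_b` contributes (attraction). At position 3, where both are non-zero, the sparser `parent_a` receives the larger weight (competition).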
Motivation and Objectives
- Enable reliable merging of multiple pretrained models without retraining.
- Incorporate sparsity as an active regulatory signal in an evolutionary merging framework.
- Develop a pruning–re-densification cycle to create modular, conflict-free subnetworks.
Proposed Method
- Adopt an evolutionary model merging framework that searches the merging space via population-based pruning and recombination.
- Compute layer-wise mixing ratios that incorporate both performance scores and layer-wise sparsity signals.
- Integrate pruning as part of the fitness function to create competition and attraction dynamics among parent models.
- Employ an annealing-like cyclic sparsification schedule to balance exploration and consolidation.
- Use an archive of diverse models to promote population diversity and robust merging.
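The pruning and layer-wise mixing steps listed above could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the rule `score * (1 + sparsity_weight * layer_sparsity)`, the parameter `sparsity_weight`, and every function name are hypothetical, and models are represented as plain dictionaries of flat weight lists.

```python
def magnitude_prune(weights, rate):
    """Zero out the `rate` fraction of entries with the smallest magnitude."""
    k = int(len(weights) * rate)
    # Indices of the k smallest-magnitude entries.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def zero_fraction(weights):
    """Layer sparsity measured as the fraction of zero entries."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def layer_mixing_ratios(parents, scores, sparsity_weight=1.0):
    """Per-layer mixing ratios for a list of parent models.

    parents: list of models, each a dict {layer_name: flat weight list}
    scores:  one task-performance score per parent (higher is better)
    Assumption: ratio is proportional to
    score * (1 + sparsity_weight * layer sparsity).
    """
    ratios = {}
    for layer in parents[0]:
        raw = [s * (1.0 + sparsity_weight * zero_fraction(p[layer]))
               for p, s in zip(parents, scores)]
        total = sum(raw)
        ratios[layer] = [r / total for r in raw]
    return ratios


def merge(parents, ratios):
    """Weighted element-wise combination of the parents, layer by layer."""
    merged = {}
    for layer, rs in ratios.items():
        columns = zip(*(p[layer] for p in parents))
        merged[layer] = [sum(r * w for r, w in zip(rs, col)) for col in columns]
    return merged


pruned = magnitude_prune([0.1, -2.0, 0.3, 1.5], rate=0.5)
model_a = {"layer0": pruned, "layer1": [1.0, 1.0]}
model_b = {"layer0": [1.0, 1.0, 1.0, 1.0], "layer1": [3.0, 3.0]}
ratios = layer_mixing_ratios([model_a, model_b], scores=[0.5, 0.5])
merged_model = merge([model_a, model_b], ratios)
```

With equal performance scores, `model_a` receives a larger ratio only in `layer0`, where pruning made it sparser. In `layer1` both parents are dense and the ratios stay at one half each.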

Experimental Results
Research Questions
- RQ1: Does integrating sparsity into the merging objective improve reliability and modularity of merged LLMs compared to dense baselines?
- RQ2: How do sparsity-driven competition and attraction influence the parameter-space exploration during model merging?
- RQ3: What are the effects of archive size, sparsity schedules, and sparsity measurements on merging performance?
- RQ4: Can SAE generalize across tasks such as mathematical reasoning and multilingual understanding on large language models?
Key Findings
Math + Multilingual setting (Avg. is the mean of the GSM8K and MMLU-ProX scores):
| Method | GSM8K | MMLU-ProX | Avg. |
|---|---|---|---|
| Task Arithmetic | 0.741 | 0.187 | 0.464 |
| Weight Average | 0.742 | 0.185 | 0.464 |
| Rankmean | 0.137 | 0.176 | 0.157 |
| PSO | 0.7801 | 0.164 | 0.472 |
| SAE (Global) | 0.798 | 0.170 | 0.484 |
| SAE (Local) | 0.7748 | 0.182 | 0.478 |
- SAE generally outperforms PSO across tasks and architectures. In the setting above, SAE (Global) leads PSO on GSM8K (0.798 vs. 0.7801), MMLU-ProX (0.170 vs. 0.164), and the average (0.484 vs. 0.472); SAE (Local) leads on MMLU-ProX (0.182) and the average (0.478) but trails slightly on GSM8K (0.7748).
- Sparsity-aware scoring induces a dual competition–attraction dynamic that promotes sparser, modular solutions and reduces destructive interference.
- Increasing archive size improves SAE performance on MMLU-ProX, indicating archive diversity aids multilingual reasoning.
- Ablations show broader sparsity-rate ranges and zero-count sparsity measures can enhance performance, with task-dependent effects for layer-wise sparsity.
- Cyclic sparsity scheduling improves multilingual generalization and overall stability, with longer cycle expansion aiding exploration.
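
A cyclic sparsity schedule with cycle expansion, as mentioned in the findings, might look like the sketch below. The review does not give the exact schedule, so the linear ramp, the reset rule, and the parameter names `first_cycle_len` and `expansion` are all illustrative assumptions.

```python
def cyclic_sparsity_schedule(num_steps, min_rate, max_rate,
                             first_cycle_len, expansion=2):
    """Return one pruning rate per generation.

    Within a cycle the rate ramps linearly from min_rate to max_rate,
    then resets; each new cycle is `expansion` times longer than the last.
    The linear ramp and all parameter names are illustrative assumptions.
    """
    rates = []
    cycle_len = first_cycle_len
    pos = 0  # position inside the current cycle
    for _ in range(num_steps):
        frac = pos / (cycle_len - 1) if cycle_len > 1 else 1.0
        rates.append(min_rate + frac * (max_rate - min_rate))
        pos += 1
        if pos == cycle_len:  # cycle finished: restart and lengthen
            pos = 0
            cycle_len *= expansion
    return rates


schedule = cyclic_sparsity_schedule(
    num_steps=6, min_rate=0.0, max_rate=0.5, first_cycle_len=2
)
```

Here the first cycle spans two generations and the second spans four, so later cycles spend more generations at low pruning rates before consolidating at the maximum rate.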

This review was generated by AI and checked by a human editor.