QUICK REVIEW

[Paper Review] Improving Training Efficiency and Reducing Maintenance Costs via Language Specific Model Merging

Alphaeus Dmonte, Vidhi Gupta|arXiv (Cornell University)|Jan 22, 2026

Topic Modeling|Citations: 0

One-line summary

tldr: The paper analyzes language-specific model merging as an efficient alternative to retraining multilingual LLMs, showing similar quality with substantial reductions in training time and maintenance cost across multiple tasks and datasets.

ABSTRACT

Fine-tuning a task-specific multilingual large language model (LLM) involves training the model on a multilingual dataset with examples in all the required languages. Updating one or more supported languages with additional data or adding support for a new language involves retraining the model, which can be computationally inefficient and creates a severe maintenance bottleneck. Recent research on merging multilingual multitask models has shown promise in terms of improved quality, but its computational and maintenance efficiency remains unstudied. In this work, we provide the first focused analysis of this merging strategy from an efficiency perspective, evaluating it across three independent tasks. We demonstrate significant efficiency gains while maintaining parity in terms of quality: this merging approach reduces the initial training time by up to 50%. We also demonstrate that updating an individual language and re-merging as part of model maintenance reduces training costs by more than 60%, compared to re-training the full multilingual model. We show this on both public and proprietary industry datasets confirming that the approach works well for industrial use cases in addition to academic settings already studied in previous work.

Motivation and Objectives

Objective-1: Establish the high cost and maintenance bottlenecks of fine-tuning multilingual LLMs in enterprise settings as the motivating problem.
Objective-2: Propose and evaluate language-specific model merging as a more efficient alternative to retraining on a combined multilingual dataset.
Objective-3: Quantify training time and cost savings across multiple tasks and languages.
Objective-4: Assess robustness across public and proprietary datasets to validate industrial applicability.

Proposed Method

Method-1: Train a separate adapter per language and merge the language-specific adapters into a single multilingual model using three merging techniques (TIES, DARE, KnOTS).
Method-2: Fine-tune the base Llama-3.1-8B-Instruct model with LoRA on five languages (EN, DE, FR, JA, ZH) for three tasks (summarization, commonsense reasoning, sentiment analysis), and compare against the COMB (one model trained on the combined multilingual dataset) and INDV (individual per-language models) baselines.
Method-3: Experiment with hyperparameters (weighting, density) to generate eight merged models per task.
Method-4: Evaluate with task-specific metrics (ROUGE-1, ROUGE-L, and BERTScore for summarization; accuracy for reasoning; macro F1, precision, and recall for sentiment).
Method-5: Compare training time and cost between the traditional retrain-all approach and the train-once, merge-as-needed approach, including maintenance scenarios where only language adapters are updated.
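To make Method-1 and Method-3 concrete, here is a minimal sketch of TIES- and DARE-style merging over per-language task vectors (fine-tuned minus base weights), written in plain Python on flattened lists of floats. It loosely follows the published TIES procedure (trim, elect sign, disjoint mean) and DARE (random drop, then rescale); it is not the paper's code, all function names are hypothetical, and KnOTS is not shown. `density` and `weights` are assumed to play the role of the "density" and "weighting" hyperparameters in Method-3; applying the weights inside the disjoint mean is one reasonable choice among several.

```python
import random
from typing import List

Vector = List[float]  # one flattened per-language task vector (fine-tuned minus base weights)


def trim(delta: Vector, density: float) -> Vector:
    """TIES step 1: keep the top `density` fraction of entries by magnitude, zero the rest."""
    k = max(1, round(density * len(delta)))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(delta)]


def dare(delta: Vector, drop_rate: float, seed: int = 0) -> Vector:
    """DARE: drop each entry with probability `drop_rate`, rescale survivors by 1 / (1 - drop_rate)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else x * scale for x in delta]


def elect_and_merge(sparse: List[Vector], weights: List[float], lam: float = 1.0) -> Vector:
    """TIES steps 2-3: elect a sign per parameter, then average only the entries that agree."""
    merged = []
    for values in zip(*sparse):
        # Elected sign = sign of the summed values (the direction with more total mass).
        sign = 1.0 if sum(values) >= 0 else -1.0
        # Disjoint mean over nonzero, sign-agreeing entries, scaled by each language's weight.
        agreeing = [w * v for w, v in zip(weights, values) if v * sign > 0]
        merged.append(lam * sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def ties_merge(deltas: List[Vector], weights: List[float], density: float, lam: float = 1.0) -> Vector:
    return elect_and_merge([trim(d, density) for d in deltas], weights, lam)


def dare_ties_merge(deltas: List[Vector], weights: List[float], drop_rate: float,
                    lam: float = 1.0, seed: int = 0) -> Vector:
    # DARE replaces the magnitude trim; a different seed per language gives independent masks.
    sparse = [dare(d, drop_rate, seed + i) for i, d in enumerate(deltas)]
    return elect_and_merge(sparse, weights, lam)


# Hypothetical flattened deltas for two language adapters.
en = [0.5, -0.2, 0.1, 0.0]
ja = [0.4, 0.3, -0.6, 0.05]
print(ties_merge([en, ja], weights=[1.0, 1.0], density=0.5))  # [0.45, -0.2, -0.6, 0.0]
```

With `density=0.5`, each adapter keeps only its two largest-magnitude entries before sign election. A real implementation would apply the same steps tensor by tensor to the LoRA weights rather than to short Python lists.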

Figure 1: Traditional “retrain-all” training approach vs. Language Specific “train-once, merge-as-needed” approach.
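The "train-once, merge-as-needed" side of Figure 1 can be sketched as a small registry that keeps one adapter per language and re-merges whenever one of them changes. This is an illustrative sketch under stated assumptions: `train_fn` is a hypothetical stand-in for LoRA fine-tuning on one language's data, and `linear_merge` is a plain element-wise mean used in place of TIES, DARE, or KnOTS.

```python
from typing import Callable, Dict, List

Vector = List[float]


def linear_merge(deltas: List[Vector]) -> Vector:
    """Stand-in merge: element-wise mean of the adapter deltas (not TIES/DARE/KnOTS)."""
    return [sum(col) / len(deltas) for col in zip(*deltas)]


class AdapterRegistry:
    """Train-once, merge-as-needed: one adapter per language, re-merged on demand."""

    def __init__(self, train_fn: Callable[[str], Vector]) -> None:
        self.train_fn = train_fn  # hypothetical: fine-tunes one language, returns its delta
        self.adapters: Dict[str, Vector] = {}
        self.training_runs: List[str] = []  # log of every (re)training job

    def train(self, lang: str) -> None:
        """(Re)train a single language's adapter; all other adapters are left untouched."""
        self.adapters[lang] = self.train_fn(lang)
        self.training_runs.append(lang)

    def merged(self) -> Vector:
        """Merge the current adapters (sorted by language code for a deterministic order)."""
        return linear_merge([self.adapters[lang] for lang in sorted(self.adapters)])


# Hypothetical stand-in for fine-tuning: each language's "data version" becomes its delta.
data_version = {"en": 1.0, "de": 2.0, "fr": 3.0, "ja": 4.0, "zh": 5.0}
registry = AdapterRegistry(lambda lang: [data_version[lang], -data_version[lang]])

for lang in data_version:       # initial setup: one run per language
    registry.train(lang)
print(registry.merged())        # [3.0, -3.0]

data_version["ja"] = 9.0        # new Japanese data arrives
registry.train("ja")            # maintenance: retrain JA only, then re-merge
print(registry.training_runs)   # ['en', 'de', 'fr', 'ja', 'zh', 'ja']
print(registry.merged())        # [4.0, -4.0]
```

Under retrain-all, the new Japanese data would instead trigger another fine-tuning run over the full combined multilingual dataset; here only the JA adapter is retrained and the other four are reused as-is.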

Experimental Results

Research Questions

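RQ1: Does language-specific model merging match or exceed the retrain-all multilingual baseline in task performance?
RQ2: How large are the relative reductions in training time and cost from language-specific merging, compared with retraining on the combined multilingual dataset?
RQ3: How do the merging techniques perform across tasks (summarization, reasoning, sentiment) and languages (EN, DE, FR, JA, ZH)?
RQ4: How does updating a single language adapter affect the performance and maintainability of the overall merged model?
RQ5: Do the results generalize to smaller models and proprietary datasets?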

Key Findings

Phase                              Model            Training time      Training cost
Initial setup                      Combined model   3.4 h              $113.4
Initial setup                      Merged model     2.2 h (-35.3%)     $107.1 (-5.6%)
Update/add language                Combined model   3.8 h              $119.7
Update/add language                Merged model     1.0 h (-73.7%)     $31.5 (-73.7%)
Case study: initial setup          Combined model   45 h               $1,416
Case study: initial setup          Merged model     22.5 h (-50%)      $1,400 (-1.1%)
Case study: update/add language    Combined model   54.5 h             $1,717
Case study: update/add language    Merged model     20.5 h (-62.4%)    $645 (-62.4%)
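The percentages in the table are relative reductions, (combined - merged) / combined. A short check that recomputes them from the reported hours and dollars:

```python
# Rows from the table: (phase, combined hours, merged hours, combined USD, merged USD)
ROWS = [
    ("Initial setup", 3.4, 2.2, 113.4, 107.1),
    ("Update/add language", 3.8, 1.0, 119.7, 31.5),
    ("Case study: initial setup", 45.0, 22.5, 1416.0, 1400.0),
    ("Case study: update/add language", 54.5, 20.5, 1717.0, 645.0),
]


def reduction_pct(baseline: float, new: float) -> float:
    """Relative reduction of `new` versus `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


for phase, h_comb, h_merge, c_comb, c_merge in ROWS:
    time_cut = reduction_pct(h_comb, h_merge)
    cost_cut = reduction_pct(c_comb, c_merge)
    print(f"{phase:<32} time -{time_cut:.1f}%  cost -{cost_cut:.1f}%")
```

The recomputed values match the table and make one pattern explicit: for initial setup, the time savings (35.3%, 50%) are far larger than the cost savings (5.6%, 1.1%), whereas for language updates time and cost fall by the same fraction.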

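Finding 1: Across multiple tasks, the merged models perform on par with the combined-training baseline, and for some languages they improve on it in summarization and reasoning.
Finding 2: Merging reduces initial training time by 35.3% on the public benchmarks and by 50% in the industry case study; updating a single language and re-merging cuts maintenance training time by 62.4% (case study) to 73.7% (public benchmarks) compared with retraining the full multilingual model.
Finding 3: For summarization, several merged configurations (e.g., TIES-KnOTS, DARE-TIES-KnOTS) outperform the baselines for English, Japanese, and Chinese, with BERTScore gains of 0.1 to 0.6 points.
Finding 4: For reasoning, the merged models are generally on par with the baselines, with accuracy gains of up to about 2.2 points in some cases; for German and French, the baselines are sometimes better.
Finding 5: For sentiment analysis, the combined model is often the best, although some merged configurations outperform the individual-language baselines for specific languages.
Finding 6: An ablation study shows that updating a single language adapter (e.g., EN) can improve overall merged performance, with gains that may propagate to other languages. Model-size experiments show that merging is feasible for both 8B and 3B LLMs, with performance varying by model size.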
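Finding 5 (and Method-4) compare sentiment models by macro-averaged F1, precision, and recall: each score is computed per class and then averaged with equal weight per class, so rare classes count as much as frequent ones. A minimal sketch of that computation with hypothetical toy labels; in practice a library routine such as scikit-learn's `f1_score(..., average="macro")` would typically be used instead.

```python
from typing import Dict, Sequence


def macro_scores(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Macro-averaged precision, recall, and F1: compute each per class, then take the unweighted mean."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    labels = sorted(set(gold) | set(pred))
    p_sum = r_sum = f_sum = 0.0
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0  # never-predicted class counts as 0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        p_sum += precision
        r_sum += recall
        f_sum += f1
    n = len(labels)
    return {"precision": p_sum / n, "recall": r_sum / n, "f1": f_sum / n}


# Hypothetical toy predictions for a three-class sentiment task.
gold = ["pos", "neg", "neu", "pos"]
pred = ["pos", "neg", "pos", "neg"]
print(macro_scores(gold, pred))
```

In this toy case the "neu" class is never predicted, so its F1 of 0 pulls macro F1 down to 7/18 (about 0.39) even though half of the individual predictions are correct.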

Figure 2: The aggregated hallucination rate across the languages (lower is better). The plot shows the scores for four models, two baselines, and the best performing merged model TIES. The scores for the model merged with updated Japanese data are also reported. The ’mix’ language refers to having m


This review was generated by AI and checked by a human editor.