QUICK REVIEW

[Paper Review] The Magic Correlations: Understanding Knowledge Transfer from Pretraining to Supervised Fine-Tuning

Simin Fan, Dimitris Paparas|arXiv (Cornell University)|Feb 11, 2026

Topic Modeling|Citations: 0

One-Line Summary

tldr: The paper analyzes how capabilities learned during pretraining transfer to supervised fine-tuning, using correlation protocols across data mixtures, model scales, and benchmarks to reveal when transfer is reliable and how calibration evolves.

ABSTRACT

Understanding how language model capabilities transfer from pretraining to supervised fine-tuning (SFT) is fundamental to efficient model development and data curation. In this work, we investigate four core questions: RQ1. To what extent do accuracy and confidence rankings established during pretraining persist after SFT? RQ2. Which benchmarks serve as robust cross-stage predictors and which are unreliable? RQ3. How do transfer dynamics shift with model scale? RQ4. How well does model confidence align with accuracy, as a measure of calibration quality? Does this alignment pattern transfer across training stages? We address these questions through a suite of correlation protocols applied to accuracy and confidence metrics across diverse data mixtures and model scales. Our experiments reveal that transfer reliability varies dramatically across capability categories, benchmarks, and scales -- with accuracy and confidence exhibiting distinct, sometimes opposing, scaling dynamics. These findings shed light on the complex interplay between pretraining decisions and downstream outcomes, providing actionable guidance for benchmark selection, data curation, and efficient model development.

Motivation and Objectives

Assess how accuracy and confidence rankings from pretraining persist after supervised fine-tuning (SFT).
Identify benchmarks that reliably predict post-SFT performance and those that don’t.
Characterize how transfer dynamics shift with model scale across diverse data mixtures.
Examine how model confidence aligns with accuracy (calibration) and whether this alignment persists across training stages.

Proposed Method

Train decoder-only transformer models at two scales (240M and 1B parameters).
Create 9 diverse pretraining data mixtures by crossing web, code, and curated sources with varying proportions.
Fine-tune pretrained checkpoints on a single SFT dataset (Tulu-v2-mix) for 5 epochs.
Evaluate on 20 benchmarks across four capability categories (Commonsense, Science, NLI, Semantic).
Compute cross-stage correlations for accuracy and confidence across mixtures to assess transfer reliability.
Analyze intra-category coherence, cross-stage calibration, and the effect of model scale on transfer patterns.

Figure 1: Cross-stage correlation by capability category. (a) Accuracy correlation: the 1B model generally shows higher transferability. (b) Confidence correlation: 240M maintains substantially higher correlation, especially in the Commonsense (0.87 vs. 0.40) and Science (0.82 vs. 0.49) domains.

Experimental Results

Research Questions

RQ1: To what extent do accuracy and confidence rankings from pretraining persist after SFT?
RQ2: Which benchmarks serve as reliable cross-stage predictors, and which do not?
RQ3: How do transfer dynamics change with model scale?
RQ4: How well does model confidence align with accuracy, and does this calibration pattern persist across training stages?

Key Findings

Accuracy transfer increases with model scale (1B typically shows higher cross-stage accuracy correlation than 240M).
Confidence transfer is stronger at smaller scales (240M) and weaker at larger scales (1B), with distinct category-dependent patterns.
Commonsense and Science benchmarks show high cross-stage accuracy correlation, while NLI and Semantic benchmarks show weaker transfer.
Confidence patterns persist strongly for Commonsense and Science at 240M (e.g., mean cross-stage confidence correlation around 0.87 and 0.82, respectively).
Intra-category coherence shifts with scale: smaller models exhibit competition within a category, whereas larger models show synergy, especially in Science.
Science tasks exhibit high alignment between confidence and accuracy (r_align ~ 0.8), while Commonsense and Semantic tasks show miscalibration that persists through SFT.
Educationally filtered data (FineWeb-Edu) has scale-dependent effects on accuracy and calibration, improving some tasks at 240M but sometimes degrading them at 1B.

Figure 2: Cross-stage correlation across benchmarks. Each bar shows the Pearson correlation between PT and SFT performance on a given benchmark across data mixtures. (a) Accuracy correlation: the 1B model achieves higher transferability than 240M (on average, $\bar{r} = 0.59$ …
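Stated as a formula, the quantity behind each bar and its average would read as follows. This rests on the following assumptions:

* There are $M$ data mixtures per model scale.
* $x^{\mathrm{PT}}_{b,m}$ and $x^{\mathrm{SFT}}_{b,m}$ are the scores on benchmark $b$ for the model trained on mixture $m$.
* $\bar{x}^{\mathrm{PT}}_{b}$ and $\bar{x}^{\mathrm{SFT}}_{b}$ are their means over mixtures.
* The reported $\bar{r}$ is an unweighted mean over the benchmark set $B$. This is an assumption; the caption does not say how the average is taken.

$$
r_b = \frac{\sum_{m=1}^{M}\left(x^{\mathrm{PT}}_{b,m}-\bar{x}^{\mathrm{PT}}_{b}\right)\left(x^{\mathrm{SFT}}_{b,m}-\bar{x}^{\mathrm{SFT}}_{b}\right)}{\sqrt{\sum_{m=1}^{M}\left(x^{\mathrm{PT}}_{b,m}-\bar{x}^{\mathrm{PT}}_{b}\right)^{2}}\,\sqrt{\sum_{m=1}^{M}\left(x^{\mathrm{SFT}}_{b,m}-\bar{x}^{\mathrm{SFT}}_{b}\right)^{2}}},
\qquad
\bar{r} = \frac{1}{|B|}\sum_{b \in B} r_b
$$

With the 9 mixtures described in this review, each $r_b$ is estimated from only 9 points. A single outlying mixture can therefore shift an individual bar noticeably.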

This review was generated by AI and checked by a human editor.