QUICK REVIEW

[Paper Digest] FutureMind: Equipping Small Language Models with Strategic Thinking-Pattern Priors via Adaptive Knowledge Distillation

Shaoxiong Yang, Junting Li|arXiv (Cornell University)|Feb 1, 2026

Topic Modeling | Cited by 0

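One-Sentence Summary

FutureMind is a training-free, modular reasoning framework that distills strategic thinking-pattern priors from large models into SLMs, equipping them with adaptive, retrieval-guided multi-hop reasoning and achieving state-of-the-art results among training-free methods across model scales.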

ABSTRACT

Small Language Models (SLMs) are attractive for cost-sensitive and resource-limited settings due to their efficient, low-latency inference. However, they often struggle with complex, knowledge-intensive tasks that require structured reasoning and effective retrieval. To address these limitations, we propose FutureMind, a modular reasoning framework that equips SLMs with strategic thinking-pattern priors via adaptive knowledge distillation from large language models (LLMs). FutureMind introduces a dynamic reasoning pipeline composed of four key modules: Problem Analysis, Logical Reasoning, Strategy Planning, and Retrieval Guidance. This pipeline is augmented by three distinct retrieval paradigms that decompose complex queries into tractable subproblems, ensuring efficient and accurate retrieval execution. Extensive experiments on multi-hop QA benchmarks, including 2WikiMultihopQA, MuSiQue, Bamboogle, and Frames, demonstrate the superiority of FutureMind. It consistently outperforms strong baselines such as Search-o1, achieving state-of-the-art results under free training conditions across diverse SLM architectures and scales. Beyond empirical gains, our analysis reveals that the process of thinking-pattern distillation is restricted by the cognitive bias bottleneck between the teacher (LLMs) and student (SLMs) models. This provides new perspectives on the transferability of reasoning skills, paving the way for the development of SLMs that combine efficiency with genuine cognitive capability.

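Motivation and Objectives

Motivate the need for efficient, knowledge-intensive reasoning in small language models (SLMs) and address the limitations of static, single-pass retrieval.
Propose FutureMind, a training-free modular reasoning framework that distills thinking-pattern priors into SLMs.
Design a four-stage reasoning pipeline (Problem Analysis, Logical Reasoning, Strategy Planning, Retrieval Guidance) together with three adaptive retrieval paradigms.
Demonstrate empirical improvements on multi-hop QA benchmarks and analyze the cognitive bias bottleneck in teacher-student distillation.
Offer insights into teacher-student alignment for scalable reasoning in lightweight models.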

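Proposed Method

FutureMind is organized as a four-stage pipeline coordinated by a Thinking Module: Problem Analysis, Logical Reasoning, Strategy Planning, and Retrieval Guidance.
The query is decomposed into structured components (O, A, T, C); Logical Reasoning then yields a mechanistic understanding (M) and a set of key conditions (K).
Strategy Planning dynamically selects among three retrieval paradigms (Forward Stepwise Reasoning, Backward Constraint Focusing, Parallel Intersection Reasoning) to form R*.
Normalized retrieval guidance (Γ) is then generated, covering keywords, resources, sequencing, queries, and filtering, to steer retrieval.
Adaptive thinking-pattern priors are distilled from an LLM teacher into an SLM student without any gradient updates, so the student itself is never trained.
The framework is evaluated on four multi-hop QA benchmarks (2WikiMultihopQA, MuSiQue, Bamboogle, Frames) and across different base models (SLMs and LLMs).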
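
The four stages listed above can be sketched as a simple data flow. This is a minimal illustration, not the authors' implementation: the digest does not define what O, A, T, and C stand for, so they are kept as opaque string fields, and every class, function, and the toy planning rule below is a hypothetical placeholder. In a real system each stage would presumably be an SLM call conditioned on the distilled prior.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Paradigm(Enum):
    # The three retrieval paradigms named in the digest.
    FORWARD_STEPWISE = "Forward Stepwise Reasoning"
    BACKWARD_CONSTRAINT = "Backward Constraint Focusing"
    PARALLEL_INTERSECTION = "Parallel Intersection Reasoning"


@dataclass
class Analysis:
    # Structured components from Problem Analysis; meanings are not given in the digest.
    O: str
    A: str
    T: str
    C: str


@dataclass
class Reasoning:
    M: str        # mechanistic understanding
    K: List[str]  # key conditions


@dataclass
class Guidance:
    # Retrieval guidance (Gamma): keywords, resources, sequence, queries, filters.
    keywords: List[str]
    resources: List[str]
    sequence: List[str]
    queries: List[str]
    filters: List[str]
    paradigm: Paradigm  # the selected strategy R*


def think(
    question: str,
    analyze: Callable[[str], Analysis],
    reason: Callable[[Analysis], Reasoning],
    plan: Callable[[Analysis, Reasoning], Paradigm],
    guide: Callable[[Analysis, Reasoning, Paradigm], Guidance],
) -> Guidance:
    """Run the four stages in order and return the retrieval guidance."""
    analysis = analyze(question)
    reasoning = reason(analysis)
    paradigm = plan(analysis, reasoning)
    return guide(analysis, reasoning, paradigm)


# --- Toy stand-ins for the four stages (assumptions, not from the paper) ---

def toy_analyze(question: str) -> Analysis:
    # Placeholder values; only C carries the raw question through.
    return Analysis(O="<O>", A="<A>", T="<T>", C=question)


def toy_reason(analysis: Analysis) -> Reasoning:
    # Toy rule: treat each " and "-separated clause as one key condition.
    return Reasoning(M="<M>", K=analysis.C.split(" and "))


def toy_plan(analysis: Analysis, reasoning: Reasoning) -> Paradigm:
    # Toy rule: several independent conditions -> intersect parallel searches.
    if len(reasoning.K) > 1:
        return Paradigm.PARALLEL_INTERSECTION
    return Paradigm.FORWARD_STEPWISE


def toy_guide(analysis: Analysis, reasoning: Reasoning, paradigm: Paradigm) -> Guidance:
    # One query per key condition; the other guidance fields stay empty here.
    return Guidance(
        keywords=list(reasoning.K),
        resources=[],
        sequence=[],
        queries=list(reasoning.K),
        filters=[],
        paradigm=paradigm,
    )


guidance = think(
    "born in Paris and won a Nobel Prize",
    toy_analyze, toy_reason, toy_plan, toy_guide,
)
print(guidance.paradigm.value, guidance.queries)
```

Passing the stages in as callables keeps the sketch independent of any particular model API; swapping a toy function for a prompted model call would not change the control flow.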

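Experimental Results

Research Questions

RQ1: Can a training-free modular framework enable small language models to perform complex multi-hop reasoning efficiently?
RQ2: Can adaptive knowledge distillation of strategic thinking-pattern priors transfer robust reasoning ability across model scales?
RQ3: How do different retrieval paradigms affect efficiency and accuracy on knowledge-intensive tasks?
RQ4: How do the teacher model's scale and architecture affect teacher-student cognitive alignment during distillation?
RQ5: Which modular components contribute most to the performance gains on multi-hop QA?

Key Findings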

Model	Method	2WikiMQA ACC E	2WikiMQA ACC L	Frames ACC E	Frames ACC L	Bamboogle ACC E	Bamboogle ACC L	MuSiQue ACC E	MuSiQue ACC L	Avg ACC E	Avg ACC L
Qwen-3B	Naive Gen	16.80	17.20	3.60	4.60	20.80	24.00	5.94	8.98	11.79	13.70
Qwen-3B	Standard RAG	24.00	24.40	10.20	13.00	26.40	38.40	12.01	19.17	18.15	23.74
Qwen-3B	Search-o1	41.00	41.80	10.40	12.60	34.40	39.20	11.77	18.81	24.39	28.10
Qwen-3B	TC+FM ∗	56.40	43.80	14.20	15.20	39.20	43.20	18.84	19.42	32.16	30.41
Qwen-7B	Naive Gen	29.40	25.20	7.60	10.80	34.40	52.80	11.29	16.87	20.67	22.62
Qwen-7B	Standard RAG	30.20	29.80	13.20	16.80	42.40	52.80	15.78	24.76	25.39	31.04
Qwen-7B	Search-o1	57.80	59.80	20.80	23.80	43.20	51.20	24.63	38.34	36.61	43.29
Qwen-7B	TC+FM ∗	62.00	64.00	20.00	23.80	58.40	64.80	25.12	34.71	41.38	46.83
Qwen-14B	Naive Gen	30.40	30.80	8.80	12.40	48.80	55.20	14.81	22.82	25.70	30.30
Qwen-14B	Standard RAG	27.40	28.40	14.00	18.60	44.80	56.00	17.96	28.40	26.04	32.85
Qwen-14B	Search-o1	66.80	68.40	20.60	25.60	43.20	55.20	30.46	46.48	40.27	48.92
Qwen-14B	TC+FM ∗	71.60	75.20	24.00	28.20	70.40	72.80	34.83	49.51	50.21	56.43
Qwen-32B	Naive Gen	30.80	31.30	10.80	15.20	54.40	60.80	15.66	24.51	27.91	32.95
Qwen-32B	Standard RAG	24.60	24.40	16.20	19.60	52.80	61.60	19.78	30.95	28.35	34.14
Qwen-32B	Search-o1	68.60	71.60	22.80	27.80	60.80	67.20	34.34	54.12	46.63	55.18
Qwen-32B	TC+FM ∗	74.40	77.80	26.00	30.40	68.80	72.80	37.15	53.86	51.59	58.71
Qwen-72B	Naive Gen	38.20	38.60	12.80	18.40	60.00	67.20	21.12	32.16	33.03	39.09
Qwen-72B	Standard RAG	31.00	31.40	16.20	19.60	59.20	67.20	25.97	37.62	33.79	40.01
Qwen-72B	Search-o1	72.60	75.40	24.60	30.80	67.20	72.80	37.37	56.67	50.44	58.92
Qwen-72B	TC+FM ∗	74.20	80.60	27.40	36.60	75.20	79.20	41.38	58.59	54.80	63.75
Llama3.1-8B	Naive Gen	38.20	38.60	12.80	18.40	60.00	67.20	21.12	32.16	33.03	39.09
Llama3.1-8B	Standard RAG	29.20	30.40	12.20	15.20	39.20	47.20	15.05	22.82	23.91	28.90
Llama3.1-8B	Search-o1	54.00	56.00	15.40	18.20	46.40	52.00	24.88	37.62	35.17	40.95
Llama3.1-8B	TC+FM ∗	55.20	56.80	21.80	25.20	58.40	64.00	27.43	39.92	40.71	46.48
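
The Avg columns appear to be the unweighted mean of the four benchmark scores. The digest does not state this explicitly, so treat it as an inference from the numbers. A short check, using the Qwen-14B TC+FM row copied from the table (the `macro_average` helper is a name introduced here for illustration):

```python
# (ACC E, ACC L) per benchmark, copied from the Qwen-14B TC+FM row.
scores = {
    "2WikiMQA": (71.60, 75.20),
    "Frames": (24.00, 28.20),
    "Bamboogle": (70.40, 72.80),
    "MuSiQue": (34.83, 49.51),
}


def macro_average(rows):
    """Unweighted mean of ACC E and ACC L across benchmarks, rounded to 2 places."""
    n = len(rows)
    avg_e = sum(acc_e for acc_e, _ in rows.values()) / n
    avg_l = sum(acc_l for _, acc_l in rows.values()) / n
    return round(avg_e, 2), round(avg_l, 2)


print(macro_average(scores))  # (50.21, 56.43), matching the row's Avg columns
```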

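FutureMind and TC+FM consistently improve performance across model scales and architectures, reaching state of the art among training-free methods on multi-hop QA benchmarks.
Adaptive thinking-pattern distillation brings substantial gains to small models; with high-quality teacher guidance, both ACC E and ACC L improve markedly.
Integrating Strategy Planning with Retrieval Guidance is essential: removing a module or a retrieval strategy lowers performance, and Forward Stepwise Reasoning tends to have the largest impact.
A cognitive bias bottleneck exists: overly complex teacher plans can hurt student performance, so teacher-student compatibility matters more than scale alone.
Teacher architecture strongly affects transfer: a mid-sized, architecture-aligned teacher (for example, 14B rather than 32B) can yield better average student performance than a larger but misaligned teacher.
The three retrieval paradigms jointly contribute to the gains; ablations indicate that each is valuable under different task structures.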


This digest was generated by AI and reviewed by a human editor.