QUICK REVIEW

[Paper Review] FutureMind: Equipping Small Language Models with Strategic Thinking-Pattern Priors via Adaptive Knowledge Distillation

Shaoxiong Yang, Junting Li|arXiv (Cornell University)|Feb 1, 2026

Topic Modeling | Citations: 0

One-Line Summary

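FutureMind is a training-free modular reasoning framework that distills strategic thinking-pattern priors from LLMs into SLMs to enable adaptive, retrieval-guided multi-hop reasoning. It achieves state-of-the-art results among training-free methods across a range of model sizes.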

ABSTRACT

Small Language Models (SLMs) are attractive for cost-sensitive and resource-limited settings due to their efficient, low-latency inference. However, they often struggle with complex, knowledge-intensive tasks that require structured reasoning and effective retrieval. To address these limitations, we propose FutureMind, a modular reasoning framework that equips SLMs with strategic thinking-pattern priors via adaptive knowledge distillation from large language models (LLMs). FutureMind introduces a dynamic reasoning pipeline composed of four key modules: Problem Analysis, Logical Reasoning, Strategy Planning, and Retrieval Guidance. This pipeline is augmented by three distinct retrieval paradigms that decompose complex queries into tractable subproblems, ensuring efficient and accurate retrieval execution. Extensive experiments on multi-hop QA benchmarks, including 2WikiMultihopQA, MuSiQue, Bamboogle, and Frames, demonstrate the superiority of FutureMind. It consistently outperforms strong baselines such as Search-o1, achieving state-of-the-art results under training-free conditions across diverse SLM architectures and scales. Beyond empirical gains, our analysis reveals that the process of thinking-pattern distillation is restricted by the cognitive bias bottleneck between the teacher (LLMs) and student (SLMs) models. This provides new perspectives on the transferability of reasoning skills, paving the way for the development of SLMs that combine efficiency with genuine cognitive capability.

Motivation and Objectives

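Motivates the need for knowledge-intensive reasoning in SLMs and addresses the limitations of static, single-shot retrieval.
Proposes FutureMind, a training-free modular reasoning framework that distills thinking-pattern priors.
Designs a four-stage reasoning pipeline (Problem Analysis, Logical Reasoning, Strategy Planning, Retrieval Guidance) and three adaptive retrieval paradigms.
Demonstrates empirical improvements on multi-hop QA benchmarks and analyzes the cognitive bias bottleneck in teacher-student distillation.
Offers insights into teacher-student alignment for scalable reasoning in lightweight models.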

Proposed Method

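Introduces FutureMind as a four-stage pipeline orchestrated by a Thinking Module (Problem Analysis, Logical Reasoning, Strategy Planning, Retrieval Guidance).
Decomposes the query into structural elements (O, A, T, C) and, through Logical Reasoning, derives a mechanistic understanding (M) and key conditions (K).
Strategy Planning dynamically selects among three retrieval paradigms (Forward Stepwise Reasoning, Backward Constraint Focusing, Parallel Intersection Reasoning) to form R*.
Generates prescriptive Retrieval Guidance (Γ), comprising Keyword, Resource, Sequence, Query, and Screening guidance, to steer retrieval.
Rather than training the SLM, distills adaptive thinking-pattern priors from an LLM teacher, so no gradient updates are required.
Evaluates on four multi-hop QA benchmarks (2WikiMultihopQA, MuSiQue, Bamboogle, Frames) across a variety of base models (both SLMs and LLMs).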
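The three retrieval paradigms can be illustrated with a toy sketch. The review does not say how each paradigm is implemented, so the behavior below is an assumption read off the paradigm names: Forward Stepwise Reasoning resolves hops in sequence, Backward Constraint Focusing anchors on one restrictive constraint and verifies the rest, and Parallel Intersection Reasoning retrieves per constraint and intersects the results. The triple store and all names (`Paradigm`, `forward_stepwise`, `backward_constraint_focusing`, `parallel_intersection`, `execute`) are hypothetical stand-ins, not FutureMind's code; in the paper, Strategy Planning (not the caller) chooses the paradigm when forming R*.

```python
from enum import Enum

# Hypothetical toy knowledge base of (subject, relation, object) triples,
# standing in for a real retriever.
Triple = tuple[str, str, str]


class Paradigm(Enum):
    FORWARD_STEPWISE = "forward"
    BACKWARD_CONSTRAINT_FOCUSING = "backward"
    PARALLEL_INTERSECTION = "parallel"


def objects_of(kb: list[Triple], subject: str, relation: str) -> set[str]:
    """One 'retrieval call': objects linked to `subject` by `relation`."""
    return {o for s, r, o in kb if s == subject and r == relation}


def subjects_matching(kb: list[Triple], relation: str, value: str) -> set[str]:
    """One 'retrieval call': subjects satisfying a single (relation, value) constraint."""
    return {s for s, r, o in kb if r == relation and o == value}


def forward_stepwise(kb: list[Triple], start: str, relations: list[str]) -> set[str]:
    # Assumed reading: resolve hops in order; each hop's answers become the next hop's subjects.
    frontier = {start}
    for relation in relations:
        frontier = set().union(*(objects_of(kb, s, relation) for s in frontier))
    return frontier


def backward_constraint_focusing(kb: list[Triple], constraints: list[tuple[str, str]]) -> set[str]:
    # Assumed reading: retrieve candidates for the first (assumed most restrictive)
    # constraint, then verify the remaining constraints on those candidates only.
    if not constraints:
        return set()
    (anchor_rel, anchor_val), *rest = constraints
    facts = set(kb)
    candidates = subjects_matching(kb, anchor_rel, anchor_val)
    return {c for c in candidates if all((c, r, v) in facts for r, v in rest)}


def parallel_intersection(kb: list[Triple], constraints: list[tuple[str, str]]) -> set[str]:
    # Assumed reading: one independent retrieval per constraint, then intersect the sets.
    if not constraints:
        return set()
    return set.intersection(*(subjects_matching(kb, r, v) for r, v in constraints))


def execute(paradigm: Paradigm, kb: list[Triple], **kwargs) -> set[str]:
    # In FutureMind, Strategy Planning would pick `paradigm`; here the caller supplies it.
    if paradigm is Paradigm.FORWARD_STEPWISE:
        return forward_stepwise(kb, kwargs["start"], kwargs["relations"])
    if paradigm is Paradigm.BACKWARD_CONSTRAINT_FOCUSING:
        return backward_constraint_focusing(kb, kwargs["constraints"])
    return parallel_intersection(kb, kwargs["constraints"])


if __name__ == "__main__":
    kb = [
        ("Film A", "director", "Alice"),
        ("Alice", "born_in", "Paris"),
        ("Bob", "born_in", "Paris"),
        ("Bob", "occupation", "painter"),
    ]
    # Two-hop question: "Where was the director of Film A born?"
    print(execute(Paradigm.FORWARD_STEPWISE, kb, start="Film A", relations=["director", "born_in"]))  # {'Paris'}
    # Constraint question: "Which painter was born in Paris?"
    constraints = [("occupation", "painter"), ("born_in", "Paris")]
    print(execute(Paradigm.BACKWARD_CONSTRAINT_FOCUSING, kb, constraints=constraints))  # {'Bob'}
    print(execute(Paradigm.PARALLEL_INTERSECTION, kb, constraints=constraints))  # {'Bob'}
```

In this sketch the two constraint-based paradigms return the same answer set and differ only in how many retrieval calls they issue (one anchor lookup plus membership checks, versus one lookup per constraint). That cost difference is one plausible reason a planner might prefer one over the other, but the review does not state the paper's selection criteria.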

Experimental Results

Research Questions

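RQ1: Can a training-free modular framework efficiently enable complex multi-hop reasoning in small language models?
RQ2: Can adaptive knowledge distillation of strategic thinking-pattern priors transfer robust reasoning ability across model scales?
RQ3: How do different retrieval paradigms affect efficiency and accuracy on knowledge-intensive tasks?
RQ4: How do the teacher model's scale and architecture affect teacher-student cognitive alignment during distillation?
RQ5: Which modular components contribute to the performance gains in multi-hop QA?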

Key Findings

Model	Method	2WikiMQA ACC E	2WikiMQA ACC L	Frames ACC E	Frames ACC L	Bamboogle ACC E	Bamboogle ACC L	MuSiQue ACC E	MuSiQue ACC L	Avg ACC E	Avg ACC L
Qwen-3B	Naive Gen	16.80	17.20	3.60	4.60	20.80	24.00	5.94	8.98	11.79	13.70
Qwen-3B	Standard RAG	24.00	24.40	10.20	13.00	26.40	38.40	12.01	19.17	18.15	23.74
Qwen-3B	Search-o1	41.00	41.80	10.40	12.60	34.40	39.20	11.77	18.81	24.39	28.10
Qwen-3B	TC+FM ∗	56.40	43.80	14.20	15.20	39.20	43.20	18.84	19.42	32.16	30.41
Qwen-7B	Naive Gen	29.40	25.20	7.60	10.80	34.40	52.80	11.29	16.87	20.67	22.62
Qwen-7B	Standard RAG	30.20	29.80	13.20	16.80	42.40	52.80	15.78	24.76	25.39	31.04
Qwen-7B	Search-o1	57.80	59.80	20.80	23.80	43.20	51.20	24.63	38.34	36.61	43.29
Qwen-7B	TC+FM ∗	62.00	64.00	20.00	23.80	58.40	64.80	25.12	34.71	41.38	46.83
Qwen-14B	Naive Gen	30.40	30.80	8.80	12.40	48.80	55.20	14.81	22.82	25.70	30.30
Qwen-14B	Standard RAG	27.40	28.40	14.00	18.60	44.80	56.00	17.96	28.40	26.04	32.85
Qwen-14B	Search-o1	66.80	68.40	20.60	25.60	43.20	55.20	30.46	46.48	40.27	48.92
Qwen-14B	TC+FM ∗	71.60	75.20	24.00	28.20	70.40	72.80	34.83	49.51	50.21	56.43
Qwen-32B	Naive Gen	30.80	31.30	10.80	15.20	54.40	60.80	15.66	24.51	27.91	32.95
Qwen-32B	Standard RAG	24.60	24.40	16.20	19.60	52.80	61.60	19.78	30.95	28.35	34.14
Qwen-32B	Search-o1	68.60	71.60	22.80	27.80	60.80	67.20	34.34	54.12	46.63	55.18
Qwen-32B	TC+FM ∗	74.40	77.80	26.00	30.40	68.80	72.80	37.15	53.86	51.59	58.71
Qwen-72B	Naive Gen	38.20	38.60	12.80	18.40	60.00	67.20	21.12	32.16	33.03	39.09
Qwen-72B	Standard RAG	31.00	31.40	16.20	19.60	59.20	67.20	25.97	37.62	33.79	40.01
Qwen-72B	Search-o1	72.60	75.40	24.60	30.80	67.20	72.80	37.37	56.67	50.44	58.92
Qwen-72B	TC+FM ∗	74.20	80.60	27.40	36.60	75.20	79.20	41.38	58.59	54.80	63.75
Llama3.1-8B	Naive Gen	38.20	38.60	12.80	18.40	60.00	67.20	21.12	32.16	33.03	39.09
Llama3.1-8B	Standard RAG	29.20	30.40	12.20	15.20	39.20	47.20	15.05	22.82	23.91	28.90
Llama3.1-8B	Search-o1	54.00	56.00	15.40	18.20	46.40	52.00	24.88	37.62	35.17	40.95
Llama3.1-8B	TC+FM ∗	55.20	56.80	21.80	25.20	58.40	64.00	27.43	39.92	40.71	46.48
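The Avg columns appear to be the unweighted mean of the four benchmark scores, shown to two decimals; recomputing a row reproduces the reported values. A minimal check using numbers copied from the two Qwen-14B rows (the helper `benchmark_average` and the dictionary layout are illustrative, not from the paper):

```python
from statistics import fmean

# Scores copied from the table, in column order: 2WikiMQA, Frames, Bamboogle, MuSiQue.
QWEN_14B = {
    "Search-o1": {"ACC E": [66.80, 20.60, 43.20, 30.46], "ACC L": [68.40, 25.60, 55.20, 46.48]},
    "TC+FM": {"ACC E": [71.60, 24.00, 70.40, 34.83], "ACC L": [75.20, 28.20, 72.80, 49.51]},
}


def benchmark_average(scores: list[float]) -> float:
    """Unweighted mean over the four benchmarks (the table reports it to two decimals)."""
    return fmean(scores)


for method, metrics in QWEN_14B.items():
    acc_e = benchmark_average(metrics["ACC E"])
    acc_l = benchmark_average(metrics["ACC L"])
    print(f"{method}: Avg ACC E = {acc_e:.2f}, Avg ACC L = {acc_l:.2f}")

# Difference in Avg ACC E between FutureMind (TC+FM) and the Search-o1 baseline.
gain = benchmark_average(QWEN_14B["TC+FM"]["ACC E"]) - benchmark_average(QWEN_14B["Search-o1"]["ACC E"])
print(f"TC+FM gain over Search-o1 (Avg ACC E): {gain:.2f} points")
```

The printed averages should agree with the table's Avg columns up to last-digit rounding; the same helper can be applied to any other row.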

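FutureMind (TC+FM) improves performance across model scales and architectures and achieves state-of-the-art results among training-free methods on the multi-hop QA benchmarks, though not on every individual benchmark (for Qwen-7B, for example, Search-o1 stays ahead on MuSiQue ACC L, 38.34 vs. 34.71).
Adaptive thinking-pattern distillation brings large gains to small models, with marked improvements in both ACC E and ACC L when the teacher's guidance is of high quality (for Qwen-3B, average ACC E rises from 24.39 with Search-o1 to 32.16).
Integrating Strategy Planning with Retrieval Guidance is essential: removing modules or retrieval strategies lowers performance, and Forward Stepwise Reasoning is often the most influential component.
A cognitive bias bottleneck exists: overly complex teacher plans can hurt student performance, which underscores that teacher-student compatibility matters more than sheer scale.
Teacher architecture has a decisive effect on transfer: a mid-sized, architecturally aligned teacher (e.g., 14B) can yield higher average student scores than a larger but misaligned teacher.
All three retrieval paradigms contribute to the gains; the ablations show that each paradigm's value depends on the structure of the task.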


This review was generated by AI and checked by a human editor.