QUICK REVIEW

[Paper Review] PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models

Haoyu Zheng, Yun Zhu|arXiv (Cornell University)|Jan 7, 2026

Multimodal Machine Learning Applications | Cited by 0

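One-Sentence Summary

PILOT internalizes planning through a query-conditioned latent anchor synthesized by a lightweight hyper-network, stabilizing long-horizon reasoning in compact LLMs with negligible added latency. It outperforms strong baselines on mathematical and coding benchmarks.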

ABSTRACT

Strategic planning is critical for multi-step reasoning, yet compact Large Language Models (LLMs) often lack the capacity to formulate global strategies, leading to error propagation in long-horizon tasks. Our analysis reveals that LLMs possess latent reasoning capabilities that can be unlocked when conditioned on explicit plans from a teacher model; however, runtime reliance on external guidance is often impractical due to latency and availability constraints. To bridge this gap, we propose PILOT (Planning via Internalized Latent Optimization Trajectories), a non-invasive framework designed to internalize the strategic oversight of large models into intrinsic Latent Guidance. Instead of altering backbone weights, PILOT employs a lightweight Hyper-Network to synthesize a query-conditioned Latent Guidance vector. This vector acts as an internal steering mechanism, guiding the model's representations toward optimal reasoning paths. Extensive experiments on mathematical and coding benchmarks demonstrate that PILOT effectively stabilizes reasoning trajectories, consistently outperforming strong baselines (e.g., +8.9% on MATH500) with negligible inference latency.

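Research Motivation and Goals

Highlights the need for global strategy planning in compact LLMs to prevent error propagation in multi-step tasks.
Proposes an internal latent guidance mechanism that leaves the backbone weights unmodified.
Develops a hyper-network-based anchor adapter that generates a per-instance planning signal.
Integrates with the backbone non-invasively through energy-aligned injection, keeping inference overhead low.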

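Proposed Method

Introduces latent anchor generation, in which decoding is conditioned on an anchor z at a pivot layer l†.
Uses a construct-and-verify pipeline to extract homogenized target states z* from expert trajectories.
Designs an anchor adapter ψθ with dual-channel context aggregation to synthesize a query-conditioned anchor.
Uses a prototype anchor P, warm-started from the global centroid of z*, and modulates it with FiLM-like parameters produced by the hyper-network Hθ.
Injects the anchor through a delayed-visibility mechanism and energy-aligned injection to preserve backbone stability.
Optimizes with a two-stage curriculum: a latent alignment loss to match z*, followed by anchored fine-tuning with gate regularization to avoid embedding shock.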
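
To make the prototype modulation and injection steps above more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes that "FiLM-like" means an element-wise scale and shift (gamma * P + beta), and that "energy-aligned" means rescaling the anchor to the L2 norm of the hidden state before a gated additive update. The summary does not specify either detail, and the names toy_hypernetwork, film_modulate, energy_aligned_inject, and the gate value are all hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def toy_hypernetwork(query_summary, dim):
    """Hypothetical stand-in for H_theta: maps a pooled query
    representation to FiLM-like parameters (gamma, beta).
    A real hyper-network would be a small learned module."""
    mean = sum(query_summary) / len(query_summary)
    gamma = [1.0 + 0.1 * math.tanh(mean)] * dim  # scale close to 1
    beta = [0.01 * mean] * dim                   # small shift
    return gamma, beta


def film_modulate(prototype, gamma, beta):
    """Assumed FiLM-style modulation: element-wise scale and shift
    of the prototype anchor P."""
    return [g * p + b for p, g, b in zip(prototype, gamma, beta)]


def energy_aligned_inject(hidden, anchor, gate=0.1, eps=1e-8):
    """Assumed reading of energy-aligned injection: rescale the anchor
    to the hidden state's L2 norm, then add it through a small gate so
    the perturbation stays proportional to the hidden state's energy."""
    scale = l2_norm(hidden) / (l2_norm(anchor) + eps)
    return [h + gate * scale * a for h, a in zip(hidden, anchor)]


# Toy values only; no real model states are involved.
prototype = [0.5, -0.5, 1.0, 0.0]   # P, e.g. a centroid of target states z*
query_summary = [0.2, 0.4, 0.6]     # pooled query features
gamma, beta = toy_hypernetwork(query_summary, len(prototype))
anchor = film_modulate(prototype, gamma, beta)

hidden = [3.0, 4.0, 0.0, 0.0]       # hidden state at the pivot layer
steered = energy_aligned_inject(hidden, anchor, gate=0.1)
delta_norm = l2_norm([s - h for s, h in zip(steered, hidden)])
```

Under these assumptions, the size of the injected perturbation is gate times the hidden-state norm regardless of how large the raw anchor is, which is one simple way an additive steering signal can avoid overwhelming the backbone's own representation.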

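Experimental Results

Research Questions

RQ1: Can internal latent guidance stabilize single-path reasoning in compact LLMs without changing the backbone weights?
RQ2: Does a query-conditioned latent anchor improve performance on long-horizon tasks such as mathematics and code generation?
RQ3: How does pivot-layer depth affect effectiveness across domains such as mathematics versus code?
RQ4: How does PILOT's latency/efficiency trade-off compare with other latent intervention methods?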

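Main Findings

PILOT consistently outperforms strong baselines on mathematical and coding benchmarks across model scales.
On MATH500, PILOT achieves gains of up to 8.9 percentage points in some settings, with almost no increase in decoding latency.
Ablations indicate that the hyper-network is critical for mathematical tasks, while energy-aligned injection is critical for preventing instability in code generation.
The best injection depth and trajectory dynamics vary by task: abstract mathematical tasks benefit more from deeper pivots, whereas structured code tasks favor earlier pivots.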

This review was generated by AI and reviewed by a human editor.