QUICK REVIEW

[Paper Review] Rethinking the Role of LLMs in Time Series Forecasting

Xin Qiu, Junlong Tong|arXiv (Cornell University)|Feb 16, 2026

Forecasting Techniques and Applications | Cited by 0

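One-Sentence Summary

This paper presents a large-scale, cross-dataset study showing that LLM-based time series forecasting (LLM4TSF) improves performance, especially in cross-domain generalization, and analyzes when and how pretrained knowledge and model architecture each contribute.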

ABSTRACT

Large language models (LLMs) have been introduced to time series forecasting (TSF) to incorporate contextual knowledge beyond numerical signals. However, existing studies question whether LLMs provide genuine benefits, often reporting comparable performance without LLMs. We show that such conclusions stem from limited evaluation settings and do not hold at scale. We conduct a large-scale study of LLM-based TSF (LLM4TSF) across 8 billion observations, 17 forecasting scenarios, 4 horizons, multiple alignment strategies, and both in-domain and out-of-domain settings. Our results demonstrate that LLM4TSF indeed improves forecasting performance, with especially large gains in cross-domain generalization. Pre-alignment outperforms post-alignment in over 90% of tasks. Both pretrained knowledge and model architecture of LLMs contribute and play complementary roles: pretraining is critical under distribution shifts, while architecture excels at modeling complex temporal dynamics. Moreover, under large-scale mixed distributions, a fully intact LLM becomes indispensable, as confirmed by token-level routing analysis and prompt-based improvements. Overall, our findings overturn prior negative assessments, establish clear conditions under which LLMs are useful, and provide practical guidance for effective model design. We release our code at https://github.com/EIT-NLP/LLM4TSF.

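Motivation and Objectives

Assess whether pretrained LLMs bring genuine gains to time series forecasting beyond what architectural enhancements alone provide.
Evaluate pre-alignment and post-alignment strategies at scale, in both in-domain and cross-domain scenarios.
Decompose the performance gains into contributions from pretrained knowledge and from model architecture.
Study how data diversity and routing decisions affect the way LLMs are used in TSF.
Provide practical guidance for designing effective LLM-based forecasting systems.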

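Proposed Method

Evaluate two alignment paradigms: pre-alignment (aligning time series with language via cross-attention, with a reduced word-embedding dimensionality) and post-alignment (jointly fine-tuning the TS encoder and the LLM).
Use a TS encoder, an LLM backbone (GPT-2), and a TS decoder to forecast the next H steps, with H ∈ {96, 192, 336, 720}, over 8B observations from 62 datasets.
Compare single-dataset learning with cross-dataset learning to separate data-diversity effects from model effects.
Run ablation studies (with pretraining vs. without pretraining vs. without an LLM) to quantify the impact of pretrained knowledge and of architecture.
Perform token-level routing analysis to examine when LLMs are used and how prompts affect performance.
Analyze dataset properties (shift, transition, stationarity, etc.) to understand when LLMs help, and use synthetic data to disentangle the factors.
Compare against large-scale TS foundation models and other LLM-based TSF methods in zero-shot and few-shot settings.
Provide practical guidance and discuss the limitations of deploying LLMs in TSF.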
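To make the cross-attention idea behind pre-alignment more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: it assumes time-series patch embeddings act as queries over a small set of word embeddings (keys and values), omits learned projection matrices, and uses toy inputs. The names `cross_attention`, `patches`, and `prototypes` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(patch_embeddings, word_embeddings):
    """Single-head cross-attention without learned projections (an assumption).

    Queries are time-series patch embeddings; keys and values are both the
    word embeddings. Each output row is a convex combination of word
    embeddings, so it lies in the language embedding space.
    """
    dim = len(word_embeddings[0])
    scale = math.sqrt(dim)
    aligned = []
    for query in patch_embeddings:
        # Scaled dot-product score of this patch against every word embedding.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in word_embeddings
        ]
        weights = softmax(scores)
        # Attention-weighted sum of the word embeddings.
        aligned.append([
            sum(w * value[d] for w, value in zip(weights, word_embeddings))
            for d in range(dim)
        ])
    return aligned


# Hypothetical toy inputs: 2 patches, 3 word embeddings, dimension 4.
patches = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
prototypes = [
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
]
aligned = cross_attention(patches, prototypes)
```

In this toy setup each patch is pulled mostly toward the word embedding it is most similar to, which is the intuition behind mapping numerical patches into a representation the LLM backbone already understands. A real system would add learned query, key, and value projections and multiple heads.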

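Experimental Results

Research Questions

RQ1: Under what conditions do LLMs bring gains in time series forecasting, and when might they be unnecessary?
RQ2: How do pre-alignment and post-alignment strategies compare at scale, especially under distribution shift?
RQ3: What are the distinct contributions of pretrained knowledge and architectural capacity to TSF performance?
RQ4: How does data diversity (cross-dataset training) affect in-domain and cross-domain generalization?
RQ5: Which mechanisms (e.g., token routing, prompts) explain when and how LLMs are used in forecasting?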

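Key Findings

LLM4TSF trained across datasets improves forecasting performance, with larger gains in cross-domain generalization.
Overall, pre-alignment outperforms post-alignment in over 90% of tasks.
The forecasting gains arise from a complementary interaction between pretrained knowledge and architectural capacity: pretraining helps under distribution shift, while the architecture handles temporal dynamics.
Diverse multi-source TS data yields stronger in-domain performance than single-dataset baselines and better cross-domain generalization.
At large scale, a fully intact LLM becomes essential; token routing shows that LLM usage correlates with data properties such as shift and transition.
Informative prompts consistently improve performance, suggesting that the benefit comes from semantic guidance and not only from model scale.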
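A claim such as "pre-alignment outperforms post-alignment in over 90% of tasks" is a per-task win rate. The sketch below shows one plausible way to compute such a figure, assuming each task is a (scenario, horizon) pair with a single error value per method where lower is better. The function name `win_rate` and all numbers are invented placeholders for illustration, not results from the paper.

```python
def win_rate(errors_a, errors_b):
    """Fraction of shared tasks where method A has strictly lower error than B."""
    tasks = errors_a.keys() & errors_b.keys()
    if not tasks:
        raise ValueError("no shared tasks to compare")
    wins = sum(1 for task in tasks if errors_a[task] < errors_b[task])
    return wins / len(tasks)


# Placeholder errors keyed by (scenario, horizon); invented values only.
pre_alignment = {
    ("A", 96): 0.30, ("A", 192): 0.35,
    ("B", 96): 0.50, ("B", 192): 0.62,
}
post_alignment = {
    ("A", 96): 0.32, ("A", 192): 0.36,
    ("B", 96): 0.49, ("B", 192): 0.65,
}
rate = win_rate(pre_alignment, post_alignment)
```

With these placeholder values, pre-alignment wins on three of the four tasks. Ties count as non-wins here, which is a design choice; the paper's exact counting rule is not stated in this review.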

This review was generated by AI and reviewed by human editors.