QUICK REVIEW

[Paper Review] Spatial-Temporal Large Language Model for Traffic Prediction

Chenxi Liu, Sun Yang|arXiv (Cornell University)|Jan 18, 2024

Traffic Prediction and Management Techniques | Cited by 10

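One-Sentence Summary

ST-LLM treats the timesteps at each location as tokens, applies a spatial-temporal embedding, and predicts traffic with a language model that uses partially frozen attention. It performs strongly in standard, few-shot, and zero-shot settings.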
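The idea of "timesteps at each location as tokens" can be illustrated with a minimal sketch. It assumes a history window stored as nested lists indexed `x[t][n][c]` (P timesteps, N locations, C features) and builds one token per location by concatenating that location's features over all P timesteps in time order. The function name `location_tokens` and this flattening order are illustrative assumptions, not taken from the ST-LLM code. In the method, a learnable projection (the PConv token embedding) would then map each such token to a D-dimensional embedding.

```python
from typing import List

# History window indexed as x[t][n][c]: P timesteps, N locations, C features.
Tensor3 = List[List[List[float]]]


def location_tokens(x: Tensor3) -> List[List[float]]:
    """Turn a (P, N, C) history into N tokens, one per location.

    Token n concatenates the C features of location n at each of the
    P timesteps, in time order, so every token has length P * C.
    """
    num_steps = len(x)
    num_locations = len(x[0])
    tokens = []
    for n in range(num_locations):
        token: List[float] = []
        for t in range(num_steps):
            token.extend(x[t][n])
        tokens.append(token)
    return tokens


# Toy example: P = 3 timesteps, N = 2 locations, C = 2 features (e.g. inflow, outflow).
history = [
    [[1.0, 2.0], [10.0, 20.0]],  # t = 0
    [[3.0, 4.0], [30.0, 40.0]],  # t = 1
    [[5.0, 6.0], [50.0, 60.0]],  # t = 2
]
tokens = location_tokens(history)
print(len(tokens), len(tokens[0]))  # 2 6
print(tokens[1])  # [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
```

The sequence the LLM attends over therefore has length N (the number of locations), while each token carries that location's recent history.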

ABSTRACT

Traffic prediction, an essential component for intelligent transportation systems, endeavours to use historical data to foresee future traffic features at specific locations. Although existing traffic prediction models often emphasize developing complex neural network structures, their accuracy has not improved. Recently, large language models have shown outstanding capabilities in time series analysis. Differing from existing models, LLMs progress mainly through parameter expansion and extensive pretraining while maintaining their fundamental structures. Motivated by these developments, we propose a Spatial-Temporal Large Language Model (ST-LLM) for traffic prediction. In the ST-LLM, we define timesteps at each location as tokens and design a spatial-temporal embedding to learn the spatial location and global temporal patterns of these tokens. Additionally, we integrate these embeddings by a fusion convolution to each token for a unified spatial-temporal representation. Furthermore, we innovate a partially frozen attention strategy to adapt the LLM to capture global spatial-temporal dependencies for traffic prediction. Comprehensive experiments on real traffic datasets offer evidence that ST-LLM is a powerful spatial-temporal learner that outperforms state-of-the-art models. Notably, the ST-LLM also exhibits robust performance in both few-shot and zero-shot prediction scenarios. The code is publicly available at https://github.com/ChenxiLiu-HNU/ST-LLM.

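Motivation and Objectives

Improve traffic prediction by using large language models to capture global spatial-temporal dependencies.
Introduce a spatial-temporal embedding and a tokenization scheme that recasts the timesteps at each location as tokens.
Develop a partially frozen attention (PFA) strategy that lets LLMs adapt to traffic data while retaining their pretrained knowledge.
Show that ST-LLM is more accurate than state-of-the-art models and remains robust in few-shot and zero-shot settings.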

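Proposed Method

Represent traffic data as a tensor X ∈ R^{T × N × C}, with T timesteps, N locations, and C traffic features.
Encode the P historical timesteps into embeddings with a spatial-temporal embedding layer (a PConv-based token embedding, hour/day/week positional encodings, and an adaptive spatial embedding).
Fuse these embeddings with a fusion convolution into E_F ∈ R^{N × 3D}.
Process the fused embeddings with an LLM that uses partially frozen attention: the first F layers are frozen and the multi-head attention in the last U layers is unfrozen, yielding H^L ∈ R^{N × 3D}, where L = F + U.
Predict the next S timesteps with a regression convolution: Ŷ_S = RConv(H^{F+U}).
Train with the loss ℒ = ||Ŷ_S − Y_S|| + λℒ_reg, where ℒ_reg is a regularization term weighted by λ.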
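The method steps can be traced as tensor shapes, together with the layer-freezing plan and the loss. This is a sketch under stated assumptions, not the authors' implementation: it assumes the three embeddings are each N × D and are concatenated before the fusion convolution (consistent with the 3D width of E_F), that the prediction has shape (S, N, C), and that the norm in the loss is a mean absolute error. `Sizes`, `shape_trace`, `pfa_plan`, and `training_loss` are hypothetical names, and all sizes are toy values.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Sizes:
    P: int  # historical timesteps fed to the model
    S: int  # future timesteps to predict
    N: int  # locations (one token each)
    C: int  # traffic features per location
    D: int  # width of each of the three embeddings (assumed equal)
    F: int  # LLM layers that stay frozen
    U: int  # final LLM layers whose multi-head attention is trained


def shape_trace(s: Sizes) -> List[Tuple[str, Tuple[int, ...]]]:
    """Tensor shape after each stage, under the assumptions stated above."""
    return [
        ("history X_P", (s.P, s.N, s.C)),
        ("token embedding (PConv)", (s.N, s.D)),
        ("temporal embedding", (s.N, s.D)),
        ("spatial embedding", (s.N, s.D)),
        ("fused E_F (FConv)", (s.N, 3 * s.D)),
        ("LLM output H^L, L = F + U", (s.N, 3 * s.D)),
        ("prediction Y_S (RConv)", (s.S, s.N, s.C)),
    ]


def pfa_plan(s: Sizes) -> List[str]:
    """Training status of each LLM layer under partially frozen attention."""
    return ["frozen" if i < s.F else "attention unfrozen" for i in range(s.F + s.U)]


def training_loss(y_pred: Sequence[float], y_true: Sequence[float],
                  reg: float, lam: float) -> float:
    """||Y_hat_S - Y_S|| + lambda * L_reg, with a mean absolute error as the norm (assumed)."""
    err = sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)
    return err + lam * reg


sizes = Sizes(P=12, S=12, N=4, C=2, D=8, F=6, U=2)
for name, shape in shape_trace(sizes):
    print(f"{name:28s} {shape}")
print(pfa_plan(sizes))
print(training_loss([1.0, 2.0], [1.5, 1.0], reg=0.2, lam=0.1))
```

Note that the LLM keeps the sequence length at N and the width at 3D, so the regression convolution alone maps the hidden states back to S future steps per location.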

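Experimental Results

Research Questions

RQ1: Can ST-LLM effectively model spatial-temporal dependencies in traffic data by treating location-timesteps as tokens?
RQ2: Compared with fully frozen or fully tunable settings, does partially frozen attention improve how well the LLM adapts to traffic prediction?
RQ3: How well does ST-LLM transfer across traffic datasets in few-shot and zero-shot settings?
RQ4: How do the spatial-temporal embeddings and their fusion affect prediction accuracy?
RQ5: How does ST-LLM compare with existing GNN-based and attention-based models on real-world datasets?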

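Key Findings

ST-LLM outperforms state-of-the-art models across multiple traffic prediction settings on the NYC taxi and CHBike datasets.
ST-LLM achieves lower MAE, MAPE, RMSE, and WAPE than the baselines, including DCRNN, STGCN, ASTGCN, GMAN, GATGPT, GCNGPT, and LLaMA-2.
Partially frozen attention (PFA) outperforms the fully frozen and fully tunable variants as well as the other baselines.
ST-LLM is robust in few-shot and zero-shot prediction and transfers well across domains.
Ablation studies show that removing the LLM or the spatial-temporal embeddings significantly degrades performance, underscoring their importance.
An inference-time analysis shows that ST-LLM strikes a favorable balance between speed and accuracy compared with several LLM baselines.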
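The four error metrics named in the findings can be written out as a short sketch. These are the common textbook definitions; the paper's exact aggregation (e.g. averaging over horizons or masking zero-valued targets) may differ, and skipping zero targets in MAPE is an assumption made here to avoid division by zero. Lower is better for all four; MAPE and WAPE are returned as percentages.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, skipping zero targets (assumed convention)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


def wape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Weighted absolute percentage error: total absolute error over total actual volume."""
    return 100.0 * sum(abs(t - p) for t, p in zip(y_true, y_pred)) / sum(abs(t) for t in y_true)


# Toy values, not results from the paper.
y_true = [10, 20, 0, 40]
y_pred = [12, 18, 1, 36]
print(mae(y_true, y_pred))             # 2.25
print(rmse(y_true, y_pred))            # 2.5
print(round(mape(y_true, y_pred), 2))  # 13.33
print(round(wape(y_true, y_pred), 2))  # 12.86
```

WAPE divides the total error by the total traffic volume, so, unlike MAPE, it stays defined when some locations have zero demand, which is common in taxi and bike data.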

This review was generated by AI and checked by human editors.