QUICK REVIEW

[Paper Digest] LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model

Xiangyu Li, Tianyi Wang | arXiv (Cornell University) | Mar 3, 2026

Autonomous Vehicle Technology and Safety | Cited by 0

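One-Sentence Summary

The paper proposes LLM-MLFFN, a large language model-enhanced multi-level feature fusion network that combines numerical driving features with LLM-derived semantic descriptions to classify autonomous driving behaviors on Waymo data, reporting a classification accuracy of over 94%.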

ABSTRACT

Accurate classification of autonomous vehicle (AV) driving behaviors is critical for safety validation, performance diagnosis, and traffic integration analysis. However, existing approaches primarily rely on numerical time-series modeling and often lack semantic abstraction, limiting interpretability and robustness in complex traffic environments. This paper presents LLM-MLFFN, a novel large language model (LLM)-enhanced multi-level feature fusion network designed to address the complexities of multi-dimensional driving data. The proposed LLM-MLFFN framework integrates priors from large-scale pre-trained models and employs a multi-level approach to enhance classification accuracy. LLM-MLFFN comprises three core components: (1) a multi-level feature extraction module that extracts statistical, behavioral, and dynamic features to capture the quantitative aspects of driving behaviors; (2) a semantic description module that leverages LLMs to transform raw data into high-level semantic features; and (3) a dual-channel multi-level feature fusion network that combines numerical and semantic features using weighted attention mechanisms to improve robustness and prediction accuracy. Evaluation on the Waymo open trajectory dataset demonstrates the superior performance of the proposed LLM-MLFFN, achieving a classification accuracy of over 94%, surpassing existing machine learning models. Ablation studies further validate the critical contributions of multi-level fusion, feature extraction strategies, and LLM-derived semantic reasoning. These results suggest that integrating structured feature modeling with language-driven semantic abstraction provides a principled and interpretable pathway for robust autonomous driving behavior classification.

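Motivation and Objectives

Characterize and classify autonomous vehicle behavior beyond short-term trajectories by combining semantic interpretation with numerical signals.
Develop a framework that fuses multi-level numerical features with LLM-generated semantic descriptors for robust behavior classification.
Demonstrate gains in accuracy and interpretability over conventional time-series classifiers on Waymo trajectory data.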

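Proposed Method

Extract numerical features at three levels: basic statistics, driving-behavior indicators, and dynamic correlations.
Use an LLM (GPT-4o) with structured prompts to turn numerical feature patterns into natural-language semantic descriptions.
Fuse the semantic embeddings (obtained with RoBERTa) with the numerical features through a dual-channel attention fusion network, then classify with an MLP.
Train the model end to end with cross-entropy loss, dropout, and L2 regularization, using an 80/10/10 train/validation/test split and the AdamW optimizer.
Evaluate with accuracy, precision, recall, and F1-score, and run ablation studies that assess the contributions of multi-scale convolution, spatiotemporal attention, and semantic features.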
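The feature extraction and fusion steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of individual features, the `hard_brake` threshold, and the fixed channel scores are all assumptions made for illustration. In the actual network the channel scores would be learned, and the numerical and semantic vectors would first be projected to a common dimension.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def extract_features(speed, dt=0.1, hard_brake=-3.0):
    """Build a three-level feature vector from one speed trace (m/s).

    Needs at least two samples. The features chosen here are
    hypothetical stand-ins for the paper's three levels.
    """
    accel = [(b - a) / dt for a, b in zip(speed, speed[1:])]
    # Level 1: basic statistics of the raw signal.
    statistical = [mean(speed), pstdev(speed), min(speed), max(speed)]
    # Level 2: driving-behavior indicators derived from acceleration.
    behavioral = [
        sum(a < hard_brake for a in accel) / len(accel),  # hard-braking ratio
        mean([abs(a) for a in accel]),                    # mean |acceleration|
    ]
    # Level 3: dynamic correlation between speed and acceleration.
    dynamic = [pearson(speed[1:], accel)]
    return statistical + behavioral + dynamic


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(numeric_vec, semantic_vec, score_numeric, score_semantic):
    """Weighted attention over two channels of equal length.

    The scores are plain numbers here (an assumption); a trained model
    would compute them from the inputs.
    """
    w_num, w_sem = softmax([score_numeric, score_semantic])
    return [w_num * n + w_sem * s for n, s in zip(numeric_vec, semantic_vec)]
```

With equal channel scores, `fuse` reduces to a plain average of the two vectors; raising one score shifts the fused representation toward that channel, which is the intuition behind weighting the numerical and semantic channels.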

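Experimental Results

Research Questions

RQ1: Does combining numerical driving features with LLM-generated semantic features improve driving behavior classification performance?
RQ2: How do multi-level feature extraction and dual-channel fusion affect predictive performance and interpretability?
RQ3: How do LLM-based semantic descriptions affect robustness in complex driving scenarios?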

Key Findings

Model	Accuracy	Precision	Recall	F1
LSTM	0.7166	0.8888	0.6227	0.8895
MLP	0.8321	0.8824	0.8584	0.8812
FCN	0.8075	0.7519	0.7915	0.6943
LSTM-FCN	0.8032	0.8909	0.8080	0.8934
GRU-FCN	0.6909	0.8877	0.5536	0.8893
mWDN	0.9005	0.8684	0.8595	0.8703
MLSTM-FCN	0.8182	0.8299	0.8003	0.8140
TST	0.7508	0.7701	0.7896	0.7347
GAF-ViT	0.9209	0.9219	0.8679	0.8850
LLM-MLFFN (Ours) Non-Feat.	0.9145	0.9430	0.9158	0.9464
LLM-MLFFN (Ours) Feat.	0.9145	0.9430	0.9135	0.9414

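On Waymo trajectory data, LLM-MLFFN achieves higher accuracy and a better precision/recall balance than the baselines.
Ablation experiments indicate that spatiotemporal attention and multi-scale convolution are critical to the performance gains.
Fusing the LLM-derived semantic features with the numerical features outperforms using either modality alone.
The model still performs strongly when feature engineering is reduced, but the gains are largest when both modalities are combined.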
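For readers who want to compute the four metrics reported in the table on their own predictions, the following is a minimal sketch. It assumes macro-averaging over classes; the digest does not state which averaging the paper uses, so this code is not guaranteed to reproduce the reported numbers. The function name and label values are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption made for this sketch.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Calling `classification_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"])` scores a toy two-class example; with real data, `y_true` and `y_pred` would hold the behavior labels for the held-out test split.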

This digest was generated by AI and reviewed by human editors.