QUICK REVIEW

[Paper Review] LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model

Xiangyu Li, Tianyi Wang | arXiv (Cornell University) | 2026. 03. 03.

Autonomous Vehicle Technology and Safety | Citations: 0

One-Line Summary

The paper proposes LLM-MLFFN, a large-language-model-enhanced multi-level feature fusion network that combines numerical driving features with LLM-derived semantic descriptions to classify autonomous vehicle driving behaviors, reporting over 94% classification accuracy on Waymo data.

ABSTRACT

Accurate classification of autonomous vehicle (AV) driving behaviors is critical for safety validation, performance diagnosis, and traffic integration analysis. However, existing approaches primarily rely on numerical time-series modeling and often lack semantic abstraction, limiting interpretability and robustness in complex traffic environments. This paper presents LLM-MLFFN, a novel large language model (LLM)-enhanced multi-level feature fusion network designed to address the complexities of multi-dimensional driving data. The proposed LLM-MLFFN framework integrates priors from large-scale pre-trained models and employs a multi-level approach to enhance classification accuracy. LLM-MLFFN comprises three core components: (1) a multi-level feature extraction module that extracts statistical, behavioral, and dynamic features to capture the quantitative aspects of driving behaviors; (2) a semantic description module that leverages LLMs to transform raw data into high-level semantic features; and (3) a dual-channel multi-level feature fusion network that combines numerical and semantic features using weighted attention mechanisms to improve robustness and prediction accuracy. Evaluation on the Waymo open trajectory dataset demonstrates the superior performance of the proposed LLM-MLFFN, achieving a classification accuracy of over 94%, surpassing existing machine learning models. Ablation studies further validate the critical contributions of multi-level fusion, feature extraction strategies, and LLM-derived semantic reasoning. These results suggest that integrating structured feature modeling with language-driven semantic abstraction provides a principled and interpretable pathway for robust autonomous driving behavior classification.

Research Motivation and Objectives

Characterize and classify autonomous vehicle driving behaviors beyond short-term trajectories by integrating semantic interpretations with numerical signals.
Develop a framework that fuses multi-level numerical features and LLM-generated semantic descriptors for robust behavior classification.
Demonstrate improved accuracy and interpretability over traditional time-series classifiers using Waymo trajectory data.

Proposed Method

Extract three levels of numerical features: basic statistics, driving behavior metrics, and dynamic correlations.
Use an LLM (GPT-4o) to convert numerical feature patterns into natural-language semantic descriptions via structured prompts.
Fuse semantic embeddings (via RoBERTa) with numerical features through a dual-channel attention-based fusion network and classify with an MLP.
Train end-to-end with cross-entropy loss, dropout, and L2 regularization; adopt an 80/10/10 train/validation/test split with AdamW optimization.
Evaluate using accuracy, precision, recall, and F1-score, including ablation studies to assess contributions from multi-scale convolutions, spatio-temporal attention, and semantic features.
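
The first and third steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the specific features (mean absolute jerk, hard-braking ratio, speed-acceleration correlation), the `-2.0` m/s^2 braking threshold, the function names, and the softmax gate standing in for the paper's weighted attention are all assumptions chosen for illustration. The semantic vector is a random stand-in for a RoBERTa embedding of the LLM-generated description.

```python
import numpy as np

def extract_multilevel_features(speed, accel, dt=0.1):
    """Return (statistical, behavioral, dynamic) features for one trajectory.

    The concrete features are hypothetical examples of each level.
    """
    speed = np.asarray(speed, dtype=float)
    accel = np.asarray(accel, dtype=float)
    # Level 1: basic statistics of the raw signals.
    statistical = np.array([speed.mean(), speed.std(), accel.mean(), accel.std()])
    # Level 2: behavior metrics: mean absolute jerk and the fraction of
    # samples with braking harder than an assumed -2.0 m/s^2 threshold.
    jerk = np.diff(accel) / dt
    behavioral = np.array([np.abs(jerk).mean(), np.mean(accel < -2.0)])
    # Level 3: dynamic correlation between speed and acceleration
    # (set to 0 when either signal is constant, to avoid a NaN).
    if speed.std() > 0 and accel.std() > 0:
        corr = np.corrcoef(speed, accel)[0, 1]
    else:
        corr = 0.0
    dynamic = np.array([corr])
    return statistical, behavioral, dynamic

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def fuse(numeric_vec, semantic_vec, W_num, W_sem, gate_logits):
    """Project both channels to a shared size and mix them with softmax weights."""
    h_num = np.tanh(W_num @ numeric_vec)
    h_sem = np.tanh(W_sem @ semantic_vec)
    weights = softmax(gate_logits)  # two non-negative weights that sum to 1
    return weights[0] * h_num + weights[1] * h_sem, weights

# Usage on synthetic data (no Waymo data involved).
rng = np.random.default_rng(0)
speed = 10 + np.cumsum(rng.normal(0, 0.1, 50))
accel = np.gradient(speed, 0.1)
stat, beh, dyn = extract_multilevel_features(speed, accel)
numeric_vec = np.concatenate([stat, beh, dyn])
semantic_vec = rng.normal(size=16)  # stand-in for a text embedding
W_num = rng.normal(size=(8, numeric_vec.size))
W_sem = rng.normal(size=(8, semantic_vec.size))
fused, weights = fuse(numeric_vec, semantic_vec, W_num, W_sem, np.zeros(2))
```

In a trained model the projection matrices and gate logits would be learned jointly with the MLP classifier; here they are fixed so the data flow through the two channels is easy to follow.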

Experimental Results

Research Questions

RQ1: Can combining numerical driving features with LLM-generated semantic features improve driving-behavior classification?
RQ2: What is the impact of multi-level feature extraction and dual-channel fusion on predictive performance and interpretability?
RQ3: How do semantic descriptions grounded in LLMs influence robustness across complex driving scenarios?

Key Results

Model	Accuracy	Precision	Recall	F1
LSTM	0.7166	0.8888	0.6227	0.8895
MLP	0.8321	0.8824	0.8584	0.8812
FCN	0.8075	0.7519	0.7915	0.6943
LSTM-FCN	0.8032	0.8909	0.8080	0.8934
GRU-FCN	0.6909	0.8877	0.5536	0.8893
mWDN	0.9005	0.8684	0.8595	0.8703
MLSTM-FCN	0.8182	0.8299	0.8003	0.8140
TST	0.7508	0.7701	0.7896	0.7347
GAF-ViT	0.9209	0.9219	0.8679	0.8850
LLM-MLFFN (Ours) Non-Feat.	0.9145	0.9430	0.9158	0.9464
LLM-MLFFN (Ours) Feat.	0.9145	0.9430	0.9135	0.9414

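In the table above, LLM-MLFFN posts the highest precision (0.9430), recall (up to 0.9158), and F1 (up to 0.9464) of all listed models on Waymo trajectory data. Its listed accuracy (0.9145) is slightly below GAF-ViT's (0.9209) and below the over-94% accuracy quoted in the abstract, so the table's column labels should be checked against the paper.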
Ablation shows spatio-temporal attention and multi-scale convolution critically improve performance.
Fusion of semantic (LLM-derived) and numerical features outperforms using either modality alone.
The model demonstrates strong performance even when feature engineering is reduced, but benefits most from combined modalities.
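
For readers who want to reproduce the four metrics used in the results table, the following is a small NumPy sketch that derives them from a confusion matrix. The review does not state which averaging scheme the paper uses for multi-class precision, recall, and F1; macro averaging is assumed here, and the function name `classification_metrics` is hypothetical.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed scheme)."""
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    pred_count = cm.sum(axis=0)  # how often each class was predicted
    true_count = cm.sum(axis=1)  # how often each class actually occurred
    # Per-class scores; classes with an empty denominator score 0.
    precision = np.divide(tp, pred_count, out=np.zeros_like(tp), where=pred_count > 0)
    recall = np.divide(tp, true_count, out=np.zeros_like(tp), where=true_count > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }

# Usage on a toy 3-class example (labels are illustrative only).
metrics = classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
```

With a different averaging scheme (micro or weighted) the precision, recall, and F1 values change while accuracy stays the same, which matters when comparing numbers across papers.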

This review was generated by AI and reviewed by a human editor.