QUICK REVIEW

[Paper Review] Enhancing Traffic Incident Management with Large Language Models: A Hybrid Machine Learning Approach for Severity Classification

Artur Grigorev, Khaled Saleh|arXiv (Cornell University)|2024. 03. 20.

Traffic Prediction and Management Techniques | Citations: 6

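One-Line Summary

This paper investigates integrating features extracted by large language models (LLMs) with conventional ML models for traffic incident severity classification, comparing several LLMs and ML algorithms on three international datasets. With LLM features, Random Forest and XGBoost often match or outperform traditional feature engineering.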

ABSTRACT

This research showcases the innovative integration of Large Language Models into machine learning workflows for traffic incident management, focusing on the classification of incident severity using accident reports. By leveraging features generated by modern language models alongside conventional data extracted from incident reports, our research demonstrates improvements in the accuracy of severity classification across several machine learning algorithms. Our contributions are threefold. First, we present an extensive comparison of various machine learning models paired with multiple large language models for feature extraction, aiming to identify the optimal combinations for accurate incident severity classification. Second, we contrast traditional feature engineering pipelines with those enhanced by language models, showcasing the superiority of language-based feature engineering in processing unstructured text. Third, our study illustrates how merging baseline features from accident reports with language-based features can improve the severity classification accuracy. This comprehensive approach not only advances the field of incident management but also highlights the cross-domain application potential of our methodology, particularly in contexts requiring the prediction of event outcomes from unstructured textual data or features translated into textual representation. Specifically, our novel methodology was applied to three distinct datasets originating from the United States, the United Kingdom, and Queensland, Australia. This cross-continental application underlines the robustness of our approach, suggesting its potential for widespread adoption in improving incident management processes globally.

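Motivation and Objectives

Evaluate whether features extracted by LLMs from full-text reports improve severity classification over traditional feature engineering.
Evaluate combinations of LLMs and machine learning models for incident severity prediction.
Determine whether combining baseline features with NLP-derived features increases prediction accuracy.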

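Proposed Method

Convert each incident report into a full-text representation by combining column names with their values.
Extract numerical features from the full-text descriptions using a range of LLMs (the BERT family, XLNet, RoBERTa, ALBERT, and others).
Train and compare four ML models (XGBoost, LightGBM, Random Forest, and KNN) on the baseline, NLP, and combined feature sets.
Balance the datasets by equal sampling and remove zero-variance features.
Evaluate with cross-validation using metrics such as F1-score, accuracy, precision, and recall.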
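
The preprocessing steps above (serializing a record to text, balancing classes, and dropping zero-variance features) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' code: the "column: value" text format, undersampling to the rarest class as the meaning of equal sampling, and all function names and toy records are hypothetical.

```python
import random
from collections import defaultdict


def row_to_text(row):
    """Serialize one incident record as 'column: value' pairs (assumed format)."""
    return ", ".join(f"{col}: {val}" for col, val in row.items())


def balance_by_undersampling(rows, label_key, seed=0):
    """Keep an equal number of rows per class by sampling down to the rarest class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    n = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n))
    return balanced


def drop_zero_variance(matrix):
    """Remove feature columns whose value is identical in every row."""
    if not matrix:
        return matrix
    keep = [j for j in range(len(matrix[0])) if len({r[j] for r in matrix}) > 1]
    return [[r[j] for j in keep] for r in matrix]


# Hypothetical toy records; real datasets have many more columns.
records = [
    {"weather": "rain", "lanes_blocked": 2, "severity": "high"},
    {"weather": "clear", "lanes_blocked": 1, "severity": "low"},
    {"weather": "fog", "lanes_blocked": 1, "severity": "low"},
]

# The label is excluded from the text that would be passed to a language model.
texts = [row_to_text({k: v for k, v in r.items() if k != "severity"}) for r in records]
balanced = balance_by_undersampling(records, "severity")
features = drop_zero_variance([[1.0, 0.5, 3.0], [1.0, 0.7, 2.0]])
```

In this sketch the first column of the toy feature matrix is constant and is therefore removed, while the text strings are what an embedding model would consume in the feature-extraction step.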

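Experimental Results

Research Questions

RQ1: Do features extracted by LLMs improve incident severity classification performance compared with traditional feature engineering?
RQ2: Which combination of LLM and ML model yields the best severity prediction?
RQ3: To what extent does combining baseline and NLP features improve performance over using either feature set alone?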

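Key Findings

When used with tree-based models and combined with traditional features, LLM features tend to match or improve performance.
No clear differences emerged between the language models on this task, suggesting that the incident narratives used contain limited discriminative information.
On the Queensland data with combined features (report + NLP), the best F1-score is 0.65, obtained with Random Forest on GPT-2 features.
XGBoost with BERT features achieved a competitive F1-score of 0.56 on the Queensland data.
Overall, combining LLM features with baseline features generally performs better for severity classification than using either one alone.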
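
To make the combined feature set and the reported metric concrete, here is a minimal sketch of row-wise feature concatenation and an F1 computation. It rests on assumptions: the review does not say how F1 is averaged across severity classes, so macro averaging is assumed, and the function names, feature values, and labels are hypothetical.

```python
def combine_features(baseline, nlp):
    """Concatenate baseline and NLP feature vectors row by row."""
    if len(baseline) != len(nlp):
        raise ValueError("feature sets must describe the same incidents")
    return [list(b) + list(e) for b, e in zip(baseline, nlp)]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical values: two baseline columns and a three-dimensional text embedding.
baseline = [[2, 0], [1, 1]]
nlp = [[0.12, -0.40, 0.33], [0.05, 0.21, -0.10]]
combined = combine_features(baseline, nlp)

# Hypothetical labels and predictions for four incidents.
y_true = ["high", "low", "low", "high"]
y_pred = ["high", "low", "high", "high"]
score = macro_f1(y_true, y_pred)
```

A classifier such as Random Forest would be trained on `combined` in place of either feature set alone, with the score averaged over cross-validation folds.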

This review was generated by AI and checked by a human editor.