QUICK REVIEW

[Paper Review] A deep learning approach for detecting traffic accidents from social media data

Zhenhua Zhang, Qing He | arXiv (Cornell University) | Jan 4, 2018

Traffic Prediction and Management Techniques | Cited by 25

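One-Sentence Summary

This paper proposes a deep learning framework that detects traffic accidents from social media data using paired tokens and two models: Deep Belief Network (DBN) and Long Short-Term Memory (LSTM). On more than 3 million tweets from New York City and Northern Virginia, the DBN reaches an overall accuracy of 85% with about 44 individual token features and 17 paired token features, outperforming Support Vector Machines (SVMs) and supervised Latent Dirichlet allocation (sLDA), and nearly 66% of the accident-related tweets can be located in the official accident log.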

ABSTRACT

This paper employs deep learning in detecting the traffic accident from social media data. First, we thoroughly investigate the 1-year over 3 million tweet contents in two metropolitan areas: Northern Virginia and New York City. Our results show that paired tokens can capture the association rules inherent in the accident-related tweets and further increase the accuracy of the traffic accident detection. Second, two deep learning methods: Deep Belief Network (DBN) and Long Short-Term Memory (LSTM) are investigated and implemented on the extracted token. Results show that DBN can obtain an overall accuracy of 85% with about 44 individual token features and 17 paired token features. The classification results from DBN outperform those of Support Vector Machines (SVMs) and supervised Latent Dirichlet allocation (sLDA). Finally, to validate this study, we compare the accident-related tweets with both the traffic accident log on freeways and traffic data on local roads from 15,000 loop detectors. It is found that nearly 66% of the accident-related tweets can be located by the accident log and more than 80% of them can be tied to nearby abnormal traffic data. Several important issues of using Twitter to detect traffic accidents have been brought up by the comparison including the location and time bias, as well as the characteristics of influential users and hashtags.

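Motivation and Objectives

Develop a scalable, data-driven method that detects traffic accidents from real-time social media content.
Examine how effectively paired tokens capture the semantic and contextual associations in accident-related tweets.
Compare deep learning models (DBN, LSTM) with traditional methods (SVM, sLDA) at classifying traffic accident reports from social media.
Validate the real-world accuracy of the model's predictions through spatial and temporal alignment with official accident logs and loop detector data.
Identify the key biases and user behavior patterns that affect accident detection from social media, such as location, time, and the influence of key users.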

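Proposed Method

More than 3 million tweets from New York City and Northern Virginia were collected and processed over a one-year period.
Individual and paired tokens were extracted from the tweets to model the semantic relationships and contextual associations tied to traffic accidents.
A Deep Belief Network (DBN) was trained on the extracted token features to classify tweets as accident-related or not.
Using the same feature set, the DBN was compared with Support Vector Machines (SVMs) and supervised Latent Dirichlet allocation (sLDA).
A Long Short-Term Memory (LSTM) network was applied to model sequential patterns in the tweet text and so capture sequential context.
The model's predictions were validated through spatial and temporal alignment with the official accident log and with data from 15,000 loop detectors.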
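
The token-extraction step above can be illustrated with a minimal sketch. It assumes a binary bag-of-tokens encoding in which a paired feature is set only when both of its tokens occur in the same tweet. The vocabularies `INDIVIDUAL_TOKENS` and `PAIRED_TOKENS` and all function names are hypothetical placeholders, not the features or code used in the paper, which selects its own tokens from the data.

```python
import re
from itertools import combinations

# Hypothetical vocabularies for illustration only; the paper derives its own
# individual and paired token features from the tweet corpus.
INDIVIDUAL_TOKENS = ["accident", "crash", "traffic", "lane", "blocked"]
PAIRED_TOKENS = [("accident", "lane"), ("crash", "traffic")]


def tokenize(text):
    """Lowercase a tweet and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def tweet_features(text):
    """Return a binary vector: individual-token flags, then paired-token flags."""
    tokens = tokenize(text)
    single = [1 if tok in tokens else 0 for tok in INDIVIDUAL_TOKENS]
    # A paired feature is set only when both tokens co-occur in the same tweet.
    paired = [1 if a in tokens and b in tokens else 0 for a, b in PAIRED_TOKENS]
    return single + paired


def pair_counts(tweets):
    """Count co-occurring token pairs across tweets (candidates for paired features)."""
    counts = {}
    for text in tweets:
        for pair in combinations(sorted(tokenize(text)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts
```

A vector such as `tweet_features("Accident on I-66, right lane blocked")` could then be passed to any classifier; the co-occurrence counts from `pair_counts` are one simple way to shortlist candidate pairs and are not necessarily the selection rule the authors used.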

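Experimental Results

Research Questions

RQ1: Do paired tokens improve the accuracy of traffic accident detection from social media data compared with individual tokens?
RQ2: How do deep learning models such as DBN and LSTM compare with traditional machine learning models (SVM, sLDA) at classifying accident-related tweets?
RQ3: To what extent do accident-related tweets align with official accident logs and real-time traffic data?
RQ4: What are the main biases in accident detection from social media, particularly with respect to location, time, and user influence?
RQ5: How do hashtags and key users affect the visibility and reliability of accident reports on social media?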

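Key Findings

The DBN achieved an overall classification accuracy of 85% with about 44 individual token features and 17 paired token features.
The DBN outperformed SVM and sLDA at detecting traffic accidents from social media data, indicating stronger feature representation.
Nearly 66% of the accident-related tweets could be located in the official accident log.
More than 80% of the accident-related tweets could be tied to abnormal traffic data from nearby loop detectors.
Social media reports show notable location and time bias, with accident-related content concentrated in peak hours and urban centers.
Key users and popular hashtags play a significant role in amplifying accident reports, but they also introduce noise and potential overrepresentation.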
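
To make the log-matching figure above concrete, here is a minimal sketch of how a match rate between geotagged tweets and an accident log could be computed. It assumes a tweet counts as matched when some logged accident lies within a distance threshold and a time window. The thresholds `max_km` and `max_gap`, the record layout, and the function names are illustrative assumptions; the summary does not state the criteria the authors actually used.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def is_matched(tweet, accidents, max_km=1.0, max_gap=timedelta(hours=1)):
    """True if any logged accident is close to the tweet in both space and time.

    max_km and max_gap are assumed thresholds, not values from the paper.
    """
    return any(
        abs(tweet["time"] - acc["time"]) <= max_gap
        and haversine_km(tweet["lat"], tweet["lon"], acc["lat"], acc["lon"]) <= max_km
        for acc in accidents
    )


def match_rate(tweets, accidents, **thresholds):
    """Fraction of tweets that can be tied to at least one logged accident."""
    if not tweets:
        return 0.0
    return sum(is_matched(t, accidents, **thresholds) for t in tweets) / len(tweets)


# Toy records for illustration; real inputs would be geotagged tweets and a log.
accident_log = [{"time": datetime(2015, 1, 1, 8, 0), "lat": 38.90, "lon": -77.20}]
sample_tweets = [
    {"time": datetime(2015, 1, 1, 8, 20), "lat": 38.90, "lon": -77.20},   # near in space and time
    {"time": datetime(2015, 1, 1, 12, 0), "lat": 38.90, "lon": -77.20},   # too late
    {"time": datetime(2015, 1, 1, 8, 10), "lat": 40.70, "lon": -74.00},   # too far
]
rate = match_rate(sample_tweets, accident_log)
```

The same structure would apply to the loop detector comparison by swapping the accident log for records of abnormal traffic readings, again under thresholds that would need to be chosen and justified.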

This review was generated by AI and reviewed by human editors.