QUICK REVIEW

[Paper Digest] Intelligent Intersection: Two-Stream Convolutional Networks for Real-time Near Accident Detection in Traffic Video

Xiaohui Huang, Pan He | arXiv (Cornell University) | Jan 4, 2019

Video Surveillance and Tracking Methods | References: 49 | Cited by: 25

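One-Sentence Summary

This paper proposes a two-stream 3D convolutional neural network framework in which a spatial stream and a temporal stream jointly perform real-time vehicle detection, multiple object tracking, and near-accident detection on aerial traffic video. On the newly built Traffic Near Accident Dataset (TNAD), the method reaches 89.4% precision, 83.3% recall, and an 86.3% F1 score while running at high frame rates (20–30 fps).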

ABSTRACT

In Intelligent Transportation Systems, real-time systems that monitor and analyze road users are becoming increasingly critical as we march toward the smart city era. Vision-based frameworks for Object Detection, Multiple Object Tracking, and Traffic Near Accident Detection are important applications of Intelligent Transportation Systems, particularly in video surveillance. Although deep neural networks have recently achieved great success in many computer vision tasks, a unified framework for all three tasks remains challenging: the difficulties multiply with the demand for real-time performance, complex urban settings, highly dynamic traffic events, and many traffic movements. In this paper, we propose a two-stream Convolutional Network architecture that performs real-time detection, tracking, and near accident detection of road users in traffic video data. The two-stream model consists of a spatial stream network for Object Detection and a temporal stream network that leverages motion features for Multiple Object Tracking. We detect near accidents by incorporating appearance features and motion features from the two-stream networks. Using aerial videos, we propose a Traffic Near Accident Dataset (TNAD) that covers various types of traffic interactions and is suitable for vision-based traffic analysis tasks. Our experiments demonstrate the advantage of our framework, with overall competitive qualitative and quantitative performance at high frame rates on the TNAD dataset.

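Research Motivation and Goals

Address the challenge of real-time, unified detection, tracking, and near-accident analysis in complex urban traffic scenes.
Overcome the limitations of existing systems in handling high dynamic range, illumination changes, and real-time constraints.
Develop a unified deep learning framework that integrates appearance and motion features to improve near-accident prediction.
Build a new, diverse aerial video dataset (TNAD) to support vision-based traffic analysis and provide a benchmark for near-accident detection.
Achieve the high processing speed (20–30 fps) required for deployment in real-world Intelligent Transportation Systems (ITS).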

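Proposed Method

Adopts a two-stream 3D CNN architecture: the spatial stream uses appearance features from individual frames for object detection.
A temporal stream network extracts motion features from video clips for multiple object tracking and trajectory estimation.
Features from the two streams are fused to compute a near-accident probability based on spatial proximity and conflicting motion patterns.
For near-accident localization, an intersection over union (IoU) ≥ 0.6 is the threshold for counting a detection as a true positive.
Training and testing use the self-built Traffic Near Accident Dataset (TNAD), which contains 57 simulated videos and 51,123 frames; training uses a sparse sampling strategy.
The spatial stream uses a state-of-the-art object detection method, and the temporal stream uses dense trajectory computation for robust tracking.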
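The IoU criterion in the list above can be illustrated with a short sketch. This is a minimal example, not the authors' implementation: the corner-format box `(x1, y1, x2, y2)` and the helper names `iou` and `is_true_positive` are assumptions made for illustration; only the 0.6 threshold comes from the summary.

```python
from typing import Tuple

# Assumed box format (not stated in the summary): (x1, y1, x2, y2) corners.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.6) -> bool:
    """Hypothetical helper: a prediction counts as a true positive at IoU >= threshold."""
    return iou(pred, truth) >= threshold


if __name__ == "__main__":
    truth = (0.0, 0.0, 10.0, 10.0)
    print(is_true_positive((2.0, 0.0, 12.0, 10.0), truth))  # IoU = 80/120
    print(is_true_positive((5.0, 0.0, 15.0, 10.0), truth))  # IoU = 50/150
```

With a threshold of 0.6, a predicted region shifted by 20% of the box width still counts as a hit, while one shifted by 50% does not.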

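Experimental Results

Research Questions

RQ1: Can a unified two-stream 3D CNN architecture effectively perform real-time detection, tracking, and near-accident detection in aerial traffic video?
RQ2: How much does the joint use of appearance and motion features improve near-accident detection accuracy compared with single-modality approaches?
RQ3: Can the proposed framework sustain real-time performance in the 20–30 fps range across different traffic and lighting conditions?
RQ4: How does the framework perform quantitatively on near-accident detection on the new, diverse dataset (TNAD)?
RQ5: Does the method generalize to urban intersection scenes with complex interactions among cars, motorcycles, and pedestrians?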

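Key Findings

On the TNAD dataset, the proposed two-stream 3D CNN achieves 89.4% precision, 83.3% recall, and an 86.3% F1 score for near-accident detection.
The system sustains real-time performance of 28 fps at 960×480 resolution, indicating that it is suitable for deployment in real-world ITS.
The spatial stream effectively detects vehicles and near-accident candidates from appearance features, while the temporal stream significantly improves tracking accuracy through motion patterns.
Fusing appearance and motion features significantly strengthens near-accident detection by capturing both spatial overlap and conflicting trajectories.
The TNAD dataset comprises 51,123 frames from 57 simulated videos and provides a benchmark for near-accident detection that covers a variety of traffic interaction scenarios.
Qualitative results show that the system performs robustly under challenging conditions such as congestion, illumination changes, and complex intersection maneuvers.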
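As a quick consistency check on the figures above, the sketch below recomputes F1 from the reported precision and recall and converts the reported frame rate into a per-frame time budget. The function names are illustrative assumptions; the inputs are the rounded values quoted in this digest, so the result is only accurate to rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Rounded values quoted in the findings above.
    f1 = f1_score(0.894, 0.833)
    print(f"F1 from rounded precision/recall: {f1:.3f}")
    print(f"Per-frame budget at 28 fps: {frame_budget_ms(28):.1f} ms")
```

The rounded inputs give an F1 of about 0.862, which is within rounding of the reported 86.3%. At 28 fps, detection, tracking, and near-accident scoring together have roughly 35.7 ms per frame.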


This digest was generated by AI and reviewed by human editors.