QUICK REVIEW

[Paper Review] A Countrywide Traffic Accident Dataset

Sobhan Moosavi, Mohammad Hossein Samavatian|arXiv (Cornell University)|Jun 12, 2019
Traffic and Road Safety|References: 14|Citations: 55
One-Sentence Summary

This paper presents US-Accidents, a countrywide, publicly available dataset of about 2.25 million traffic accidents in the contiguous United States (2016–2019), augmented with location, weather, POI, and time context, together with a reproducible data-collection pipeline.

ABSTRACT

Reducing traffic accidents is an important public safety challenge. However, the majority of studies on traffic accident analysis and prediction have used small-scale datasets with limited coverage, which limits their impact and applicability; and existing large-scale datasets are either private, old, or do not include important contextual information such as environmental stimuli (weather, points-of-interest, etc.). In order to help the research community address these shortcomings we have - through a comprehensive process of data collection, integration, and augmentation - created a large-scale publicly available database of accident information named US-Accidents. US-Accidents currently contains data about $2.25$ million instances of traffic accidents that took place within the contiguous United States, and over the last three years. Each accident record consists of a variety of intrinsic and contextual attributes such as location, time, natural language description, weather, period-of-day, and points-of-interest. We present this dataset in this paper, along with a wide range of insights gleaned from this dataset with respect to the spatiotemporal characteristics of accidents. The dataset is publicly available at https://smoosavi.org/datasets/us_accidents.

Motivation and Objectives

  • Provide a large-scale, publicly accessible US traffic accident dataset with broad geographic and attribute coverage.
  • Describe the data collection, integration, and augmentation pipeline used to create the dataset.
  • Enable research on spatiotemporal patterns, weather impacts, and POI correlations in traffic accidents.

Proposed Method

  • Collect streaming traffic events from MapQuest Traffic and Bing Map Traffic APIs.
  • Deduplicate entries using Haversine distance and time thresholds (250 meters, 10 minutes).
  • Augment accidents with reverse geocoding to obtain addresses via Nominatim.
  • Augment with weather data by linking to the closest Weather Underground station (average time lag ~15 minutes).
  • Augment with POI annotations from OpenStreetMap using a distance threshold selected to maximize correlation with regex-derived location cues (Algorithm 1).
  • Annotate period-of-day using TimeAndDate API to label day/night across multiple twilight systems.
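
The deduplication step can be illustrated with a minimal Python sketch. It assumes two records are duplicates when they fall within both thresholds at once (250 meters by Haversine distance and 10 minutes by start time). The `Accident` record, the `deduplicate` helper, the inclusive comparisons, and the keep-the-earliest rule are hypothetical choices made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class Accident:
    # Hypothetical minimal record; the real dataset has many more attributes.
    lat: float
    lon: float
    start: datetime


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def deduplicate(records, max_dist_m=250.0, max_gap=timedelta(minutes=10)):
    """Keep a record unless an already-kept record is within both thresholds.

    Assumption: records are processed in order of start time and the earliest
    record of each duplicate group is the one retained.
    """
    kept = []
    for rec in sorted(records, key=lambda r: r.start):
        is_duplicate = any(
            abs(rec.start - k.start) <= max_gap
            and haversine_m(rec.lat, rec.lon, k.lat, k.lon) <= max_dist_m
            for k in kept
        )
        if not is_duplicate:
            kept.append(rec)
    return kept
```

This pairwise scan is quadratic in the number of records, so it is only meant to show the matching rule; at the scale of millions of records a spatial or temporal index would be needed.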

Experimental Results

Research Questions

  • RQ1: What is the scale and attribute richness achievable in a countrywide traffic accident dataset from streaming sources?
  • RQ2: How can heterogeneous data (weather, POI, time) be integrated and augmented to enhance accident analysis?
  • RQ3: What are the spatiotemporal patterns and environmental correlations observable in a large-scale accident corpus?

Key Findings

  • US-Accidents contains about 2.25 million accident records for the contiguous United States from February 2016 to March 2019.
  • About 24,600 duplicates were removed, yielding 2.25 million accidents after integration.
  • Approximately 27.5% of accidents were augmented with at least one POI type; 73% of accidents occurred after sunrise.
  • Around 32% of accidents occurred on or near local roads, and about 40% on or near high-speed roads; Bing reports more on high-speed roads while MapQuest reports more on intersections.
  • The best POI annotation threshold values identified were 30 meters for intersections and 100 meters for junctions.
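
As a rough illustration of how such thresholds might be applied, the Python sketch below flags an accident with a POI type when any POI of that type lies within the type's threshold distance. The 30 m and 100 m values come from the finding above. The `annotate_poi` helper, the `(type, lat, lon)` tuple layout, the type labels, and the inclusive comparison are assumptions made for this example only.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

# Threshold distances in meters, taken from the finding above.
# The dictionary keys are illustrative labels, not dataset column names.
THRESHOLD_M = {"intersection": 30.0, "junction": 100.0}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def annotate_poi(acc_lat, acc_lon, pois, thresholds=THRESHOLD_M):
    """Return {poi_type: bool} for one accident location.

    `pois` is an iterable of (poi_type, lat, lon) tuples (hypothetical layout).
    A flag is True when at least one POI of that type is within its threshold.
    """
    flags = {poi_type: False for poi_type in thresholds}
    for poi_type, lat, lon in pois:
        limit = thresholds.get(poi_type)
        if limit is not None and haversine_m(acc_lat, acc_lon, lat, lon) <= limit:
            flags[poi_type] = True
    return flags


# Usage: one intersection about 44 m away, one junction about 56 m away.
nearby = [("intersection", 40.0004, -83.0), ("junction", 40.0005, -83.0)]
flags = annotate_poi(40.0, -83.0, nearby)
```

Here the intersection lies outside its 30 m threshold and the junction inside its 100 m threshold, so only the junction flag is set.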

This review was generated by AI and checked by a human editor.