[Paper Review] A Countrywide Traffic Accident Dataset
This paper presents US-Accidents, a countrywide, publicly available dataset of about 2.25 million traffic accidents in the contiguous United States (2016–2019), augmented with location, weather, POI, and time context, along with a reproducible data-collection pipeline.
Reducing traffic accidents is an important public safety challenge. However, the majority of studies on traffic accident analysis and prediction have used small-scale datasets with limited coverage, which limits their impact and applicability; and existing large-scale datasets are either private, old, or do not include important contextual information such as environmental stimuli (weather, points-of-interest, etc.). In order to help the research community address these shortcomings we have, through a comprehensive process of data collection, integration, and augmentation, created a large-scale publicly available database of accident information named US-Accidents. US-Accidents currently contains data about 2.25 million instances of traffic accidents that took place within the contiguous United States over the last three years. Each accident record consists of a variety of intrinsic and contextual attributes such as location, time, natural language description, weather, period-of-day, and points-of-interest. We present this dataset in this paper, along with a wide range of insights gleaned from this dataset with respect to the spatiotemporal characteristics of accidents. The dataset is publicly available at https://smoosavi.org/datasets/us_accidents.
Motivation and Objectives
- Provide a large-scale, publicly accessible US traffic accident dataset with broad geographic and attribute coverage.
- Describe the data collection, integration, and augmentation pipeline used to create the dataset.
- Enable research on spatiotemporal patterns, weather impacts, and POI correlations in traffic accidents.
Proposed Method
- Collect streaming traffic events from MapQuest Traffic and Bing Map Traffic APIs.
- Deduplicate entries using Haversine distance and time thresholds (250 meters, 10 minutes).
- Augment accidents with reverse geocoding to obtain addresses via Nominatim.
- Augment with weather data by linking to the closest Weather Underground station (average time lag ~15 minutes).
- Augment with POI annotations from OpenStreetMap using a distance threshold selected to maximize correlation with regex-derived location cues (Algorithm 1).
- Annotate period-of-day using TimeAndDate API to label day/night across multiple twilight systems.
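The deduplication step above can be illustrated with a minimal Python sketch. It assumes each record is a dict with hypothetical `lat`, `lon`, and `time` keys, and it treats a record as a duplicate when an already-kept record lies within both thresholds (250 meters, 10 minutes). The greedy, time-ordered strategy and the inclusive comparisons are assumptions made for illustration, not the paper's confirmed procedure.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def deduplicate(records, max_dist_m=250.0, max_gap=timedelta(minutes=10)):
    """Keep a record only if no already-kept record is within both thresholds.

    Assumption: records are dicts with 'lat', 'lon' (degrees) and 'time' (datetime).
    """
    kept = []
    for rec in sorted(records, key=lambda r: r["time"]):
        is_dup = any(
            abs(rec["time"] - k["time"]) <= max_gap
            and haversine_m(rec["lat"], rec["lon"], k["lat"], k["lon"]) <= max_dist_m
            for k in kept
        )
        if not is_dup:
            kept.append(rec)
    return kept


if __name__ == "__main__":
    sample = [
        {"lat": 40.0, "lon": -83.0, "time": datetime(2018, 1, 1, 8, 0)},
        {"lat": 40.001, "lon": -83.0, "time": datetime(2018, 1, 1, 8, 5)},  # ~111 m away, 5 min later
    ]
    print(len(deduplicate(sample)))
```

This pairwise scan is quadratic in the number of records; at the scale of millions of events a spatial index or time-windowed grouping would be needed, which this sketch does not attempt.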
Experimental Results
Research Questions
- RQ1: What is the scale and attribute richness achievable in a countrywide traffic accident dataset from streaming sources?
- RQ2: How can heterogeneous data (weather, POI, time) be integrated and augmented to enhance accident analysis?
- RQ3: What are the spatiotemporal patterns and environmental correlations observable in a large-scale accident corpus?
Key Findings
- US-Accidents contains about 2.25 million accident records for the contiguous United States from February 2016 to March 2019.
- About 24,600 duplicates were removed, yielding 2.25 million accidents after integration.
- Approximately 27.5% of accidents were augmented with at least one POI type; 73% of accidents occurred after sunrise.
- Around 32% of accidents occurred on or near local roads, and about 40% on or near high-speed roads; Bing reports more on high-speed roads while MapQuest reports more on intersections.
- The best POI annotation threshold values identified were 30 meters for intersections and 100 meters for junctions.
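To make the reported thresholds concrete, the following Python sketch flags an accident with a POI type when any POI of that type lies within the type's distance threshold (30 meters for intersections, 100 meters for junctions, as stated above). The dict layout, the type names used as keys, and the function names are hypothetical, and the threshold-selection procedure itself (Algorithm 1) is not reproduced here.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

# Thresholds reported in the findings; the key names are illustrative assumptions.
THRESHOLDS_M = {"intersection": 30.0, "junction": 100.0}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def annotate_poi(accident, pois, thresholds=THRESHOLDS_M):
    """Return {poi_type: bool}: True if a POI of that type is within its threshold.

    Assumption: accident and POIs are dicts with 'lat' and 'lon' in degrees,
    and each POI also carries a 'type' string.
    """
    flags = {poi_type: False for poi_type in thresholds}
    for poi in pois:
        limit = thresholds.get(poi["type"])
        if limit is None:
            continue  # POI type without a configured threshold is ignored
        dist = haversine_m(accident["lat"], accident["lon"], poi["lat"], poi["lon"])
        if dist <= limit:
            flags[poi["type"]] = True
    return flags


if __name__ == "__main__":
    accident = {"lat": 40.0, "lon": -83.0}
    pois = [
        {"type": "intersection", "lat": 40.0004, "lon": -83.0},  # ~44 m away
        {"type": "junction", "lat": 40.0004, "lon": -83.0},      # ~44 m away
    ]
    print(annotate_poi(accident, pois))
```

The same ~44 m offset yields different flags for the two types, which shows why a per-type threshold matters for the annotation rate.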
This review was generated by AI and checked by a human editor.