QUICK REVIEW

[Paper Review] BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

Fisher Yu, Haofeng Chen | arXiv (Cornell University) | May 12, 2018

Advanced Neural Network Applications | 36 references | 204 citations

TL;DR

The paper introduces BDD100K, a large-scale, diverse driving video dataset with 10 heterogeneous tasks and a benchmark for heterogeneous multitask learning, plus analyses of domain shifts and training strategies.

ABSTRACT

Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving. Researchers are usually constrained to study a small set of problems on one dataset, while real-world computer vision applications require performing tasks of various complexities. We construct BDD100K, the largest driving video dataset with 100K videos and 10 tasks to evaluate the exciting progress of image recognition algorithms on autonomous driving. The dataset possesses geographic, environmental, and weather diversity, which is useful for training models that are less likely to be surprised by new conditions. Based on this diverse dataset, we build a benchmark for heterogeneous multitask learning and study how to solve the tasks together. Our experiments show that special training strategies are needed for existing models to perform such heterogeneous tasks. BDD100K opens the door for future studies in this important venue.

Motivation & Objective

Provide a large-scale, diverse driving video dataset with rich multi-granularity annotations.
Support a broad set of tasks (pixel-level, region-based, temporal) to enable heterogeneous multitask learning.
Establish benchmarks and baselines for homogeneous, cascaded, and heterogeneous multitask learning in driving scenarios.
Study how task diversity and data domain affect cross-task transfer and generalization.
Investigate annotation-budget allocation strategies for improving performance across tasks.

Proposed method

Collect and annotate 100K crowd-sourced driving videos (720p, 30 fps) recorded on real-world routes, together with GPS/IMU data.
Provide ten tasks including image tagging, lane detection, drivable area segmentation, road object detection, semantic segmentation, instance segmentation, multiple object tracking (MOT), MOT with segmentation (MOTS), domain adaptation, and imitation learning.
Evaluate baseline methods under homogeneous, cascaded, and heterogeneous multitask settings to reveal training needs for diverse task outputs.
Analyze domain shift effects by comparing within-domain and cross-domain performance (e.g., Cityscapes vs. BDD100K, day vs. night).
Explore task cascades (e.g., detection aiding tracking, detection aiding segmentation) to understand cross-task benefits and budget allocation.
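
The within-domain versus cross-domain comparison listed above can be made concrete with a small sketch. It assumes evaluation scores are stored in a dictionary keyed by (train domain, test domain). The function name `domain_gaps`, the domain labels, and every number below are hypothetical placeholders for illustration, not values or code from the paper.

```python
from typing import Dict, Tuple

# Maps (train_domain, test_domain) -> score on some metric (e.g. AP in percent).
Scores = Dict[Tuple[str, str], float]


def domain_gaps(scores: Scores) -> Dict[Tuple[str, str], float]:
    """Return, for each cross-domain pair, how far it falls short of the
    within-domain score on the same test domain.

    gap[(train, test)] = score[(test, test)] - score[(train, test)]
    A larger positive gap means a stronger domain shift.
    """
    gaps: Dict[Tuple[str, str], float] = {}
    for (train, test), cross_score in scores.items():
        if train == test:
            continue  # within-domain entries are the reference, not a gap
        within_score = scores.get((test, test))
        if within_score is None:
            continue  # no within-domain reference available for this test domain
        gaps[(train, test)] = within_score - cross_score
    return gaps


# Placeholder numbers for illustration only (NOT results from the paper).
example_scores: Scores = {
    ("day", "day"): 60.0,
    ("day", "night"): 40.0,
    ("night", "night"): 50.0,
    ("night", "day"): 45.0,
}

gaps = domain_gaps(example_scores)
for pair, gap in sorted(gaps.items()):
    print(f"train={pair[0]:5s} test={pair[1]:5s} gap={gap:.1f}")
```

The same table layout would work for dataset-level comparisons (for example, Cityscapes vs. BDD100K) by replacing the domain labels.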

Experimental results

Research questions

RQ1: How does a single model fare across heterogeneous driving tasks with different output structures?
RQ2: What domain shifts exist between BDD100K and other driving datasets, and how do they impact detection and segmentation?
RQ3: Can joint or cascaded multitask training improve performance on expensive tasks using cheaper annotations?
RQ4: How should annotation budgets be allocated to maximize performance across tasks in heterogeneous multitask learning?
RQ5: What are the challenges and benefits of combining tracking and segmentation tasks (MOT/MOTS) with detection and segmentation in a unified framework?

Key findings

The dataset enables heterogeneous multitask learning and reveals that special training strategies are needed for heterogeneous task combinations.
Significant domain discrepancies exist both between datasets and between conditions (e.g., day vs. night; city vs. non-city), and they affect object detection performance.
Joint training and cascaded multitask setups can improve certain tasks (e.g., instance segmentation benefits from detection data; MOTS gains from upstream detection/segmentation signals).
Adding diverse but simpler tasks to support more complex tasks can improve performance, especially when annotation budgets are limited, though gains vary with task and data size.
Semantic segmentation and object detection models show a notable domain shift between Cityscapes and BDD100K, which suggests that the new dataset is complementary to existing ones.
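
The annotation-budget question raised above comes down to simple arithmetic: a fixed budget can buy many cheap labels, a few expensive ones, or a mix. The sketch below enumerates such mixes. The function `budget_splits`, the cost units, and the 1:20 cost ratio are assumptions chosen for illustration; the review does not state actual annotation costs, and this is not the paper's procedure.

```python
from typing import List, Tuple


def budget_splits(
    total_budget: int, cheap_cost: int, expensive_cost: int, step: int
) -> List[Tuple[int, int]]:
    """Enumerate (n_cheap, n_expensive) label counts that fit a fixed budget.

    The number of expensive labels grows in increments of `step`; whatever
    budget remains is spent on cheap labels.
    """
    if min(total_budget, cheap_cost, expensive_cost, step) <= 0:
        raise ValueError("all arguments must be positive")
    splits: List[Tuple[int, int]] = []
    n_expensive = 0
    while n_expensive * expensive_cost <= total_budget:
        remaining = total_budget - n_expensive * expensive_cost
        splits.append((remaining // cheap_cost, n_expensive))
        n_expensive += step
    return splits


# Hypothetical costs: a cheap label (e.g. a bounding box) costs 1 unit,
# an expensive label (e.g. an instance mask) costs 20 units.
splits = budget_splits(total_budget=1000, cheap_cost=1, expensive_cost=20, step=10)
for n_cheap, n_expensive in splits:
    print(f"{n_cheap:4d} cheap labels + {n_expensive:2d} expensive labels")
```

Each split would then be a candidate training configuration to evaluate on the expensive task; which split performs best is an empirical question that the sketch does not answer.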

This review was created by AI and reviewed by human editors.