QUICK REVIEW

[Paper Review] Cityscapes dataset for semantic urban scene understanding

Marius Cordts | arXiv (Cornell University) | Apr 6, 2016

Video Surveillance and Tracking Methods | References: 14 | Cited by: 1,003

One-Line Summary

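This paper introduces Cityscapes, a large-scale dataset and benchmark for pixel-level and instance-level semantic labeling of urban street scenes. It provides dense, high-quality annotations together with a coarsely annotated set, with imagery spanning 50 cities, and includes an accompanying study that evaluates state-of-the-art methods on the benchmark.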

ABSTRACT

Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes is comprised of a large, diverse set of stereo video sequences recorded in streets from 50 different cities. 5000 of these images have high quality pixel-level annotations; 20000 additional images have coarse annotations to enable methods that leverage large volumes of weakly-labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.

Motivation and Objectives

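Advance semantic understanding of urban scenes and close gaps left by existing datasets.
Provide a large-scale, diverse dataset with high-quality pixel-level and instance-level annotations.
Enable training and evaluation of pixel-level and instance-level semantic labeling methods in urban driving scenarios.
Provide stereo depth information and a clear train/val/test split for benchmarking methods.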
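The stereo depth information mentioned above is, as we understand the public release, distributed as precomputed disparity maps. A minimal sketch of turning one disparity pixel into metric depth, assuming the 16-bit PNG encoding described in the cityscapesScripts README (a value p > 0 means disparity d = (p - 1) / 256 pixels; p = 0 marks an invalid measurement) and the standard pinhole stereo relation Z = f_x · B / d. The function name and the baseline and focal-length values are hypothetical placeholders; real values come from each image's camera calibration file.

```python
from typing import Optional


def disparity_png_to_depth(p: int, baseline_m: float, fx_px: float) -> Optional[float]:
    """Convert one 16-bit disparity PNG value to depth in metres.

    Assumes the encoding d = (p - 1) / 256 for p > 0, with p == 0
    meaning "no valid disparity" (verify against the official README).
    """
    if p <= 0:
        return None  # invalid measurement
    d = (p - 1) / 256.0  # disparity in pixels
    if d <= 0:
        return None  # p == 1 encodes zero disparity (point at infinity)
    return fx_px * baseline_m / d  # pinhole stereo: Z = f * B / d


# Hypothetical calibration values, for illustration only.
BASELINE_M = 0.2
FX_PX = 2000.0

print(disparity_png_to_depth(2561, BASELINE_M, FX_PX))  # d = 10 px -> 40.0
print(disparity_png_to_depth(0, BASELINE_M, FX_PX))     # None
```

Returning `None` for invalid pixels keeps the sketch honest about holes in the disparity map; a vectorized version over a whole image would typically use a mask instead.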

Proposed Method

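Collect several hundred thousand frames from a moving vehicle across 50 cities.
Densely annotate 5,000 images at the pixel and instance level, and provide 20,000 further images with coarse annotations.
Provide stereo HDR and LDR image pairs, with depth ordering implied by the layering of the annotations.
Define 30 visual classes grouped into 8 categories, and enable 19 of them for benchmark evaluation.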
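The 19 benchmark classes and their categories can be written down as a simple lookup table. A minimal sketch, assuming the class and category names from the public cityscapesScripts `labels.py` as we recall them (verify against the official file); the variable names are hypothetical. The eighth category, `void`, contains only classes excluded from evaluation, so the 19 evaluated classes fall into 7 categories.

```python
# Benchmark classes in train-ID order (0..18), each paired with its category.
# Names follow cityscapesScripts labels.py as recalled; verify before use.
EVAL_CLASSES = [
    ("road", "flat"), ("sidewalk", "flat"),
    ("building", "construction"), ("wall", "construction"), ("fence", "construction"),
    ("pole", "object"), ("traffic light", "object"), ("traffic sign", "object"),
    ("vegetation", "nature"), ("terrain", "nature"),
    ("sky", "sky"),
    ("person", "human"), ("rider", "human"),
    ("car", "vehicle"), ("truck", "vehicle"), ("bus", "vehicle"),
    ("train", "vehicle"), ("motorcycle", "vehicle"), ("bicycle", "vehicle"),
]

# Categories in first-seen order. These indices are local to this sketch,
# not the official categoryId values (which also count "void").
CATEGORIES = list(dict.fromkeys(cat for _, cat in EVAL_CLASSES))

# TRAIN_TO_CATEGORY[train_id] -> local category index.
TRAIN_TO_CATEGORY = [CATEGORIES.index(cat) for _, cat in EVAL_CLASSES]

print(len(EVAL_CLASSES), len(CATEGORIES))  # 19 7

# Usage: remap one row of a predicted train-ID label map to categories.
row = [0, 13, 11, 10]
print([CATEGORIES[TRAIN_TO_CATEGORY[t]] for t in row])
```

A flat list indexed by train ID keeps the remap to a single lookup per pixel, which is the same shape one would use for category-level scoring.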

Experimental Results

Research Questions

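RQ1: How can a large, diverse dataset of urban street images improve semantic labeling performance for autonomous driving?
RQ2: How does the difference between high-quality fine annotations and coarse annotations affect segmentation performance?
RQ3: How do state-of-the-art semantic labeling methods perform on a dataset with rich instance-level and depth-ordering annotations?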
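Segmentation performance of the kind these questions ask about is reported on the Cityscapes benchmark chiefly as per-class intersection-over-union, IoU = TP / (TP + FP + FN), averaged over classes. A minimal pure-Python sketch of that computation from flat lists of ground-truth and predicted train IDs, assuming 255 marks ground-truth pixels to ignore (a common convention, not something this review states) and that every prediction is a valid class ID. The function names are hypothetical, and the sketch omits the instance-weighted iIoU variant the paper also uses.

```python
def per_class_iou(gt, pred, num_classes, ignore_index=255):
    """Return per-class IoU values (None for classes absent from gt and pred)."""
    # conf[g][p] counts pixels with ground truth g that were predicted as p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # skip unlabeled ground-truth pixels
        conf[g][p] += 1
    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp  # pixels of class c predicted as something else
        fp = sum(conf[g][c] for g in range(num_classes)) - tp  # others predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else None)
    return ious


def mean_iou(ious):
    """Average the defined per-class IoU values (None if none are defined)."""
    valid = [x for x in ious if x is not None]
    return sum(valid) / len(valid) if valid else None


gt = [0, 0, 1, 1, 255]
pred = [0, 1, 1, 1, 0]
ious = per_class_iou(gt, pred, num_classes=2)
print(ious)            # class 0: 1/2, class 1: 2/3
print(mean_iou(ious))
```

Accumulating a confusion matrix first means the whole validation set can be streamed through once and IoU computed at the end, rather than averaging per-image scores.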

Key Findings

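Cityscapes surpasses previous datasets in size, annotation richness, scene variability, and complexity.
The dataset contains 5,000 finely annotated images and 20,000 coarsely annotated images from 50 cities.
The official split divides the finely annotated images into 2,975 for training, 500 for validation, and 1,525 for testing; test annotations are withheld.
The evaluation shows performance differences across lighting and temperature conditions, underscoring the importance of covering diverse conditions in the dataset.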
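The split sizes listed above account for all 5,000 finely annotated images. A quick arithmetic check of the totals and the share each split represents (the variable names are hypothetical):

```python
# Fine-annotation split sizes as reported in this review.
SPLITS = {"train": 2975, "val": 500, "test": 1525}

total = sum(SPLITS.values())
print(total)  # 5000
for name, n in SPLITS.items():
    print(f"{name}: {n} images ({100 * n / total:.1f}%)")
# train: 2975 images (59.5%)
# val: 500 images (10.0%)
# test: 1525 images (30.5%)
```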

This review was generated by AI and checked by a human editor.