QUICK REVIEW

[Paper Review] The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale

Alina Kuznetsova, Hassan Rom|arXiv (Cornell University)|Nov 2, 2018

Multimodal Machine Learning Applications | References: 57 | Citations: 613

One-Sentence Summary

Open Images V4 provides a unified, large-scale dataset: 9.2M images, 30.1M image-level labels for 19.8k concepts, 15.4M bounding boxes for 600 object classes, and 375k visual relationship annotations involving 57 classes.

ABSTRACT

We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection. The images have a Creative Commons Attribution license that allows to share and adapt the material, and they have been collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias. Open Images V4 offers large scale across several dimensions: 30.1M image-level labels for 19.8k concepts, 15.4M bounding boxes for 600 object classes, and 375k visual relationship annotations involving 57 classes. For object detection in particular, we provide 15x more bounding boxes than the next largest datasets (15.4M boxes on 1.9M images). The images often show complex scenes with several objects (8 annotated objects per image on average). We annotated visual relationships between them, which support visual relationship detection, an emerging task that requires structured reasoning. We provide in-depth comprehensive statistics about the dataset, we validate the quality of the annotations, we study how the performance of several modern models evolves with increasing amounts of training data, and we demonstrate two applications made possible by having unified annotations of multiple types coexisting in the same images. We hope that the scale, quality, and variety of Open Images V4 will foster further research and innovation even beyond the areas of image classification, object detection, and visual relationship detection.

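Motivation and Objectives

Provide a large-scale CC-BY-licensed dataset collected from Flickr without a predefined class list, reducing design bias and enabling cross-task research.
Provide unified annotations for image classification, object detection, and visual relationship detection within the same images.
Provide extensive statistical analysis, validation of annotation quality, and a baseline study of how model performance changes as training data grows.
Demonstrate applications enabled by the unified annotations, including fine-grained detection and zero-shot visual relationship detection.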

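Proposed Method

Collect ~9.2M Flickr images under a CC-BY license, filtering them to reduce privacy risks and bias (including removing duplicates and images that also appear elsewhere on the web).
Define 19,794 image-level concepts and 600 boxable object classes (organized in a hierarchy) for annotation.
Annotate image-level labels with a computer-assisted workflow that combines multiple image classifiers with human verification.
Annotate 15.4M bounding boxes for the 600 object classes using extreme clicking and box verification series, with hierarchical de-duplication and attribute tagging.
Annotate 374.8k visual relationship triplets by selecting object pairs that could plausibly be in a relationship and verifying them, including non-trivial relationships that do not follow from co-occurrence alone.
Provide a data collection and annotation pipeline suited to cross-task training and analysis across classification, detection, and visual relationships.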
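
The extreme clicking step mentioned above can be illustrated with a minimal sketch. In extreme clicking, the annotator marks four extreme points of an object (top-most, bottom-most, left-most, right-most), and the box follows from those points. The `Point` and `Box` types, the function name, and the click coordinates below are hypothetical and chosen only for illustration; this is not the annotation tool used for Open Images V4.

```python
from typing import NamedTuple, Sequence


class Point(NamedTuple):
    x: float
    y: float


class Box(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def box_from_extreme_points(points: Sequence[Point]) -> Box:
    """Derive an axis-aligned box from four extreme-point clicks.

    Assumption: image coordinates with the origin at the top-left corner,
    so the top-most click has the smallest y value.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four extreme points")
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    # The tightest box enclosing the four clicks is given by the
    # per-axis minimum and maximum of their coordinates.
    return Box(min(xs), min(ys), max(xs), max(ys))


# Hypothetical clicks: top, left, bottom, right extremes of one object.
clicks = [Point(120, 40), Point(60, 110), Point(130, 200), Point(210, 95)]
box = box_from_extreme_points(clicks)
print(box)
```

Because the box depends only on the minimum and maximum coordinates, the order in which the four points are clicked does not matter in this sketch.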

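Experimental Results

Research Questions

RQ1: How can large-scale, unified annotations spanning classification, detection, and visual relationship tasks be collected and verified within a single dataset?
RQ2: How do the statistics, quality characteristics, and biases of Open Images V4 compare with those of earlier datasets?
RQ3: How does the performance of modern models change as the amount of training data grows at this scale?
RQ4: What new cross-task applications do the unified annotations enable (e.g., fine-grained detection without fine-grained box labels, zero-shot visual relationship detection)?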

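Key Findings

Open Images V4 contains 9.18M images, 30.11M image-level labels over 19,794 concepts, 15.44M bounding boxes over 600 object classes, and 374.77k visual relationship triplets involving 57 classes.
Images contain about 8 annotated objects on average, and the dataset provides 15x more bounding boxes than the next largest datasets (15.4M boxes on 1.9M images).
The dataset emphasizes complex scenes and CC-BY licensing, which permits broad use including commercial contexts, while the unified annotations enable cross-task research.
Quality validation analyzes the geometric accuracy of the boxes and the recall of the annotations, and baseline models show how performance evolves as the amount of training data grows.
Two new applications enabled by the unified annotations are demonstrated: fine-grained detection without fine-grained box labels, and zero-shot visual relationship detection.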
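
As a quick sanity check, the "8 annotated objects per image on average" figure follows from the box and image counts quoted in the abstract (15.4M boxes on 1.9M images). The snippet below only reproduces that arithmetic from the rounded figures; the variable names are illustrative.

```python
# Rounded counts as quoted in the abstract.
num_boxes = 15.4e6          # bounding boxes
num_boxed_images = 1.9e6    # images that carry box annotations

# Average number of annotated boxes per boxed image.
avg_boxes_per_image = num_boxes / num_boxed_images
print(f"{avg_boxes_per_image:.1f} boxes per image")  # 8.1 boxes per image
```

The result, about 8.1, is consistent with the reported average of 8 annotated objects per image.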


This review was generated by AI and checked by a human editor.