QUICK REVIEW

[Paper Review] Object Detection for Comics using Manga109 Annotations

Toru Ogawa, Atsushi Otsubo | arXiv (Cornell University) | Mar 23, 2018

Advanced Image and Video Retrieval Techniques | References: 20 | Citations: 42

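One-sentence summary

This paper introduces Manga109-annotations, a large-scale, manually annotated comics dataset, and SSD300-fork, a detector that gives each category its own set of anchors to handle heavily overlapping comic objects; SSD300-fork achieves state-of-the-art mAP on Manga109-annotations.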

ABSTRACT

With the growth of digitized comics, image understanding techniques are becoming important. In this paper, we focus on object detection, which is a fundamental task of image understanding. Although convolutional neural networks (CNN)-based methods achieved good performance in object detection for naturalistic images, there are two problems in applying these methods to the comic object detection task. First, there is no large-scale annotated comics dataset. The CNN-based methods require large-scale annotations for training. Secondly, the objects in comics are highly overlapped compared to naturalistic images. This overlap causes the assignment problem in the existing CNN-based methods. To solve these problems, we proposed a new annotation dataset and a new CNN model. We annotated an existing image dataset of comics and created the largest annotation dataset, named Manga109-annotations. For the assignment problem, we proposed a new CNN-based detector, SSD300-fork. We compared SSD300-fork with other detection methods using Manga109-annotations and confirmed that our model outperformed them based on the mAP score.

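Motivation and objectives

Motivate object detection in comics and address the lack of a large-scale annotated dataset.
Create Manga109-annotations, which contains bounding boxes for frames, text, faces, and bodies, plus additional annotations (character names and text content).
Develop an object detector suited to overlapping comic objects, improving both training and inference performance.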

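Proposed method

Annotate Manga109 to create Manga109-annotations, which contains bounding boxes and category labels (frame, text, face, body).
Propose SSD300-fork, a forked variant of SSD300 that duplicates the detection layers per category to cope with heavy object overlap.
Use a weighted per-category loss to balance detection across the four categories.
Train with a VGG-16 backbone and standard SSD data augmentation, and apply hard negative mining.
Evaluate against Faster R-CNN, SSD300, and YOLOv2 on Manga109-annotations, and test on eBDtheque for cross-dataset analysis.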
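
The assignment problem behind the fork design can be illustrated with a small sketch. It is a simplified toy model, not the authors' implementation: the function names `assign_shared` and `assign_forked`, the 0.5 IoU threshold, and the example boxes are all assumptions made for illustration. With a shared anchor set, each anchor can carry only one label, so when two objects of different categories overlap the same anchor, one match is lost. With one anchor copy per category, each category is matched independently.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Labeled = Tuple[str, Box]                # (category, ground-truth box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_shared(anchors: Sequence[Box], objects: Sequence[Labeled],
                  threshold: float = 0.5) -> List[Optional[str]]:
    """Shared anchors: each anchor keeps only its single best-IoU category."""
    labels: List[Optional[str]] = []
    for anchor in anchors:
        best_label, best_iou = None, threshold
        for category, box in objects:
            score = iou(anchor, box)
            if score >= best_iou:
                best_label, best_iou = category, score
        labels.append(best_label)
    return labels


def assign_forked(anchors: Sequence[Box], objects: Sequence[Labeled],
                  categories: Sequence[str],
                  threshold: float = 0.5) -> Dict[str, List[bool]]:
    """Forked anchors: every category matches its own copy of the anchors."""
    return {
        c: [any(cat == c and iou(anchor, box) >= threshold
                for cat, box in objects)
            for anchor in anchors]
        for c in categories
    }


# Toy close-up panel (hypothetical): the face and body boxes nearly coincide.
anchors = [(0.0, 0.0, 10.0, 10.0)]
objects = [("face", (0.0, 0.0, 10.0, 10.0)), ("body", (0.0, 0.0, 10.0, 12.0))]
print(assign_shared(anchors, objects))                    # ['face']
print(assign_forked(anchors, objects, ["face", "body"]))  # {'face': [True], 'body': [True]}
```

In the shared case the body box overlaps the anchor with an IoU of about 0.83 but is dropped because the face box scores higher; in the forked case both categories keep a positive match on the same anchor location.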

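Experimental results

Research questions

RQ1: Can a large-scale, manually annotated comics dataset improve object detection performance on comic pages?
RQ2: Does duplicating the detection layers per category (SSD300-fork) mitigate the assignment/labeling problem caused by heavily overlapping comic objects?
RQ3: How does SSD300-fork compare with existing CNN-based detectors on manga data in terms of mAP for frame, text, face, and body?
RQ4: How well does a model trained on Manga109-annotations transfer to eBDtheque, a separate dataset with different drawing styles?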

Key findings

Method	mAP	frame	text	face	body
Faster R-CNN	49.9	96.1	23.8	15.7	63.9
SSD300	81.3	97.1	82.0	67.1	79.1
YOLOv2	59.7	90.2	64.6	37.1	46.9
SSD300-fork	84.2	96.9	84.1	76.2	79.6
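
Each mAP value in the table matches the unweighted mean of the four per-category AP values in its row. The short check below assumes that this is how the mAP was computed (the review does not state the averaging explicitly); the dictionary name `AP_TABLE` and the helper `mean_ap` are introduced here for illustration only.

```python
from typing import Dict

# Per-category AP (%) copied from the table above.
AP_TABLE: Dict[str, Dict[str, float]] = {
    "Faster R-CNN": {"frame": 96.1, "text": 23.8, "face": 15.7, "body": 63.9},
    "SSD300":       {"frame": 97.1, "text": 82.0, "face": 67.1, "body": 79.1},
    "YOLOv2":       {"frame": 90.2, "text": 64.6, "face": 37.1, "body": 46.9},
    "SSD300-fork":  {"frame": 96.9, "text": 84.1, "face": 76.2, "body": 79.6},
}


def mean_ap(ap_by_category: Dict[str, float]) -> float:
    """Unweighted mean of per-category average precision."""
    return sum(ap_by_category.values()) / len(ap_by_category)


for method, aps in AP_TABLE.items():
    print(f"{method}: {mean_ap(aps):.1f}")
```

The rounded results reproduce the mAP column (49.9, 81.3, 59.7, 84.2).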

Manga109-annotations provides 527,685 bounding-box annotations across 10,130 pages, covering four object categories plus additional text and character data.
SSD300-fork outperforms every compared detector on the Manga109-annotations benchmark, with an overall mAP of 84.2% versus 81.3% for SSD300, 59.7% for YOLOv2, and 49.9% for Faster R-CNN.
The largest gain over the SSD300 baseline is in the face category (76.2% versus 67.1%).
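On eBDtheque, SSD300-fork achieves competitive frame detection (recall 73.3%, precision 76.4%, F1 74.8%) and substantially improves body detection over the previous method (recall 42.2%, precision 58.0%, F1 48.8%).
The fork architecture handles overlapping objects by assigning each category its own anchor set, while keeping the parameter count and runtime close to those of SSD300.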
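
The eBDtheque F1 figures follow from the reported precision and recall through the usual harmonic mean. The sketch below assumes the standard F1 definition; the helper name `f1_score` is introduced here for illustration. Small differences in the last digit can occur because the inputs are themselves rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (%) as reported for SSD300-fork on eBDtheque.
print(f"frame F1: {f1_score(76.4, 73.3):.2f}")
print(f"body  F1: {f1_score(58.0, 42.2):.2f}")
```

Both values agree with the reported 74.8% and 48.8% to within 0.1 percentage points.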


This review was generated by AI and checked by a human editor.