QUICK REVIEW

[Paper Review] Object Detection for Comics using Manga109 Annotations

Toru Ogawa, Atsushi Otsubo | arXiv (Cornell University) | 2018. 03. 23.

Advanced Image and Video Retrieval Techniques | References: 20 | Citations: 42

One-Line Summary

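This paper introduces Manga109-annotations, a large-scale, manually annotated comics dataset, and presents SSD300-fork, a detector that gives each category its own anchor set to handle overlapping comic objects. SSD300-fork achieves the highest mAP among the compared detectors on Manga109-annotations.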

ABSTRACT

With the growth of digitized comics, image understanding techniques are becoming important. In this paper, we focus on object detection, which is a fundamental task of image understanding. Although convolutional neural networks (CNN)-based methods achieved good performance in object detection for naturalistic images, there are two problems in applying these methods to the comic object detection task. First, there is no large-scale annotated comics dataset. The CNN-based methods require large-scale annotations for training. Secondly, the objects in comics are highly overlapped compared to naturalistic images. This overlap causes the assignment problem in the existing CNN-based methods. To solve these problems, we proposed a new annotation dataset and a new CNN model. We annotated an existing image dataset of comics and created the largest annotation dataset, named Manga109-annotations. For the assignment problem, we proposed a new CNN-based detector, SSD300-fork. We compared SSD300-fork with other detection methods using Manga109-annotations and confirmed that our model outperformed them based on the mAP score.

Motivation and Goals

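Advance object detection in comics and address the lack of a large-scale annotated dataset.
Build Manga109-annotations, which contains bounding boxes for frames, text, faces, and bodies, plus additional annotations such as character names and text content.
Develop an object detector tailored to overlapping comic objects, improving both training and inference performance.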

Proposed Method

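Annotate Manga109 with bounding boxes and category labels (frame, text, face, body) to create Manga109-annotations.
Propose SSD300-fork, a forked variant of SSD300 that replicates the detection layers per category, to handle heavily overlapping objects.
Use a weighted per-category loss to balance detection across the four categories.
Train with a VGG-16 backbone and standard SSD data augmentation, and apply hard negative mining.
Evaluate against Faster R-CNN, SSD300, and YOLOv2 on Manga109-annotations, and compare on eBDtheque for cross-dataset analysis.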
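
To make the assignment problem concrete, here is a minimal sketch of anchor matching with one shared anchor set versus one anchor set per category. It is an illustration only, not the authors' implementation: the function names, the 0.5 IoU threshold, the toy boxes, and the simplified "best IoU wins" rule are all assumptions made for this example.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_shared(anchors, gts, thresh=0.5):
    """Shared anchor set: each anchor keeps only its single best-IoU box."""
    assigned = {}
    for i, anchor in enumerate(anchors):
        best = max(gts, key=lambda g: iou(anchor, g["box"]), default=None)
        if best is not None and iou(anchor, best["box"]) >= thresh:
            assigned[i] = best["category"]
    return assigned


def match_forked(anchors, gts, thresh=0.5):
    """Per-category anchor sets: matching runs independently for each category."""
    by_category = defaultdict(list)
    for g in gts:
        by_category[g["category"]].append(g)
    return {
        category: match_shared(anchors, boxes, thresh)
        for category, boxes in by_category.items()
    }


# Toy page: a face box nested inside a body box, both overlapping one anchor.
anchors = [(0, 0, 10, 10)]
gts = [
    {"category": "body", "box": (0, 0, 10, 12)},  # IoU with the anchor: 100/120
    {"category": "face", "box": (1, 1, 9, 9)},    # IoU with the anchor: 64/100
]

shared = match_shared(anchors, gts)
forked = match_forked(anchors, gts)
print(shared)
print(forked)
```

With the shared anchor set, the single anchor is claimed by the body box and the face box gets no positive anchor. With per-category matching, the same anchor position serves as a positive for both categories, which is the effect the forked detection layers are meant to achieve.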

Experimental Results

Research Questions

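RQ1: Can a large-scale, manually annotated comics dataset improve object detection performance on comic pages?
RQ2: Does replicating the detection layers per category (SSD300-fork) mitigate the assignment/labeling problem caused by overlapping comic objects?
RQ3: How does SSD300-fork compare with existing CNN-based detectors on comic data in terms of mAP across frames, text, faces, and bodies?
RQ4: How well does a model trained on Manga109-annotations transfer to another dataset with different drawing styles (eBDtheque)?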

Key Results

Method	mAP	Frame	Text	Face	Body
Faster R-CNN	49.9	96.1	23.8	15.7	63.9
SSD300	81.3	97.1	82.0	67.1	79.1
YOLOv2	59.7	90.2	64.6	37.1	46.9
SSD300-fork	84.2	96.9	84.1	76.2	79.6
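
As a quick consistency check on the table, the short sketch below recomputes each method's mAP as the unweighted mean of its four per-category AP values. The dictionary name `AP_TABLE` and the helper `mean_ap` are names chosen for this example; the numbers are copied from the table, and the assumption that mAP is the plain average over the four categories is ours, although it does reproduce the mAP column to one decimal place.

```python
# Per-category AP (%) copied from the table, in the order: frame, text, face, body.
AP_TABLE = {
    "Faster R-CNN": [96.1, 23.8, 15.7, 63.9],
    "SSD300": [97.1, 82.0, 67.1, 79.1],
    "YOLOv2": [90.2, 64.6, 37.1, 46.9],
    "SSD300-fork": [96.9, 84.1, 76.2, 79.6],
}


def mean_ap(per_category_ap):
    """Unweighted mean of per-category average precision values."""
    return sum(per_category_ap) / len(per_category_ap)


for method, aps in AP_TABLE.items():
    print(f"{method}: mAP = {mean_ap(aps):.1f}")
```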

Manga109-annotations provides 527,685 bounding-box annotations across 10,130 pages, covering four object categories plus additional text and character data.
SSD300-fork reaches an overall mAP of 84.2% on the Manga109-annotations benchmark, ahead of SSD300 (81.3%), YOLOv2 (59.7%), and Faster R-CNN (49.9%).
The largest gain over the baseline SSD300 is in the face category (76.2% vs. 67.1%).
On eBDtheque, SSD300-fork achieves competitive frame detection (recall 73.3%, precision 76.4%, F-measure 74.8%) and markedly better body detection than previous methods (recall 42.2%, precision 58.0%, F-measure 48.8%).
The forked architecture assigns each category its own anchor set, which lets it handle overlapping objects while keeping the parameter count and runtime close to those of SSD300.
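
The eBDtheque F scores listed above can be related to the reported recall and precision with the usual harmonic-mean formula, F = 2PR / (P + R). The sketch below assumes that this standard definition is the one used; `f_measure` is a helper name chosen for this example. Because the recall and precision inputs are already rounded, the recomputed values may differ from the reported ones in the last digit.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (both in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Recall and precision (%) for SSD300-fork on eBDtheque, as listed in the review.
frame_f = f_measure(recall=73.3, precision=76.4)
body_f = f_measure(recall=42.2, precision=58.0)

print(f"frame F = {frame_f:.2f}")
print(f"body F = {body_f:.2f}")
```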

This review was generated by AI and reviewed by a human editor.