QUICK REVIEW

[Paper Review] Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection

Yuliang Liu, Lianwen Jin | arXiv (Cornell University) | 2017-03-04

Handwritten Text Recognition Techniques | References: 34 | Citations: 58

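One-line summary

DMPNet localizes multi-oriented scene text tightly by combining quadrilateral sliding windows with a shared Monte-Carlo area computation, and achieves a state-of-the-art F-measure on ICDAR 2015 Challenge 4 (Incidental Scene Text).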

ABSTRACT

Detecting incidental scene text is a challenging task because of multi-orientation, perspective distortion, and variation of text size, color and scale. Retrospective research has only focused on using rectangular bounding box or horizontal sliding window to localize text, which may result in redundant background noise, unnecessary overlap or even information loss. To address these issues, we propose a new Convolutional Neural Networks (CNNs) based method, named Deep Matching Prior Network (DMPNet), to detect text with tighter quadrangle. First, we use quadrilateral sliding windows in several specific intermediate convolutional layers to roughly recall the text with higher overlapping area and then a shared Monte-Carlo method is proposed for fast and accurate computing of the polygonal areas. After that, we designed a sequential protocol for relative regression which can exactly predict text with compact quadrangle. Moreover, an auxiliary smooth Ln loss is also proposed for further regressing the position of text, which has better overall performance than L2 loss and smooth L1 loss in terms of robustness and stability. The effectiveness of our approach is evaluated on a public word-level, multi-oriented scene text database, ICDAR 2015 Robust Reading Competition Challenge 4 "Incidental scene text localization". The performance of our method is evaluated by using F-measure and found to be 70.64%, outperforming the existing state-of-the-art method with F-measure 63.76%.

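Motivation and objectives

Address the redundant background and inaccurate localization that arise when detecting multi-oriented scene text.
Propose quadrilateral sliding windows, based on the intrinsic geometric shape of text, to recall text regions.
Develop a fast shared Monte-Carlo method for computing polygon overlap.
Introduce a sequential point-ordering protocol and a smooth Ln loss for stable quadrilateral regression.
Demonstrate state-of-the-art performance on ICDAR 2015 Incidental Scene Text Localization.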

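Proposed method

Introduce quadrilateral sliding windows in intermediate CNN layers to roughly recall text.
Develop a shared Monte-Carlo method to compute polygon overlap areas efficiently.
Apply a sequential protocol that fixes the order of the four quadrilateral vertices so that regression targets are consistent.
Predict quadrilateral coordinates from a center point and relative offsets, enabling two-stage localization.
Propose a smooth Ln loss for regression that is more robust and stable than the L2 and smooth L1 losses.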
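
The shared Monte-Carlo idea can be illustrated with a minimal Python sketch. It assumes that "shared" means the sample points are drawn once per ground-truth box and then reused for every sliding window, and that all quadrilaterals are convex with vertices listed in a consistent winding order. The function names, the default of 10,000 samples, and the bounding-rectangle pre-check are illustrative choices, not details confirmed by this review.

```python
import random


def point_in_convex_quad(p, quad):
    """Return True if p is inside or on the border of a convex quad.

    Assumes the four vertices are listed in a consistent winding order.
    """
    x, y = p
    sign = 0
    for i in range(4):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % 4]
        cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False  # p is on different sides of two edges
    return True


def bounding_rect(quad):
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)


def shared_overlap_areas(gt_quad, windows, n_samples=10000, seed=0):
    """Estimate the ground-truth area and its overlap with each window.

    Points are sampled once in the ground truth's bounding rectangle;
    the points that fall inside the ground truth are reused for all windows.
    """
    rng = random.Random(seed)
    gx0, gy0, gx1, gy1 = bounding_rect(gt_quad)
    area_per_point = (gx1 - gx0) * (gy1 - gy0) / n_samples

    shared = []
    for _ in range(n_samples):
        p = (rng.uniform(gx0, gx1), rng.uniform(gy0, gy1))
        if point_in_convex_quad(p, gt_quad):
            shared.append(p)
    gt_area = len(shared) * area_per_point

    overlaps = []
    for w in windows:
        wx0, wy0, wx1, wy1 = bounding_rect(w)
        # Cheap rejection: disjoint bounding rectangles cannot overlap.
        if wx1 < gx0 or wx0 > gx1 or wy1 < gy0 or wy0 > gy1:
            overlaps.append(0.0)
            continue
        hits = sum(1 for p in shared if point_in_convex_quad(p, w))
        overlaps.append(hits * area_per_point)
    return gt_area, overlaps


# Usage: one ground-truth box, one shifted window, one tilted window.
gt = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
candidates = [
    [(2.0, 0.0), (6.0, 0.0), (6.0, 2.0), (2.0, 2.0)],
    [(1.0, -1.0), (5.0, 0.0), (4.5, 2.5), (0.5, 1.5)],
]
area, inter = shared_overlap_areas(gt, candidates)
print(area, inter)
```

Because the per-window work is only a membership test over a fixed point set, each window can be processed independently, which is the property that makes this kind of computation easy to parallelize.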

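Experimental results

Research questions

RQ1: Can quadrilateral sliding windows improve the recall and precision of multi-oriented text detection compared with rectangular windows?
RQ2: Does sharing the Monte-Carlo computation across many windows enable fast and accurate polygon-overlap computation?
RQ3: Does sequential quadrilateral regression yield tighter text localization than rectangle-based methods?
RQ4: Is the smooth Ln loss more robust and stable for fine-grained text localization?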
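
RQ3 presupposes that the four vertices of every quadrilateral are listed in one consistent order; otherwise the same box could yield different regression targets. The sketch below is not the paper's sequential protocol, which this review does not spell out. It substitutes a simpler, commonly used rule as an illustration: sort the vertices by angle around the centroid, then start from the leftmost vertex (ties broken by the smaller y). The function name `order_quad_points` is hypothetical, and the rule assumes a convex quadrilateral.

```python
import math


def order_quad_points(points):
    """Return the four vertices in a canonical order.

    Illustrative rule (not the paper's protocol): sort by angle around the
    centroid, then rotate the list so it starts at the vertex with the
    smallest x (ties broken by the smallest y). In image coordinates, where
    y grows downward, ascending angle corresponds to clockwise on screen.
    """
    cx = sum(x for x, _ in points) / 4.0
    cy = sum(y for _, y in points) / 4.0
    by_angle = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    start = min(range(4), key=lambda i: (by_angle[i][0], by_angle[i][1]))
    return by_angle[start:] + by_angle[:start]


# Usage: the same box given in two different vertex orders.
a = order_quad_points([(4, 2), (0, 0), (0, 2), (4, 0)])
b = order_quad_points([(0, 2), (4, 0), (4, 2), (0, 0)])
print(a == b)
```

Whatever rule is used, the point is that any permutation of the same four vertices maps to a single ordered list, so the regression target for a given text box is unique.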

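Key findings

DMPNet achieves a 70.64% F-measure on ICDAR 2015 Challenge 4, surpassing the previous state of the art (63.76%).
Quadrilateral sliding windows substantially improve recall and reduce background noise compared with rectangular windows.
The shared Monte-Carlo method enables fast, accurate polygon-overlap computation that is well suited to GPU parallelization.
Sequential point ordering enables consistent quadrilateral regression, which improves localization precision.
The smooth Ln loss is more robust and stable than the L2 and smooth L1 losses for boundary regression.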
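
To make the loss comparison concrete, the following sketch evaluates the three losses and the smooth Ln gradient side by side. The smooth Ln formula used here, (|x| + 1) ln(|x| + 1) - |x|, is recalled from the DMPNet paper and is not stated in this review, so treat it as an assumption to verify against the paper. The L2 and smooth L1 definitions are the standard ones.

```python
import math


def l2_loss(x):
    return x * x


def smooth_l1_loss(x):
    a = abs(x)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def smooth_ln_loss(x):
    # Assumed form: (|x| + 1) * ln(|x| + 1) - |x|
    a = abs(x)
    return (a + 1.0) * math.log(a + 1.0) - a


def smooth_ln_grad(x):
    # Derivative of the assumed form: sign(x) * ln(|x| + 1)
    return math.copysign(math.log(abs(x) + 1.0), x)


# Usage: compare how each loss reacts to small and large regression errors.
for err in (0.1, 1.0, 10.0, 100.0):
    print(err, l2_loss(err), smooth_l1_loss(err),
          smooth_ln_loss(err), smooth_ln_grad(err))
```

Under these definitions the L2 gradient grows linearly with the error, the smooth L1 gradient is capped at 1 once the error exceeds 1, and the smooth Ln gradient grows only logarithmically while staying differentiable everywhere. That middle position is one way to read the robustness and stability claim.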

This review was generated by AI and reviewed by a human editor.