QUICK REVIEW

[Paper Review] EAST: An Efficient and Accurate Scene Text Detector

Xinyu Zhou, Cong Yao | arXiv (Cornell University) | Apr 11, 2017

Handwritten Text Recognition Techniques | References: 58 | Citations: 115

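One-line summary

EAST proposes a two-stage text detection pipeline, a fully convolutional network followed by non-maximum suppression, that predicts text regions directly from the full image as rotated rectangles or quadrangles and achieves state-of-the-art accuracy at high speed.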

ABSTRACT

Previous approaches for scene text detection have already achieved promising performances across various benchmarks. However, they usually fall short when dealing with challenging scenarios, even when equipped with deep neural network models, because the overall performance is determined by the interplay of multiple stages and components in the pipelines. In this work, we propose a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes. The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images, eliminating unnecessary intermediate steps (e.g., candidate aggregation and word partitioning), with a single neural network. The simplicity of our pipeline allows concentrating efforts on designing loss functions and neural network architecture. Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR 2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2fps at 720p resolution.

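Motivation and objectives

Build a simple, end-to-end scene text detection pipeline that avoids the multiple intermediate steps of earlier approaches (e.g., candidate aggregation and word partitioning).
Predict word-level or text-line-level regions of arbitrary orientation directly.
Support flexible geometry outputs, such as rotated boxes and quadrangles, while keeping processing efficient.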

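Proposed method

Use a lightweight fully convolutional network (FCN) to predict a per-pixel text score map and geometry maps.
Support two geometry representations, RBOX (an axis-aligned box plus a rotation angle) and QUAD (a quadrangle), each with its own loss function.
Generate score-map labels by shrinking each ground-truth quadrangle, and compute geometry targets for every positive pixel.
Train with a combination of a score loss (class-balanced cross-entropy) and a geometry loss: IoU-based for RBOX, scale-normalized smoothed L1 for QUAD.
Apply locality-aware NMS, which merges nearby predictions efficiently and runs in roughly O(n) time in practice.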
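The score and RBOX geometry losses listed above can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the function names, the `(H, W, 5)` geometry layout (distances to the top, right, bottom, and left box edges, then the angle), the `EPS` guard, averaging over pixels, and the default angle weight `lambda_theta` are all assumptions of this sketch. The angle term `1 - cos(...)` is added here because RBOX includes a rotation angle that an IoU on axis-aligned boxes alone does not constrain.

```python
import numpy as np

EPS = 1e-6  # numerical guard against log(0); an assumption of this sketch


def balanced_bce(score_true, score_pred):
    """Class-balanced cross-entropy over a per-pixel text score map."""
    score_pred = np.clip(score_pred, EPS, 1.0 - EPS)
    # beta is the fraction of negative pixels, so the rarer text pixels
    # receive the larger weight.
    beta = 1.0 - score_true.mean()
    loss = -(beta * score_true * np.log(score_pred)
             + (1.0 - beta) * (1.0 - score_true) * np.log(1.0 - score_pred))
    return loss.mean()


def rbox_loss(geo_true, geo_pred, score_true, lambda_theta=10.0):
    """IoU-based box loss plus an angle term, averaged over text pixels."""
    d_t, theta_t = geo_true[..., :4], geo_true[..., 4]
    d_p, theta_p = geo_pred[..., :4], geo_pred[..., 4]
    # Box area from edge distances: (top + bottom) * (right + left).
    area_t = (d_t[..., 0] + d_t[..., 2]) * (d_t[..., 1] + d_t[..., 3])
    area_p = (d_p[..., 0] + d_p[..., 2]) * (d_p[..., 1] + d_p[..., 3])
    # Both boxes share the same pixel, so the intersection extent in each
    # direction is the smaller of the two distances.
    h_i = np.minimum(d_t[..., 0], d_p[..., 0]) + np.minimum(d_t[..., 2], d_p[..., 2])
    w_i = np.minimum(d_t[..., 1], d_p[..., 1]) + np.minimum(d_t[..., 3], d_p[..., 3])
    inter = h_i * w_i
    union = area_t + area_p - inter
    l_aabb = -np.log((inter + EPS) / (union + EPS))
    l_theta = 1.0 - np.cos(theta_p - theta_t)
    per_pixel = l_aabb + lambda_theta * l_theta
    mask = score_true > 0  # geometry is only supervised on text pixels
    return float(per_pixel[mask].mean()) if mask.any() else 0.0
```

Because the IoU is a ratio of areas, the box term does not depend on the absolute size of the text region, which is one reason an IoU-style loss suits text that varies widely in scale.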

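Experimental results

Research questions

RQ1: Can a two-stage FCN pipeline predict text regions directly, without intermediate steps, and still reach state-of-the-art accuracy?
RQ2: How do the geometry representations (RBOX vs. QUAD) compare in accuracy and efficiency across datasets?
RQ3: Which loss design and training strategy yield per-pixel text geometry predictions that are robust across scales?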

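Key findings

Achieves high accuracy and speed on the ICDAR 2015, COCO-Text, and MSRA-TD500 benchmarks.
ICDAR 2015: F-score 0.7820 at 13.2 FPS (720p); multi-scale F-score 0.8072.
COCO-Text: F-score 0.3945; MSRA-TD500: F-score 0.7608.
The two-stage pipeline, trained end to end, outperforms prior methods in both accuracy and speed.
Flexible geometry outputs: RBOX and QUAD give competitive results with different base networks (PVANET, PVANET2x, VGG16).
Locality-aware NMS greatly reduces post-processing cost while maintaining accuracy.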
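To illustrate why locality-aware NMS is cheap, here is a minimal Python sketch of the idea: candidates arrive in row-major pixel order, so neighbours that overlap are merged in a single pass by score-weighted averaging, and only the much smaller merged set goes through ordinary NMS. For brevity the sketch uses axis-aligned boxes `(x1, y1, x2, y2, score)` with positive scores instead of rotated boxes or quadrangles, and the function names and the `thresh=0.3` default are hypothetical.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2, score)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def weighted_merge(a, b):
    """Average coordinates weighted by score; the merged score is the sum."""
    total = a[4] + b[4]
    coords = [(a[i] * a[4] + b[i] * b[4]) / total for i in range(4)]
    return (*coords, total)


def standard_nms(boxes, thresh):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    keep = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= thresh for k in keep):
            keep.append(box)
    return keep


def locality_aware_nms(boxes, thresh=0.3):
    """Merge consecutive overlapping boxes, then run standard NMS.

    `boxes` is assumed to be in row-major order of the pixels that
    produced them, so overlapping candidates tend to be adjacent.
    """
    merged = []
    prev = None
    for box in boxes:
        if prev is not None and iou(box, prev) > thresh:
            prev = weighted_merge(box, prev)
        else:
            if prev is not None:
                merged.append(prev)
            prev = box
    if prev is not None:
        merged.append(prev)
    return standard_nms(merged, thresh)
```

The first pass touches each candidate once, which is where the near-linear behaviour comes from; the final `standard_nms` call is still quadratic in the worst case, but it runs on far fewer boxes. Averaging instead of discarding also lets every overlapping prediction contribute to the final coordinates.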


This review was generated by AI and checked by a human editor.