QUICK REVIEW

[Paper Review] Deep Direct Regression for Multi-Oriented Scene Text Detection

Wenhao He, Xu-Yao Zhang|arXiv (Cornell University)|Mar 24, 2017

Handwritten Text Recognition Techniques | References: 22 | Citations: 54

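One-line summary

This paper introduces a direct regression framework for multi-oriented scene text detection that avoids proposals and anchors. It achieves state-of-the-art results on ICDAR2015 Incidental Scene Text and performs strongly on other benchmarks.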

ABSTRACT

In this paper, we first provide a new perspective to divide existing high performance object detection methods into direct and indirect regressions. Direct regression performs boundary regression by predicting the offsets from a given point, while indirect regression predicts the offsets from some bounding box proposals. Then we analyze the drawbacks of the indirect regression, which the recent state-of-the-art detection structures like Faster-RCNN and SSD follows, for multi-oriented scene text detection, and point out the potential superiority of direct regression. To verify this point of view, we propose a deep direct regression based method for multi-oriented scene text detection. Our detection framework is simple and effective with a fully convolutional network and one-step post processing. The fully convolutional network is optimized in an end-to-end way and has bi-task outputs where one is pixel-wise classification between text and non-text, and the other is direct regression to determine the vertex coordinates of quadrilateral text boundaries. The proposed method is particularly beneficial for localizing incidental scene texts. On the ICDAR2015 Incidental Scene Text benchmark, our method achieves the F1-measure of 81%, which is a new state-of-the-art and significantly outperforms previous approaches. On other standard datasets with focused scene texts, our method also reaches the state-of-the-art performance.

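Motivation and objectives

Compares direct and indirect regression for detection and argues that direct regression is better suited to multi-oriented text.
Proposes a deep direct regression framework that outputs quadrilateral text boundaries from points in the image rather than from bounding box proposals.
Adopts a two-branch network (text/non-text classification and vertex regression) that can be trained end to end, followed by one-step post-processing (Recalled NMS).
Reports state-of-the-art performance on ICDAR2015 Incidental Scene Text and competitive results on MSRA-TD500 and ICDAR2013.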

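Proposed method

Defines direct regression as regressing the boundary from a single point rather than from a proposal.
Uses a fully convolutional network with multi-scale feature fusion to produce a text/non-text map and a map of quadrilateral vertex offsets.
Trains with a multi-task loss that combines a hinge loss for classification with a smooth L1 loss for regression, and uses a Scale&Shift module to keep the regressed values stable.
Applies Recalled Non-Maximum Suppression (Recalled NMS) in post-processing to refine the dense quadrilaterals and merge them into the final detections.
At test time, uses a multi-scale sliding-window strategy and thresholds the text score map to obtain candidate regions.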
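
To make the decoding step concrete, here is a minimal sketch of how per-point outputs could be turned into quadrilateral candidates. It is not the authors' implementation. The function names, the `threshold` and `stride` values, and the output range of ±400 pixels for the Scale&Shift step are all assumptions chosen for illustration; the review only states that a Scale&Shift module stabilizes the regressed values and that the score map is thresholded.

```python
import math


def scale_and_shift(x, half_range=400.0):
    """Squash a raw regression output into (-half_range, half_range).

    Mathematically equal to 2 * half_range * sigmoid(x) - half_range,
    written with tanh so large |x| cannot overflow.
    half_range=400 is an assumed value, not taken from this review.
    """
    return half_range * math.tanh(x / 2.0)


def decode_quads(score_map, offset_map, threshold=0.5, stride=4):
    """Turn per-point network outputs into quadrilateral candidates.

    score_map[r][c]  : text/non-text score of output cell (r, c)
    offset_map[r][c] : 8 raw values (dx1, dy1, ..., dx4, dy4)
    threshold, stride: hypothetical settings for this sketch
    """
    candidates = []
    for r, row in enumerate(score_map):
        for c, score in enumerate(row):
            if score < threshold:
                continue  # keep only points classified as text
            # Image-space point that this output cell stands for.
            px, py = c * stride, r * stride
            raw = offset_map[r][c]
            # Direct regression: each vertex is the point plus an offset.
            quad = [
                (px + scale_and_shift(raw[2 * k]),
                 py + scale_and_shift(raw[2 * k + 1]))
                for k in range(4)
            ]
            candidates.append((score, quad))
    return candidates
```

Every point above the threshold yields its own quadrilateral, so neighbouring points inside one word produce many overlapping candidates. That dense output is what the NMS-based post-processing step is there to merge.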

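Experimental results

Research questions

RQ1: Does direct regression improve multi-oriented text detection compared with indirect regression methods that depend on proposals?
RQ2: Can a single end-to-end network predict quadrilateral boundaries without line-grouping or word-partitioning heuristics?
RQ3: How does the proposed Recalled NMS affect precision and recall compared with traditional NMS in crowded text scenes?
RQ4: How does the method perform against the previous state of the art on standard scene text benchmarks (ICDAR2015 Incidental, MSRA-TD500, ICDAR2013)?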

Key findings

Dataset	Algorithm	Precision	Recall	F-measure	Time
ICDAR2015 Incidental	Proposed (R-NMS)	0.82	0.80	0.81	–
ICDAR2015 Incidental	Proposed (T-NMS)	0.81	0.80	0.80	–
ICDAR2015 Incidental	Liu et al. [15]	0.73	0.68	0.71	–
ICDAR2015 Incidental	Tian et al. [21]	0.74	0.52	0.61	–
ICDAR2015 Incidental	Zhang et al. [26]	0.71	0.43	0.54	–
ICDAR2015 Incidental	StradVision2 [11]	0.77	0.37	0.50	–
ICDAR2015 Incidental	StradVision1 [11]	0.53	0.46	0.47	–
ICDAR2015 Incidental	NJU-Text [11]	0.70	0.36	0.47	–
ICDAR2015 Incidental	AJOU [11]	0.47	0.47	0.47	–
ICDAR2015 Incidental	HUST_MCLAB [11]	0.44	0.38	0.41	–
MSRA-TD500	Proposed	0.77	0.70	0.74	–
MSRA-TD500	Zhang et al. [26]	0.83	0.67	0.74	–
MSRA-TD500	Yin et al. [24]	0.81	0.63	0.71	–
MSRA-TD500	Kang et al. [10]	0.71	0.62	0.66	–
MSRA-TD500	Yao et al. [23]	0.63	0.63	0.60	–
ICDAR2013 Focused	Proposed	0.92	0.81	0.86	0.9s
ICDAR2013 Focused	Liao et al. [13]	0.88	0.83	0.85	0.73s
ICDAR2013 Focused	Zhang et al. [26]	0.88	0.78	0.83	2.1s
ICDAR2013 Focused	He et al. [6]	0.93	0.73	0.82	–
ICDAR2013 Focused	Tian et al. [20]	0.85	0.76	0.80	1.4s

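With Recalled NMS (R-NMS), the proposed method reaches 0.82/0.80/0.81 (Precision/Recall/F-measure) on ICDAR2015 Incidental Scene Text, ahead of the previous methods in the table, including the indirect regression baselines.
On MSRA-TD500 it achieves 0.77/0.70/0.74 (Precision/Recall/F-measure).
On ICDAR2013 Focused Scene Text it achieves 0.92/0.81/0.86 (Precision/Recall/F-measure), with a reported time of 0.9 s per image.
The approach generalizes to both English and Chinese text in MSRA-TD500 and is robust to incidental text and perspective distortion.
The direct regression framework avoids fragile proposal generation and benefits from end-to-end optimization and a robust centerline-based representation of positive regions.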
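
For readers who want to check how the F-measure column relates to the other two, the sketch below recomputes it as the harmonic mean of precision and recall. It assumes the standard F1 definition; the function name `f_measure` is introduced here for illustration, and the input pairs are copied from the "Proposed" rows of the table.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision/recall pairs copied from the "Proposed" rows of the table.
rows = {
    "ICDAR2015 Incidental (R-NMS)": (0.82, 0.80),
    "ICDAR2013 Focused": (0.92, 0.81),
}
for name, (p, r) in rows.items():
    print(f"{name}: F = {f_measure(p, r):.2f}")
# prints:
# ICDAR2015 Incidental (R-NMS): F = 0.81
# ICDAR2013 Focused: F = 0.86
```

Because the table lists values rounded to two decimals, recomputing from the rounded precision and recall can differ from a reported F-measure by about 0.01. The MSRA-TD500 "Proposed" row, for example, gives 0.73 this way against a reported 0.74.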

This review was generated by AI and checked by a human editor.