QUICK REVIEW

[Paper Review] Deep Direct Regression for Multi-Oriented Scene Text Detection

Wenhao He, Xu-Yao Zhang | arXiv (Cornell University) | Mar 24, 2017

Topic: Handwritten Text Recognition Techniques | References: 22 | Cited by: 54

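ONE-SENTENCE SUMMARY

This paper proposes a direct regression framework for multi-oriented scene text detection that avoids region proposals and anchor boxes. It reaches a new state of the art on ICDAR2015 Incidental Scene Text (F-measure of 81%) and performs strongly on other benchmarks.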

ABSTRACT

In this paper, we first provide a new perspective to divide existing high performance object detection methods into direct and indirect regressions. Direct regression performs boundary regression by predicting the offsets from a given point, while indirect regression predicts the offsets from some bounding box proposals. Then we analyze the drawbacks of the indirect regression, which the recent state-of-the-art detection structures like Faster-RCNN and SSD follows, for multi-oriented scene text detection, and point out the potential superiority of direct regression. To verify this point of view, we propose a deep direct regression based method for multi-oriented scene text detection. Our detection framework is simple and effective with a fully convolutional network and one-step post processing. The fully convolutional network is optimized in an end-to-end way and has bi-task outputs where one is pixel-wise classification between text and non-text, and the other is direct regression to determine the vertex coordinates of quadrilateral text boundaries. The proposed method is particularly beneficial for localizing incidental scene texts. On the ICDAR2015 Incidental Scene Text benchmark, our method achieves the F1-measure of 81%, which is a new state-of-the-art and significantly outperforms previous approaches. On other standard datasets with focused scene texts, our method also reaches the state-of-the-art performance.

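MOTIVATION AND OBJECTIVES

Introduce and analyze the difference between direct and indirect regression in detection, and argue that direct regression is better suited to multi-oriented scene text.
Propose a deep direct regression framework that predicts the quadrilateral text boundary from image points, without region proposals.
Train a two-branch network (text/non-text classification and vertex regression) end to end, followed by a single post-processing step (Recall NMS).
Demonstrate state-of-the-art performance on ICDAR2015 Incidental Scene Text and competitive results on MSRA-TD500 and ICDAR2013.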
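
To make the idea of regressing a quadrilateral from an image point concrete, here is a minimal sketch in plain Python. It is not the authors' code: the function name `decode_quads`, the 0.5 threshold, the nested-list layout of the maps, and the vertex order are all assumptions chosen for illustration, and the maps are assumed to share the image's resolution.

```python
def decode_quads(score_map, offset_map, threshold=0.5):
    """Turn per-pixel predictions into scored quadrilaterals.

    score_map[y][x]  : text score at pixel (x, y)
    offset_map[y][x] : eight numbers (dx1, dy1, ..., dx4, dy4), the predicted
                       offsets from (x, y) to the four vertices
    The threshold and data layout are illustrative assumptions.
    """
    quads = []
    for y, row in enumerate(score_map):
        for x, score in enumerate(row):
            if score < threshold:
                continue  # pixel classified as non-text
            off = offset_map[y][x]
            # Direct regression: each vertex is the point itself plus an offset.
            vertices = [(x + off[2 * k], y + off[2 * k + 1]) for k in range(4)]
            quads.append((score, vertices))
    return quads


# Tiny hypothetical example: a 2x2 map with a single text pixel at (x=1, y=0).
zero = (0, 0, 0, 0, 0, 0, 0, 0)
demo_scores = [[0.1, 0.9], [0.2, 0.3]]
demo_offsets = [[zero, (-1, -1, 3, -1, 3, 1, -1, 1)], [zero, zero]]
demo_quads = decode_quads(demo_scores, demo_offsets)
print(demo_quads)
```

Every text pixel yields its own quadrilateral, so the output is dense and overlapping; that is why a suppression step such as Recall NMS is needed afterwards.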

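PROPOSED METHOD

Define direct regression: the boundary is regressed from a point rather than from a bounding box proposal.
Use a fully convolutional network with multi-scale feature fusion to produce a text/non-text map and a map of quadrilateral vertex offsets.
Train with a multi-task loss that combines a hinge loss for classification with a smooth L1 loss for regression, and use a Scale&Shift module to keep the regression outputs stable.
Apply Recall Non-Maximum Suppression (Recall NMS) to refine the dense quadrilaterals and merge them into final detections.
At test time, apply a multi-scale sliding-window strategy and threshold the text score map to obtain candidate text regions.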
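
The multi-task loss can be sketched as follows. This is an illustrative version only, not the paper's exact formulation: it uses the textbook hinge loss with labels in {-1, +1}, restricts the regression term to text pixels, and combines the two terms with a hypothetical weight `reg_weight`. The Scale&Shift module is not modeled.

```python
def hinge_loss(score, label):
    """Textbook hinge loss for a label in {-1, +1} (assumed form)."""
    return max(0.0, 1.0 - label * score)


def smooth_l1(diff):
    """Smooth L1: quadratic near zero, linear for |diff| >= 1."""
    a = abs(diff)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def multitask_loss(scores, labels, pred_offsets, true_offsets, reg_weight=1.0):
    """Classification loss over all pixels plus regression loss over text pixels.

    scores, labels             : one entry per pixel
    pred_offsets, true_offsets : one list of vertex offsets per pixel
    reg_weight                 : hypothetical balancing weight (assumption)
    """
    cls = sum(hinge_loss(s, l) for s, l in zip(scores, labels)) / len(scores)
    positives = [i for i, l in enumerate(labels) if l > 0]
    reg = 0.0
    if positives:
        reg = sum(
            smooth_l1(p - t)
            for i in positives
            for p, t in zip(pred_offsets[i], true_offsets[i])
        ) / len(positives)
    return cls + reg_weight * reg


# Two pixels: one text pixel scored confidently, one background pixel scored weakly.
demo_loss = multitask_loss(
    scores=[2.0, -0.5],
    labels=[1, -1],
    pred_offsets=[[0.5, 3.0], [0.0, 0.0]],
    true_offsets=[[0.0, 1.0], [0.0, 0.0]],
)
print(demo_loss)
```

Smooth L1 grows only linearly for large errors, which keeps a few badly predicted vertices from dominating the gradient.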

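EXPERIMENTAL RESULTS

Research Questions

RQ1: Does direct regression improve multi-oriented text detection over indirect regression methods that rely on region proposals?
RQ2: Can a single end-to-end network predict the quadrilateral boundaries of text regions without line-grouping or word-segmentation heuristics?
RQ3: How does the proposed Recall NMS affect precision and recall compared with traditional NMS in crowded text scenes?
RQ4: How does the method perform on standard scene text benchmarks (ICDAR2015 Incidental, MSRA-TD500, ICDAR2013) relative to the previous state of the art?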

Key Findings

Dataset	Algorithm	Precision	Recall	F-measure	Runtime
ICDAR2015 Incidental	Proposed (R-NMS)	0.82	0.80	0.81	–
ICDAR2015 Incidental	Proposed (T-NMS)	0.81	0.80	0.80	–
ICDAR2015 Incidental	Liu et al. [15]	0.73	0.68	0.71	–
ICDAR2015 Incidental	Tian et al. [21]	0.74	0.52	0.61	–
ICDAR2015 Incidental	Zhang et al. [26]	0.71	0.43	0.54	–
ICDAR2015 Incidental	StradVision2 [11]	0.77	0.37	0.50	–
ICDAR2015 Incidental	StradVision1 [11]	0.53	0.46	0.47	–
ICDAR2015 Incidental	NJU-Text [11]	0.70	0.36	0.47	–
ICDAR2015 Incidental	AJOU [11]	0.47	0.47	0.47	–
ICDAR2015 Incidental	HUST_MCLAB [11]	0.44	0.38	0.41	–
MSRA-TD500	Proposed	0.77	0.70	0.74	–
MSRA-TD500	Zhang et al. [26]	0.83	0.67	0.74	–
MSRA-TD500	Yin et al. [24]	0.81	0.63	0.71	–
MSRA-TD500	Kang et al. [10]	0.71	0.62	0.66	–
MSRA-TD500	Yao et al. [23]	0.63	0.63	0.60	–
ICDAR2013 Focused	Proposed	0.92	0.81	0.86	0.9s
ICDAR2013 Focused	Liao et al. [13]	0.88	0.83	0.85	0.73s
ICDAR2013 Focused	Zhang et al. [26]	0.88	0.78	0.83	2.1s
ICDAR2013 Focused	He et al. [6]	0.93	0.73	0.82	–
ICDAR2013 Focused	Tian et al. [20]	0.85	0.76	0.80	1.4s

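On ICDAR2015 Incidental Scene Text, the method with Recall NMS reaches an F-measure of 81%, ahead of the previous methods in the table.
Its precision/recall/F-measure with Recall NMS (R-NMS) is 0.82/0.80/0.81, compared with 0.81/0.80/0.80 for the T-NMS variant, and it surpasses the indirect regression baselines.
On MSRA-TD500, the method reaches 0.77/0.70/0.74 (precision/recall/F-measure).
On ICDAR2013 Focused Scene Text, the method reaches 0.92/0.81/0.86 (precision/recall/F-measure) and reports 0.9s per image.
The method generalizes across the English and Chinese text in MSRA-TD500 and is robust to incidental text and perspective distortion.
The direct regression framework avoids fragile proposal generation and benefits from end-to-end optimization and a centerline-based definition of positive regions.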
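
As a quick sanity check on the table, the F-measure can be recomputed from precision and recall as their harmonic mean. The sketch below assumes that standard definition; the helper name `f_measure` is illustrative, and the inputs are the rounded values from the table for the proposed method.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (standard definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure), copied from the table above.
rows = {
    "ICDAR2015 Incidental (R-NMS)": (0.82, 0.80, 0.81),
    "MSRA-TD500": (0.77, 0.70, 0.74),
    "ICDAR2013 Focused": (0.92, 0.81, 0.86),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed {f_measure(p, r):.3f}, reported {reported:.2f}")
```

For MSRA-TD500 the value recomputed from the rounded inputs comes out near 0.73 rather than the reported 0.74. A gap of 0.01 like this is plausibly a rounding effect in the published precision and recall, not an inconsistency in the results.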

This review was generated by AI and checked by a human editor.