QUICK REVIEW

[Paper Review] Shape Robust Text Detection with Progressive Scale Expansion Network

Wenhai Wang, Enze Xie|arXiv (Cornell University)|Mar 28, 2019

Handwritten Text Recognition Techniques | Cited by 66

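One-Sentence Summary

PSENet detects arbitrarily shaped text by generating kernels at multiple scales for each text instance and expanding them progressively with breadth-first search (BFS) to separate closely spaced instances, achieving state-of-the-art results on curved-text benchmarks such as CTW1500 and strong results on Total-Text and the ICDAR datasets.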

ABSTRACT

Scene text detection has witnessed rapid progress especially with the recent development of convolutional neural networks. However, there still exists two challenges which prevent the algorithm into industry applications. On the one hand, most of the state-of-art algorithms require quadrangle bounding box which is in-accurate to locate the texts with arbitrary shape. On the other hand, two text instances which are close to each other may lead to a false detection which covers both instances. Traditionally, the segmentation-based approach can relieve the first problem but usually fail to solve the second challenge. To address these two challenges, in this paper, we propose a novel Progressive Scale Expansion Network (PSENet), which can precisely detect text instances with arbitrary shapes. More specifically, PSENet generates the different scale of kernels for each text instance, and gradually expands the minimal scale kernel to the text instance with the complete shape. Due to the fact that there are large geometrical margins among the minimal scale kernels, our method is effective to split the close text instances, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances. Extensive experiments on CTW1500, Total-Text, ICDAR 2015 and ICDAR 2017 MLT validate the effectiveness of PSENet. Notably, on CTW1500, a dataset full of long curve texts, PSENet achieves a F-measure of 74.3% at 27 FPS, and our best F-measure (82.2%) outperforms state-of-art algorithms by 6.6%. The code will be released in the future.

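Motivation and Objectives

Address the challenge of detecting arbitrarily shaped text in natural scenes.
Propose a kernel-based framework that keeps the advantages of segmentation while still separating individual instances.
Develop a progressive scale expansion algorithm that reconstructs the complete text shape from the smallest kernel.
Provide label generation and a loss design suited to multi-scale kernel supervision.
Demonstrate robustness on curved-text, multi-oriented, and multilingual benchmarks.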

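Proposed Method

Backbone: ResNet with FPN; the multi-scale features are fused into a 1024-channel feature map F.
The network produces n segmentation outputs S1, ..., Sn, corresponding to progressively larger kernels.
Detection is initialized from the connected components of the smallest kernel and then grown through BFS-style scale expansion.
Label generation: the original text polygons are shrunk with the Vatti clipping algorithm to create the ground-truth masks G1, ..., Gn.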
Loss: combines the complete-text loss Lc with the shrunk-text loss Ls, uses the Dice coefficient to handle class imbalance, and applies online hard example mining (OHEM) to Lc.
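
The loss described in the last item can be sketched roughly as follows. This is a minimal illustration in plain Python on nested lists, not the authors' implementation. The function names, the squared-sum form of the Dice denominator, the smoothing term eps, and the weight lam=0.7 are assumptions made for this sketch, and the OHEM pixel selection is left out for brevity.

```python
from itertools import chain


def dice_coefficient(pred, gt, eps=1e-6):
    """Dice coefficient between a predicted probability map and a binary mask.

    Both inputs are 2D nested lists of the same shape. The squared-sum
    denominator and eps are assumptions of this sketch.
    """
    p = list(chain.from_iterable(pred))
    g = list(chain.from_iterable(gt))
    intersection = sum(a * b for a, b in zip(p, g))
    denom = sum(a * a for a in p) + sum(b * b for b in g)
    return (2.0 * intersection + eps) / (denom + eps)


def combined_loss(preds, gts, lam=0.7):
    """Weighted sum of a complete-text loss and a shrunk-text loss.

    preds and gts are lists of maps ordered from the smallest kernel to the
    complete text region, so the last entry is the complete-text map.
    lam is a hypothetical balancing weight; OHEM masking is omitted.
    """
    if len(preds) < 2 or len(preds) != len(gts):
        raise ValueError("need at least two scales and matching list lengths")
    loss_complete = 1.0 - dice_coefficient(preds[-1], gts[-1])
    shrunk_scores = [dice_coefficient(p, g) for p, g in zip(preds[:-1], gts[:-1])]
    loss_shrunk = 1.0 - sum(shrunk_scores) / len(shrunk_scores)
    return lam * loss_complete + (1.0 - lam) * loss_shrunk
```

Because the Dice coefficient is a ratio of overlap to total mass, it does not get dominated by the large number of background pixels, which is the class-imbalance motivation given above.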

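Experimental Results

Research Questions

RQ1: Can segmentation-based methods be enhanced to detect arbitrarily shaped text accurately and to separate adjacent, densely packed instances?
RQ2: Does progressive scale expansion from multiple kernel scales improve instance separation without sacrificing localization accuracy?
RQ3: How do multi-kernel supervision and BFS-style expansion affect performance on curved-text benchmarks and multilingual datasets?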

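Key Findings

PSENet achieves high F-measures on curved-text benchmarks: on CTW1500 its best F-measure is 82.2% (single-scale backbone variant), and in the reported 27 FPS setting it reaches 74.3%.
On CTW1500, PSENet outperforms prior methods by 6.6% in F-measure.
On Total-Text, PSENet obtains an F-measure of 80.9% (single scale; the variant trained with external data gives higher precision).
Deeper backbones (ResNet50/101/152) improve PSENet's performance, reaching an F-measure of 72.13% on IC17-MLT with ResNet152.
Progressive scale expansion effectively separates text instances that lie close together and handles curved text robustly across ICDAR 2015, ICDAR 2017 MLT, CTW1500, and Total-Text.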
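
The instance separation in the last finding can be illustrated with a toy example. The sketch below is a simplified, dependency-free version of the idea: it labels connected components in the smallest kernel, then grows those labels into each larger kernel with a multi-source BFS, resolving contested pixels first come, first served. The function names, the 4-connectivity choice, and the two-scale toy masks are assumptions for illustration only; this is not the authors' code.

```python
from collections import deque

NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-connectivity (assumed)


def label_components(mask):
    """Label 4-connected components of a binary mask; 0 means background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def progressive_scale_expansion(kernels):
    """Grow labels from the smallest kernel through each larger kernel.

    kernels is a list of binary masks ordered from smallest to largest.
    """
    labels, count = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Seed the queue with every pixel that already carries a label.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = cy + dy, cx + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[cy][cx]  # first label to arrive wins
                    queue.append((ny, nx))
    return labels, count


# Two text instances that touch in the full mask but not in the small kernel.
full = [[1] * 10 for _ in range(3)]
kernel = [[0] * 10, [0, 1, 1, 1, 0, 0, 1, 1, 1, 0], [0] * 10]
labels, count = progressive_scale_expansion([kernel, full])
```

In this toy case the full mask on its own forms a single connected component, while seeding from the small kernel yields two instances: columns 0 to 4 take label 1 and columns 5 to 9 take label 2.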

This review was generated by AI and checked by a human editor.