QUICK REVIEW

[Paper Review] WordSup: Exploiting Word Annotations for Character based Text Detection

Hu Han, Chengquan Zhang|arXiv (Cornell University)|Aug 22, 2017

Handwritten Text Recognition Techniques | References: 47 | Cited by: 44

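One-Sentence Summary

This paper proposes WordSup, a weakly supervised framework that trains a character-based text detector using only word-level annotations, overcoming the scarcity of character-level annotations in real-world datasets. By iteratively refining character-center masks and the model under word-level supervision, the method achieves state-of-the-art performance on the ICDAR13, ICDAR15, and COCO-Text benchmarks and detects text robustly across scenarios such as deformed text and mathematical expressions.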

ABSTRACT

Imagery texts are usually organized as a hierarchy of several visual elements, i.e. characters, words, text lines and text blocks. Among these elements, character is the most basic one for various languages such as Western, Chinese, Japanese, mathematical expression and etc. It is natural and convenient to construct a common text detection engine based on character detectors. However, training character detectors requires a vast of location annotated characters, which are expensive to obtain. Actually, the existing real text datasets are mostly annotated in word or line level. To remedy this dilemma, we propose a weakly supervised framework that can utilize word annotations, either in tight quadrangles or the more loose bounding boxes, for character detector training. When applied in scene text detection, we are thus able to train a robust character detector by exploiting word annotations in the rich large-scale real scene text datasets, e.g. ICDAR15 and COCO-text. The character detector acts as a key role in the pipeline of our text detection engine. It achieves the state-of-the-art performance on several challenging scene text detection benchmarks. We also demonstrate the flexibility of our pipeline by various scenarios, including deformed text detection and math expression recognition.

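Motivation and Objectives

Address the scarcity of large-scale real-scene text datasets with character-level annotations, which are expensive and time-consuming to create.
Train a robust character detector without expensive character-level annotations.
Exploit the word-level annotations in existing large-scale real-world datasets (e.g., ICDAR15 and COCO-Text) for character detector training.
Develop a flexible character-based text detection pipeline that applies to many text types, including deformed text and mathematical expressions.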

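Proposed Method

A weakly supervised training framework that alternates between refining character-center masks and updating the character detection model, using word-level annotations.
A graph-based character grouping method with unary and pairwise costs: the unary cost combines the text/non-text score with inter-character distance, while the pairwise cost uses the angular distance between character pairs.
Text line models estimated with a 0-order, 1-order, or piecewise-linear center line; the model is selected by trading off the fitted line height against a complexity penalty.
A thin-plate spline (TPS) transformation, based on the computed polygon and control points, that rectifies each text line into a strip image of fixed height (H=32).
A CNN-RNN architecture for word partition, using VGG-16 features and one BLSTM layer to predict word boundary positions on the rectified line image.
Data augmentation during training on synthetic and real line images: random cropping, padding, blur, noise, and small rotations (±5°).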
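
The text line model selection step listed above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the penalty values, the use of a residual band plus character height as the "fitted height", and the reduction of the piecewise-linear model to two independent segments are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical penalty factors: the text only says that a complexity
# penalty exists, not which values are used.
PENALTY = {"0-order": 1.0, "1-order": 1.2, "piecewise-linear": 1.4}


def _line_residuals(xs, ys):
    """Residuals of a least-squares straight line y = a*x + b."""
    a, b = np.polyfit(xs, ys, 1)
    return ys - (a * xs + b)


def _band_height(residuals, char_height):
    """Height of a band around the center line that covers every character.

    Assumption: spread of the center residuals plus one character height.
    """
    return float(residuals.max() - residuals.min()) + char_height


def select_line_model(centers, char_height):
    """Pick the center-line model with the smallest penalized band height.

    centers: (N, 2) array-like of character centers (x, y), N >= 4.
    char_height: typical character height in pixels.
    Returns (best_model_name, {model_name: penalized_height}).
    """
    pts = np.asarray(centers, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 2 or len(pts) < 4:
        raise ValueError("need at least 4 (x, y) centers")
    pts = pts[np.argsort(pts[:, 0])]  # order characters left to right
    xs, ys = pts[:, 0], pts[:, 1]
    mid = len(pts) // 2

    residuals = {
        # 0-order: horizontal center line y = const.
        "0-order": ys - ys.mean(),
        # 1-order: one straight line with arbitrary slope.
        "1-order": _line_residuals(xs, ys),
        # Piecewise linear, simplified here to two independent segments.
        "piecewise-linear": np.concatenate([
            _line_residuals(xs[:mid], ys[:mid]),
            _line_residuals(xs[mid:], ys[mid:]),
        ]),
    }
    scores = {
        name: _band_height(res, char_height) * PENALTY[name]
        for name, res in residuals.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

Calling `select_line_model(centers, char_height)` with centers that lie on a tilted straight line selects the 1-order model in this sketch, because the simpler horizontal model needs a taller band and the more complex model pays a higher penalty for the same height.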

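Experimental Results

Research Questions

RQ1: Can a character detector be trained effectively using only word-level annotations instead of expensive character-level annotations?
RQ2: How can word-level supervision be used to improve the accuracy and robustness of character detection in real scene text?
RQ3: Can a character-based detection pipeline generalize across text types such as deformed text lines and mathematical expressions?
RQ4: How much does weakly supervised character detection improve performance on standard benchmarks compared with existing methods?

Key Findings

Trained with only word-level annotations, the proposed WordSup framework achieves state-of-the-art performance on the ICDAR13, ICDAR15, and COCO-Text benchmarks.
The method generalizes well, detecting deformed text lines and structured mathematical expressions effectively.
On real scene text, the character detector trained with WordSup outperforms existing character-based methods that rely solely on synthetic data.
Weak supervision from word-level annotations makes it possible to train on large-scale real datasets such as ICDAR15 and COCO-Text, which were previously unusable for character detection because their annotation level did not match.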

This review was generated by AI and reviewed by human editors.