QUICK REVIEW

[Paper Review] Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition

Théodore Bluche|arXiv (Cornell University)|Apr 28, 2016

Handwritten Text Recognition Techniques | References: 36 | Cited by: 98

One-Sentence Summary

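This paper replaces the standard MDLSTM collapse with an attention-based weighted collapse, enabling end-to-end transcription of handwritten paragraphs without explicit line segmentation and achieving competitive results on the IAM and RIMES datasets.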

ABSTRACT

Offline handwriting recognition systems require cropped text line images for both training and recognition. On the one hand, the annotation of position and transcript at line level is costly to obtain. On the other hand, automatic line segmentation algorithms are prone to errors, compromising the subsequent recognition. In this paper, we propose a modification of the popular and efficient multi-dimensional long short-term memory recurrent neural networks (MDLSTM-RNNs) to enable end-to-end processing of handwritten paragraphs. More particularly, we replace the collapse layer transforming the two-dimensional representation into a sequence of predictions by a recurrent version which can recognize one line at a time. In the proposed model, a neural network performs a kind of implicit line segmentation by computing attention weights on the image representation. The experiments on paragraphs of Rimes and IAM database yield results that are competitive with those of networks trained at line level, and constitute a significant step towards end-to-end transcription of full documents.
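The change described in the abstract can be written compactly. The notation below is our own and is an assumption, not necessarily the paper's symbols: $e_{ij}$ is the encoder feature vector at row $i$ and column $j$ of an $H \times W$ map, $t$ is the reading step, and $a^{(t)}_{ij}$ is a raw attention score. We also assume the attention weights are normalized by a softmax over each column.

```latex
% Standard collapse (sum over the vertical axis) vs. weighted collapse at reading step t
z_j = \sum_{i=1}^{H} e_{ij}
\qquad\longrightarrow\qquad
z^{(t)}_j = \sum_{i=1}^{H} \omega^{(t)}_{ij}\, e_{ij},
\qquad
\omega^{(t)}_{ij} = \frac{\exp\big(a^{(t)}_{ij}\big)}{\sum_{i'=1}^{H} \exp\big(a^{(t)}_{i'j}\big)}
```

If the weights at step $t$ concentrate on the rows of a single text line, $z^{(t)}$ is a length-$W$ sequence describing only that line; repeating this for $T$ steps and concatenating gives a sequence of length $TW$ for the decoder.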

Motivation and Objectives

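Reduce the dependence of offline handwriting recognition on explicit line segmentation.
Propose an end-to-end paragraph transcription model that segments lines implicitly through attention.
Integrate an attention-based weighted collapse into MDLSTM-RNNs so that lines are read one at a time.
Train the model at paragraph level with the CTC loss, using a BLSTM decoder.
Compare performance against baselines that rely on line segmentation, on the public IAM and RIMES datasets.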
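Training at paragraph level means the CTC target is the whole paragraph transcript, so the network must emit the labels of all lines, in reading order, along the concatenated output sequence. The sketch below is the standard CTC forward algorithm written in plain Python in log space for clarity; it is not the author's implementation. The per-frame probabilities are assumed to be already normalized, and building the paragraph target by concatenating line transcripts is an assumption made for illustration.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    peak = max(xs)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(x - peak) for x in xs))


def ctc_loss(probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """Negative log-likelihood of `target` given per-frame label distributions.

    probs[t][k] is the probability of label k at frame t (rows sum to 1).
    Returns math.inf when the target cannot be aligned to the frames.
    """
    # Extended target with blanks around every label: _ l1 _ l2 _ ... _
    ext: List[int] = [blank]
    for label in target:
        ext += [label, blank]
    num_states, num_frames = len(ext), len(probs)
    log_p = [[math.log(p) if p > 0 else NEG_INF for p in frame] for frame in probs]

    # alpha[s]: log-probability of all partial paths ending in state s at frame t
    alpha = [NEG_INF] * num_states
    alpha[0] = log_p[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_p[0][ext[1]]
    for t in range(1, num_frames):
        new_alpha = [NEG_INF] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]                      # stay in the same state
            if s >= 1:
                candidates.append(alpha[s - 1])          # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])          # skip a blank between distinct labels
            new_alpha[s] = logsumexp(*candidates) + log_p[t][ext[s]]
        alpha = new_alpha

    # Valid paths end on the last label or on the trailing blank
    total = alpha[0] if num_states == 1 else logsumexp(alpha[-1], alpha[-2])
    return -total


# Two frames, labels {0: blank, 1: "a"}, target "a":
# paths "aa", "a_", "_a" give 0.42 + 0.18 + 0.28 = 0.88
print(round(ctc_loss([[0.4, 0.6], [0.3, 0.7]], target=[1]), 4))  # → 0.1278
```

A repeated label such as "aa" needs a blank between its copies, so it cannot fit into fewer than three frames; the function then returns infinity. This constraint is one reason the concatenated paragraph sequence must be long enough to hold every line's labels.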

Proposed Method

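Use MDLSTM-RNNs as the encoder to extract a 2D feature map from the paragraph image.
Replace the standard vertical collapse with a weighted, attention-driven collapse that reads one line at a time.
Compute attention weights over the 2D feature map and use them to form a line-specific weighted sum.
Decode the line representations with a (bidirectional) LSTM decoder, applied where needed to the concatenated line outputs.
Train at paragraph level with the CTC loss, optionally with the BLSTM decoder.
Discuss the limitations of iterating over a fixed number of reading steps and of handling full-document layouts.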
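The reading loop in the list above can be sketched in plain Python under stated assumptions: attention scores come from a hypothetical `score_fn` (the paper uses a neural attention network conditioned on the previous step; here a toy function `toy_scores` stands in for it), the weights are normalized by a softmax over each column, and the number of reading steps is fixed in advance. The sketch shows how each step collapses the 2D map into one line-length sequence and how the steps are concatenated before decoding.

```python
import math
from typing import Callable, List

Vector = List[float]
FeatureMap = List[List[Vector]]  # feature_map[row][col] -> D-dim feature vector
Grid = List[List[float]]         # scores or weights, indexed [row][col]

# Hypothetical stand-in for the attention network:
# (feature map, previous weights, step index) -> raw scores a[row][col]
ScoreFn = Callable[[FeatureMap, Grid, int], Grid]


def column_softmax(scores: Grid) -> Grid:
    """Softmax over the vertical axis, so the weights in every column sum to 1."""
    height, width = len(scores), len(scores[0])
    weights = [[0.0] * width for _ in range(height)]
    for j in range(width):
        peak = max(scores[i][j] for i in range(height))  # subtract max for stability
        exps = [math.exp(scores[i][j] - peak) for i in range(height)]
        total = sum(exps)
        for i in range(height):
            weights[i][j] = exps[i] / total
    return weights


def weighted_collapse(feature_map: FeatureMap, weights: Grid) -> List[Vector]:
    """Collapse the vertical axis with attention weights: one vector per column."""
    height, width, dim = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [
        [sum(weights[i][j] * feature_map[i][j][d] for i in range(height)) for d in range(dim)]
        for j in range(width)
    ]


def read_paragraph(feature_map: FeatureMap, score_fn: ScoreFn, num_steps: int) -> List[Vector]:
    """Read a fixed number of lines and concatenate them into one decoder input sequence."""
    height, width = len(feature_map), len(feature_map[0])
    prev_weights: Grid = [[0.0] * width for _ in range(height)]
    sequence: List[Vector] = []
    for step in range(num_steps):
        weights = column_softmax(score_fn(feature_map, prev_weights, step))
        sequence.extend(weighted_collapse(feature_map, weights))  # W vectors per step
        prev_weights = weights
    return sequence


def toy_scores(feature_map: FeatureMap, prev_weights: Grid, step: int) -> Grid:
    """Toy scorer (hypothetical): pretend text line `step` occupies row `step`.

    It ignores prev_weights; a learned attention model would use it to move down the page.
    """
    return [[10.0 if i == step else 0.0 for _ in row] for i, row in enumerate(feature_map)]


fmap = [[[1.0], [2.0]],  # row 0: features of "line 1"
        [[3.0], [4.0]]]  # row 1: features of "line 2"
seq = read_paragraph(fmap, toy_scores, num_steps=2)
print([round(v[0], 3) for v in seq])  # → [1.0, 2.0, 3.0, 4.0]
```

Because `num_steps` is fixed, a paragraph with fewer lines than steps still produces extra (ideally blank) output, and one with more lines is truncated; this is the trade-off the section lists as a limitation.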

Experimental Results

Research Questions

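RQ1: Can an attention-based MDLSTM mechanism achieve end-to-end paragraph transcription without explicit line segmentation?
RQ2: How does implicit line segmentation through attention affect recognition accuracy compared with ground-truth line segmentation?
RQ3: What are the trade-offs between using a fixed number of reading steps and predicting a stop token to handle paragraphs of variable length?
RQ4: How does the proposed method perform on the standard datasets (IAM, RIMES) under different resolutions and segmentation conditions?
RQ5: What are the practical limitations of, and future directions for, extending the approach to full document pages?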

Key Findings

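The attention-based weighted collapse significantly improves CER compared with the standard collapse and the softmax baseline.
On IAM, attention with a BLSTM decoder yields a substantial CER reduction (the relative improvement is reported in the paper).
On RIMES, the attention model achieves a notable CER improvement, including a large relative gain over the baseline.
End-to-end paragraph transcription without explicit line segmentation is competitive with methods based on line segmentation.
Higher input resolution improves performance on both IAM and RIMES.
In many settings, with a language model, the system achieves competitive WER/CER without requiring ground-truth line segmentation annotations.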
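CER and WER, the metrics quoted in these findings, are conventionally edit (Levenshtein) distances normalized by the reference length, computed over characters and over words respectively. The sketch below uses that standard definition; the paper's exact text normalization (punctuation, casing, how line breaks in a paragraph are mapped) is not stated here and is left as an assumption, with whitespace-split words as a simple stand-in.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (unit-cost insert/delete/substitute) via one rolling DP row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if equal)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace-separated tokens (simplified tokenization)."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


print(edit_distance("kitten", "sitting"))          # → 3
print(round(wer("the cat sat", "the bat sat"), 3))  # → 0.333
```

A single wrong character costs one unit in CER but turns a whole word into an error in WER, which is why a language model, acting at the word level, often affects WER more visibly than CER.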

This review was generated by AI and reviewed by human editors.