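[Paper Explained] TextSR: Content-Aware Text Super-Resolution Guided by Recognition
TextSR learns super-resolution and text recognition jointly, using a novel Text Perceptual Loss to steer SR toward recognition-friendly text content, which improves recognition of small, blurry text.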
Scene text recognition has witnessed rapid development with the advance of convolutional neural networks. Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images. An intuitive solution is to introduce super-resolution techniques as pre-processing. However, conventional super-resolution methods in the literature mainly focus on reconstructing the detailed texture of natural images, which typically do not work well for text due to the unique characteristics of text. To tackle these problems, in this work, we propose a content-aware text super-resolution network to generate the information desired for text recognition. In particular, we design an end-to-end network that can perform super-resolution and text recognition simultaneously. Different from previous super-resolution methods, we use the loss of text recognition as the Text Perceptual Loss to guide the training of the super-resolution network, and thus it pays more attention to the text content, rather than the irrelevant background area. Extensive experiments on several challenging benchmarks demonstrate the effectiveness of our proposed method in restoring a sharp high-resolution image from a small blurred one, and show that the recognition performance clearly boosts up the performance of text recognizer. To our knowledge, this is the first work focusing on text super-resolution. Code will be released in https://github.com/xieenze/TextSR.
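Motivation and Goals
- Improve scene text recognition when the text is small or blurry.
- Develop an end-to-end network that couples super-resolution with text recognition.
- Introduce the Text Perceptual Loss, which backpropagates the recognition loss into the SR generator so that it emphasizes text content rather than background.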
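Proposed Method
- Use a generator-discriminator architecture for 4x super-resolution.
- Integrate a text recognizer (ASTER) to provide recognition feedback.
- Introduce the Text Perceptual Loss (TPL) by backpropagating the text recognition loss into generator training.
- Train end-to-end, or use a staged variant with ASTER, to steer SR toward recognizable text.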
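The way a recognition loss can act as an extra term in the generator objective can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the recognizer emits a per-character probability distribution and is scored with mean negative log-likelihood, and that the generator objective is a weighted sum of a content term, an adversarial term, and the TPL. The function names and the weights `w_adv` and `w_tpl` are hypothetical. In real training the gradient of the TPL would flow through the recognizer into the generator, whereas this toy only computes scalar values.

```python
import math

def recognition_loss(char_probs, target):
    """Mean negative log-likelihood of the ground-truth characters.

    char_probs: one dict per decoding step mapping character -> probability,
                standing in for a recognizer's per-step softmax output.
    target:     the ground-truth string, one character per step.
    """
    if len(char_probs) != len(target):
        raise ValueError("need one probability distribution per target character")
    eps = 1e-12  # avoids log(0) when the recognizer assigns zero probability
    nll = [-math.log(max(step.get(ch, 0.0), eps))
           for step, ch in zip(char_probs, target)]
    return sum(nll) / len(nll)

def generator_loss(content, adversarial, text_perceptual, w_adv=1e-3, w_tpl=1e-3):
    """Weighted sum of the generator's loss terms (weights are hypothetical)."""
    return content + w_adv * adversarial + w_tpl * text_perceptual

# Toy example: a sharp SR output lets the recognizer read "ab" confidently,
# a blurry one does not, so the blurry output receives the larger TPL.
sharp = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
blurry = [{"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}]
tpl_sharp = recognition_loss(sharp, "ab")
tpl_blurry = recognition_loss(blurry, "ab")
```

Because the penalty depends only on how well the text can be read, a generator trained with such a term has no incentive to spend capacity on background texture, which matches the stated aim of the TPL.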
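Experimental Results
Research Questions
- RQ1: Compared with conventional SR methods, does content-aware super-resolution improve recognition of small, blurry text?
- RQ2: Compared with a perceptual loss based on general image content, does the Text Perceptual Loss produce SR outputs that are better suited to recognition?
- RQ3: On standard benchmarks, does end-to-end training with a text recognizer improve downstream recognition accuracy?
- RQ4: How does TextSR compare with SRGAN and bicubic baselines on recognition benchmarks and under extreme downsampling?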
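Key Findings
- TextSR consistently outperforms SRGAN on PSNR and SSIM across multiple datasets.
- TextSR yields significantly better recognition than SRGAN, especially on very small text (for example, a gain of up to 22.8% on 20x5 images from IC13).
- End-to-end or staged training with the Text Perceptual Loss produces more content-aware SR results and improves recognition on benchmarks including IC13, IC15, SVT, SVTP, IIIT5K, and CUTE.
- When paired with a strong recognizer (ASTER), TextSR improves recognition accuracy, and the visible gains also carry over to images obtained from text detection.
- Qualitative analysis shows that TextSR concentrates its responses on text regions, producing sharper and more recognizable text than SRGAN.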
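For readers unfamiliar with the metrics behind these findings, the sketch below shows how PSNR and word-level recognition accuracy are commonly computed. It is an illustration under stated assumptions rather than the paper's evaluation code: pixels are given as flat sequences in the 0-255 range, accuracy is a case-insensitive exact match per word image, and the function names are hypothetical. SSIM is omitted because it needs windowed statistics that do not fit in a short example.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)

def word_accuracy(predictions, ground_truths):
    """Fraction of word images whose predicted string matches the label.

    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    if len(predictions) != len(ground_truths) or not predictions:
        raise ValueError("inputs must be non-empty and the same length")
    hits = sum(p.lower() == g.lower() for p, g in zip(predictions, ground_truths))
    return hits / len(predictions)

# Toy usage with made-up values, not results from the paper.
example_psnr = psnr([0, 255], [0, 0])
example_acc = word_accuracy(["text", "sr"], ["Text", "hr"])
```

The two metrics measure different things: PSNR rewards pixel-level fidelity, while word accuracy rewards readable output, which is why a recognition-guided SR model is evaluated on both.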
This explainer was generated by AI and reviewed by human editors.