QUICK REVIEW

[Paper Review] TextSR: Content-Aware Text Super-Resolution Guided by Recognition

Wenjia Wang, Enze Xie | arXiv (Cornell University) | Sep 16, 2019

Digital Media Forensic Detection | 44 references | 47 citations

TL;DR

TextSR jointly learns super-resolution and text recognition, using a novel Text Perceptual Loss to guide SR toward recognition-friendly text content, improving recognition on small blurred text.

ABSTRACT

Scene text recognition has witnessed rapid development with the advance of convolutional neural networks. Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images. An intuitive solution is to introduce super-resolution techniques as pre-processing. However, conventional super-resolution methods in the literature mainly focus on reconstructing the detailed texture of natural images, which typically do not work well for text due to the unique characteristics of text. To tackle these problems, in this work, we propose a content-aware text super-resolution network to generate the information desired for text recognition. In particular, we design an end-to-end network that can perform super-resolution and text recognition simultaneously. Different from previous super-resolution methods, we use the loss of text recognition as the Text Perceptual Loss to guide the training of the super-resolution network, and thus it pays more attention to the text content, rather than the irrelevant background area. Extensive experiments on several challenging benchmarks demonstrate the effectiveness of our proposed method in restoring a sharp high-resolution image from a small blurred one, and show that the recognition performance clearly boosts up the performance of text recognizer. To our knowledge, this is the first work focusing on text super-resolution. Code will be released in https://github.com/xieenze/TextSR.

Motivation & Objective

Improve scene text recognition when the text is small or blurred.
Develop an end-to-end network that couples super-resolution with text recognition.
Introduce a Text Perceptual Loss that backpropagates recognition loss to the SR generator to emphasize text content over background.

Proposed method

Use a generator-discriminator architecture for 4x super-resolution.
Integrate a text recognizer (ASTER) to provide recognition feedback.
Introduce the Text Perceptual Loss (TPL) by backpropagating the text recognition loss into generator training.
Train end-to-end or in staged variants with ASTER to guide SR toward recognizable text.
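
The way the recognition loss enters the generator objective can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the recognizer emits one probability distribution per character, takes the mean negative log-likelihood of the ground-truth string as the recognition loss, and adds it to the other generator terms as a weighted sum. The function names and the weights `w_adv` and `w_tpl` are hypothetical.

```python
import math


def recognition_nll(char_probs, target):
    """Mean negative log-likelihood of the target string.

    char_probs: list of dicts, one per decoding step, mapping a character
    to the probability a (hypothetical) recognizer assigns to it.
    """
    if not target or len(char_probs) != len(target):
        raise ValueError("need one probability distribution per target character")
    eps = 1e-12  # avoids log(0) when a character receives zero probability
    total = 0.0
    for step_probs, ch in zip(char_probs, target):
        total -= math.log(max(step_probs.get(ch, 0.0), eps))
    return total / len(target)


def generator_loss(content_loss, adversarial_loss, text_perceptual_loss,
                   w_adv=1e-3, w_tpl=1e-3):
    """Weighted sum of generator loss terms (the weights are assumptions)."""
    return content_loss + w_adv * adversarial_loss + w_tpl * text_perceptual_loss


# Toy recognizer outputs for the word "cat" on a sharp and a blurry SR image.
sharp = [{"c": 0.9, "o": 0.1}, {"a": 0.95, "o": 0.05}, {"t": 0.9, "f": 0.1}]
blurry = [{"c": 0.4, "o": 0.6}, {"a": 0.5, "o": 0.5}, {"t": 0.3, "f": 0.7}]

tpl_sharp = recognition_nll(sharp, "cat")
tpl_blurry = recognition_nll(blurry, "cat")
```

Here `tpl_blurry` is larger than `tpl_sharp`, so an SR output that leaves the text hard to read is penalized more. In real training the recognition loss would come from a differentiable recognizer such as ASTER, so that its gradient reaches the generator; the sketch only shows how the terms combine.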

Experimental results

Research questions

RQ1: Can content-aware super-resolution improve recognition of small, blurred text compared to traditional SR methods?
RQ2: Does the Text Perceptual Loss lead to more recognition-friendly SR outputs than perceptual losses based on general image content?
RQ3: Is end-to-end training with a text recognizer beneficial for downstream recognition accuracy on standard benchmarks?
RQ4: How does TextSR perform on recognition benchmarks and under extreme downsampling compared with SRGAN and bicubic baselines?

Key findings

TextSR consistently surpasses SRGAN in PSNR and SSIM across multiple datasets.
TextSR yields substantial recognition gains over SRGAN, especially on very small text (e.g., up to 22.8% improvement on 20x5 images on IC13).
End-to-end or staged training with Text Perceptual Loss produces more content-aware SR results that boost recognition performance on benchmarks like IC13, IC15, SVT, SVTP, IIIT5K, and CUTE.
TextSR improves recognition accuracy when paired with a strong recognizer (ASTER), and also extends to detection-era images with visible gains.
Qualitative analyses show TextSR focuses responses on text areas, producing clearer and more identifiable text than SRGAN.
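
For readers unfamiliar with the image-quality metric cited above, PSNR can be computed as follows. This is a minimal sketch using the standard definition, PSNR = 10 log10(MAX^2 / MSE); it assumes 8-bit grayscale images given as nested lists, and the function name and toy images are illustrative rather than taken from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Each image is a list of rows of pixel intensities.
    """
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    if not ref or len(ref) != len(out):
        raise ValueError("images must be non-empty and have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 images: a high-resolution reference and two reconstructions.
hr = [[0, 255], [255, 0]]
close = [[10, 245], [245, 10]]    # small per-pixel error
far = [[100, 155], [155, 100]]    # large per-pixel error

psnr_close = psnr(hr, close)
psnr_far = psnr(hr, far)
```

A higher PSNR means the reconstruction is closer to the reference pixel by pixel, so `psnr_close` exceeds `psnr_far`. PSNR does not measure legibility directly, which is why the review reports recognition accuracy alongside it.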

This review was created by AI and reviewed by human editors.