QUICK REVIEW

[Paper Review] Real-time Scene Text Detection with Differentiable Binarization

Minghui Liao, Zhaoyi Wan|arXiv (Cornell University)|Nov 20, 2019

Handwritten Text Recognition Techniques | References: 40 | Cited by: 48

One-Sentence Summary

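This paper proposes Differentiable Binarization (DB), which integrates the binarization step into a segmentation network so that the whole detector can be trained end to end, enabling real-time detection of arbitrarily shaped scene text with state-of-the-art accuracy and speed.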

ABSTRACT

Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset. Code is available at: https://github.com/MhLiao/DB

Motivation and Objectives

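Advance segmentation-based scene text detection for irregularly shaped text (curved and multi-oriented).
Eliminate heavy post-processing by integrating binarization into network training.
Achieve real-time inference with a lightweight backbone while maintaining high accuracy.
Explore adaptive thresholding through a learnable threshold map to better distinguish text from background.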

Proposed Method

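Propose Differentiable Binarization (DB), which approximates binarization with a differentiable function of the probability map and an adaptive threshold map T.
Train a segmentation network to predict both a probability map P and a threshold map T, and compute the approximate binary map B̂ from P and T.
Backpropagate through the DB function, which strengthens the separation between text and background and helps distinguish text instances that lie close together.
Use a backbone with deformable convolutions to give a more flexible receptive field for irregularly shaped text.
Generate training labels by shrinking and dilating the text polygons; these labels supervise P and T.
At inference, optionally remove the threshold branch for efficiency and form boxes from either the probability map or the approximate binary map.
Run ablation studies on the backbone (ResNet-18/50), the presence or absence of DB, threshold-map supervision, and deformable convolution to quantify each component's gain.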
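The DB step in the method list above replaces the hard threshold B = 1 if P >= t else 0 with the smooth approximation B̂ = 1 / (1 + e^(-k(P - T))), where k is an amplifying factor (the paper uses k = 50). The following is a minimal NumPy sketch of both functions; the function names and sample values are hypothetical, and the maps are plain arrays rather than real network outputs.

```python
import numpy as np


def standard_binarize(prob_map: np.ndarray, t: float) -> np.ndarray:
    """Hard binarization: 1 where P >= t, else 0.

    This step function has zero gradient almost everywhere, so it cannot be
    optimized with backpropagation.
    """
    return (prob_map >= t).astype(np.float32)


def differentiable_binarize(prob_map: np.ndarray,
                            thresh_map: np.ndarray,
                            k: float = 50.0) -> np.ndarray:
    """Approximate binary map B_hat = 1 / (1 + exp(-k * (P - T))).

    k is the amplifying factor; T is a per-pixel threshold map predicted by
    the network instead of a single fixed constant.
    """
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))


if __name__ == "__main__":
    P = np.array([0.20, 0.50, 0.90])  # hypothetical probability-map values
    T = np.array([0.50, 0.50, 0.50])  # hypothetical threshold-map values
    print(standard_binarize(P, t=0.5))  # [0. 1. 1.]
    print(differentiable_binarize(P, T))
```

Because B̂ is a smooth function of both P and T, gradients reach the threshold branch during training; where P equals T the output is exactly 0.5, and it saturates quickly toward 0 or 1 as P moves away from T.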
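The claim that backpropagating through DB sharpens the text/background separation can be checked on the per-pixel loss. With x = P - T and f(x) = 1 / (1 + e^(-kx)), binary cross-entropy gives d(-log f)/dx = -k f(x) e^(-kx) for a text pixel and d(-log(1 - f))/dx = k f(x) for a background pixel, so the gradient is scaled by k. The plain-Python sketch below, with hypothetical helper names, compares these analytic derivatives against central finite differences.

```python
import math

K = 50.0  # amplifying factor k, as used in the DB paper


def db(x: float, k: float = K) -> float:
    """DB function f(x) = 1 / (1 + exp(-k x)), where x = P - T at one pixel."""
    return 1.0 / (1.0 + math.exp(-k * x))


def loss_pos(x: float, k: float = K) -> float:
    """Binary cross-entropy for a text pixel (label 1): -log f(x)."""
    return -math.log(db(x, k))


def loss_neg(x: float, k: float = K) -> float:
    """Binary cross-entropy for a background pixel (label 0): -log(1 - f(x))."""
    return -math.log(1.0 - db(x, k))


def grad_pos(x: float, k: float = K) -> float:
    """Analytic derivative of loss_pos: -k * f(x) * exp(-k x)."""
    return -k * db(x, k) * math.exp(-k * x)


def grad_neg(x: float, k: float = K) -> float:
    """Analytic derivative of loss_neg: k * f(x)."""
    return k * db(x, k)


def numeric_grad(fn, x: float, eps: float = 1e-6) -> float:
    """Central finite difference, used only to check the analytic formulas."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    for x in (-0.05, 0.0, 0.05):
        print(x, grad_pos(x), numeric_grad(loss_pos, x),
              grad_neg(x), numeric_grad(loss_neg, x))
    # Since x = P - T, the gradient w.r.t. T is the negative of the one w.r.t. P.
    print("k=50 vs k=1 at x=0:", grad_neg(0.0), grad_neg(0.0, k=1.0))
```

At x = 0 the gradient magnitude is k/2, that is 25 for k = 50 versus 0.5 for k = 1, which illustrates how the amplifying factor strengthens the training signal near the decision boundary.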
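The polygon shrinking used for label generation offsets each text polygon by D = A(1 - r^2) / L, where A and L are the polygon's area and perimeter and the paper sets r = 0.4; the same offset dilates the polygon to bound the region where the threshold map is supervised. The paper performs the actual offset with the Vatti clipping algorithm. The sketch below only computes D, using hypothetical helper names, and applies it by hand to a rectangle.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(poly: List[Point]) -> float:
    """Polygon area via the shoelace formula (vertices in order, first not repeated)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def polygon_perimeter(poly: List[Point]) -> float:
    """Sum of edge lengths, including the closing edge."""
    return sum(math.dist(a, b) for a, b in zip(poly, poly[1:] + poly[:1]))


def shrink_offset(poly: List[Point], r: float = 0.4) -> float:
    """Offset D = A * (1 - r^2) / L used to shrink (and dilate) a text polygon."""
    return polygon_area(poly) * (1.0 - r * r) / polygon_perimeter(poly)


if __name__ == "__main__":
    # Hypothetical 100 x 20 axis-aligned word box.
    box = [(0.0, 0.0), (100.0, 0.0), (100.0, 20.0), (0.0, 20.0)]
    d = shrink_offset(box)
    print(round(d, 6))  # 7.0
    # For a rectangle, moving every side by d gives the shrunk and dilated
    # boxes directly; general polygons need a polygon-clipping library.
    print("shrunk:", (100 - 2 * d, 20 - 2 * d), "dilated:", (100 + 2 * d, 20 + 2 * d))
```

For this box the shrunk kernel is about 86 x 6 pixels and the dilated region about 114 x 34 pixels; the band between them is where threshold-map labels are placed.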
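The supervision of P and T described in the method list is combined, per the DB paper, into L = Ls + α·Lb + β·Lt with α = 1.0 and β = 10. Ls (probability map) and Lb (approximate binary map) are binary cross-entropy losses with hard negative mining at a 1:3 positive-to-negative ratio, and Lt is an L1 loss on the threshold map inside the dilated text polygon. The NumPy sketch below follows that structure; the function names, the eps clipping, and averaging Lt over masked pixels (the paper describes a sum of L1 distances) are assumptions.

```python
import numpy as np


def balanced_bce(pred: np.ndarray, gt: np.ndarray,
                 neg_ratio: float = 3.0, eps: float = 1e-6) -> float:
    """BCE over all positive pixels plus the hardest negatives (at most neg_ratio x positives)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(gt * np.log(pred) + (1.0 - gt) * np.log(1.0 - pred))
    pos = gt > 0.5
    n_pos = int(pos.sum())
    n_neg = min(int((~pos).sum()), int(n_pos * neg_ratio))
    hardest_neg = np.sort(loss[~pos])[::-1][:n_neg]  # largest negative losses first
    return float((loss[pos].sum() + hardest_neg.sum()) / (n_pos + n_neg + eps))


def masked_l1(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray,
              eps: float = 1e-6) -> float:
    """Mean absolute error restricted to pixels where mask is True."""
    return float(np.abs(pred - gt)[mask].sum() / (mask.sum() + eps))


def db_loss(p, b_hat, t, gt_shrink, gt_thresh, thresh_mask,
            alpha: float = 1.0, beta: float = 10.0) -> float:
    """Total loss L = Ls + alpha * Lb + beta * Lt."""
    l_s = balanced_bce(p, gt_shrink)             # probability map vs. shrunk-text labels
    l_b = balanced_bce(b_hat, gt_shrink)         # approximate binary map, same labels
    l_t = masked_l1(t, gt_thresh, thresh_mask)   # threshold map inside the dilated polygon
    return l_s + alpha * l_b + beta * l_t
```

The large β compensates for the small numeric range of threshold-map errors compared with the cross-entropy terms.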
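For the box-formation step, the paper binarizes the probability map (or the approximate binary map) with a constant threshold of 0.2, extracts the connected shrunk text regions, and dilates each one by D' = A' × r' / L' with r' = 1.5, where A' and L' are the shrunk region's area and perimeter. The pure-Python sketch below mirrors those three steps under a simplifying assumption: each region is approximated by its axis-aligned bounding box instead of a polygon. All helper names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixel units


def connected_regions(binary: List[List[bool]]) -> List[List[Tuple[int, int]]]:
    """4-connected components of a binary mask, found with breadth-first search."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def unclip_box(pixels: List[Tuple[int, int]], r_prime: float = 1.5) -> Box:
    """Expand a shrunk region's bounding box by D' = A' * r' / L'.

    Simplification: the region is replaced by its bounding box; the paper
    dilates the region polygon with Vatti clipping instead.
    """
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs) + 1, max(ys) + 1
    w, h = x1 - x0, y1 - y0
    d = (w * h) * r_prime / (2.0 * (w + h))
    # Coordinates may fall outside the image; a real pipeline would clip them.
    return (x0 - d, y0 - d, x1 + d, y1 + d)


def detect_boxes(prob_map: List[List[float]], thresh: float = 0.2,
                 r_prime: float = 1.5) -> List[Box]:
    """Binarize with a constant threshold, find shrunk regions, then dilate them."""
    binary = [[v >= thresh for v in row] for row in prob_map]
    return [unclip_box(px, r_prime) for px in connected_regions(binary)]
```

Because this path reads only the probability map, the threshold branch is not needed at inference, which is what allows it to be dropped for speed.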

Experimental Results

Research Questions

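RQ1: Does integrating a differentiable binarization step into the segmentation network improve detection accuracy for arbitrarily shaped text?
RQ2: Compared with fixed-threshold binarization, does adaptive, learned thresholding better distinguish text from background?
RQ3: How does the DB module affect speed and accuracy with a lightweight backbone such as ResNet-18 versus a heavier one such as ResNet-50?
RQ4: Is end-to-end trainability with DB compatible with real-time inference across multiple scene text benchmarks?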

Key Findings

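DB yields consistent accuracy gains on five benchmarks covering horizontal, multi-oriented, and curved text.
With ResNet-18, the method runs at 62 FPS on MSRA-TD500 (F-measure 82.8) and achieves high F-measures across the datasets.
The DB module (threshold branch) can be removed at inference without sacrificing performance, so it adds no inference cost and the speed is preserved.
Deformable convolution improves the F-measure by 1.5 to 5.0 points, depending on the backbone and dataset.
Supervising the threshold map brings further gains (e.g., on MLT-2017).
DB-ResNet-50 achieves state-of-the-art or competitive results on curved-text and multilingual datasets while running substantially faster than previous methods.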
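The F-measure figures above are, as is standard for these benchmarks, the harmonic mean of precision and recall, F = 2PR / (P + R), and a throughput in FPS corresponds to an average of 1000 / FPS milliseconds per image. A small sketch of both conversions follows; the function names are hypothetical and the precision/recall inputs are illustrative, not values from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in [0, 1] or both in %)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def latency_ms(fps: float) -> float:
    """Average per-image processing time implied by a throughput in frames per second."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(round(latency_ms(62), 1))        # 16.1
    print(round(f_measure(0.9, 0.6), 2))   # 0.72 (hypothetical inputs)
```

At 62 FPS, the ResNet-18 result on MSRA-TD500 corresponds to roughly 16.1 ms per image; the harmonic mean also explains why a detector cannot reach a high F-measure by maximizing only precision or only recall.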

This review was generated by AI and reviewed by human editors.