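[Paper Summary] Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection
DMPNet uses quadrilateral sliding windows and a shared Monte-Carlo area computation to localize multi-oriented scene text tightly, reaching a then state-of-the-art F-measure of 70.64% on ICDAR 2015 Challenge 4 (incidental scene text).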
Detecting incidental scene text is a challenging task because of multi-orientation, perspective distortion, and variation in text size, color, and scale. Previous research has focused only on using rectangular bounding boxes or horizontal sliding windows to localize text, which may result in redundant background noise, unnecessary overlap, or even information loss. To address these issues, we propose a new Convolutional Neural Network (CNN) based method, named Deep Matching Prior Network (DMPNet), to detect text with tighter quadrangles. First, we use quadrilateral sliding windows in several specific intermediate convolutional layers to roughly recall the text with higher overlapping area, and then a shared Monte-Carlo method is proposed for fast and accurate computation of the polygonal areas. After that, we design a sequential protocol for relative regression which can exactly predict text with a compact quadrangle. Moreover, an auxiliary smooth Ln loss is also proposed for further regressing the position of text, which has better overall performance than L2 loss and smooth L1 loss in terms of robustness and stability. The effectiveness of our approach is evaluated on a public word-level, multi-oriented scene text database, ICDAR 2015 Robust Reading Competition Challenge 4 "Incidental scene text localization". Our method achieves an F-measure of 70.64%, outperforming the existing state-of-the-art method with an F-measure of 63.76%.
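Research Motivation and Objectives
- Address redundant background and imprecise localization when detecting multi-oriented scene text.
- Propose quadrilateral sliding windows, based on the intrinsic shape of text, to recall text.
- Develop a fast shared Monte-Carlo method for computing polygon overlap.
- Introduce a sequential point-ordering protocol and a smooth Ln loss for robust quadrilateral regression.
- Demonstrate state-of-the-art performance on ICDAR 2015 incidental scene text localization.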
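Proposed Method
- Introduce quadrilateral sliding windows in intermediate convolutional layers to roughly recall text.
- Develop a shared Monte-Carlo method to compute polygon overlap areas efficiently.
- Apply a sequential protocol that orders the four points of a quadrilateral so that regression targets are consistent.
- Predict quadrilateral coordinates from a center point and relative offsets, giving a two-stage localization.
- Propose a smooth Ln loss for regression, which is more robust and stable than L2 loss and smooth L1 loss.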
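The shared Monte-Carlo overlap computation listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes convex quadrilaterals, uniform sampling inside the ground-truth bounding rectangle, and a reading of "shared" in which the points that fall inside the ground truth are sampled once and reused for every sliding window. The function names and the default sample count are hypothetical.

```python
import random


def inside_convex_quad(p, quad):
    """True if point p is inside (or on the border of) a convex quad.

    quad is a list of four (x, y) vertices in consistent order
    (clockwise or counter-clockwise).
    """
    sign = 0
    for i in range(4):
        ax, ay = quad[i]
        bx, by = quad[(i + 1) % 4]
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        if cross == 0:
            continue  # p lies on the line through this edge
        s = 1 if cross > 0 else -1
        if sign == 0:
            sign = s
        elif s != sign:
            return False  # p is on different sides of two edges
    return True


def shared_overlap_areas(gt_quad, windows, n_samples=10000, seed=0):
    """Estimate the overlap area between gt_quad and each window.

    Points are sampled once in the bounding rectangle of gt_quad; the
    subset inside gt_quad is then shared across all windows.
    """
    rng = random.Random(seed)
    xs = [x for x, _ in gt_quad]
    ys = [y for _, y in gt_quad]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    samples = [(rng.uniform(x0, x1), rng.uniform(y0, y1))
               for _ in range(n_samples)]
    inside_gt = [p for p in samples if inside_convex_quad(p, gt_quad)]
    # Each sample represents an equal share of the bounding rectangle.
    area_per_point = (x1 - x0) * (y1 - y0) / n_samples
    return [sum(inside_convex_quad(p, w) for p in inside_gt) * area_per_point
            for w in windows]


if __name__ == "__main__":
    gt = [(0, 0), (2, 0), (2, 2), (0, 2)]
    windows = [[(1, 0), (3, 0), (3, 2), (1, 2)],
               [(5, 5), (6, 5), (6, 6), (5, 6)]]
    print(shared_overlap_areas(gt, windows))
```

Because the per-window work is only a point-in-polygon test over a fixed point set, the inner loop is independent across windows, which is the property that makes this kind of estimate easy to parallelize.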
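Experimental Results
Research Questions
- RQ1: Compared with rectangular windows, do quadrilateral sliding windows improve recall and precision for multi-oriented text detection?
- RQ2: Can the shared Monte-Carlo computation deliver fast and accurate polygon overlap for a large number of windows?
- RQ3: Does point-wise regression of a quadrilateral give tighter text localization than rectangle-based methods?
- RQ4: Is the smooth Ln loss more robust and stable for fine-grained text localization?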
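To make RQ4 concrete, the sketch below compares the three regression losses on a single residual `d`. The summary does not state the smooth Ln formula; the form used here, `(|d| + 1) * ln(|d| + 1) - |d|`, is an assumption recalled from the paper and should be checked against it. The function names are hypothetical.

```python
import math


def l2_loss(d):
    """Plain squared error; its gradient 2*d grows without bound."""
    return d * d


def smooth_l1_loss(d):
    """Quadratic near zero, linear elsewhere; gradient is capped at 1."""
    a = abs(d)
    return 0.5 * d * d if a < 1 else a - 0.5


def smooth_ln_loss(d):
    """Assumed smooth Ln form: (|d| + 1) * ln(|d| + 1) - |d|."""
    a = abs(d)
    return (a + 1) * math.log(a + 1) - a


def smooth_ln_grad(d):
    """Derivative of the assumed form: sign(d) * ln(|d| + 1)."""
    return math.copysign(math.log(abs(d) + 1), d)


if __name__ == "__main__":
    for d in (0.1, 1.0, 10.0, 100.0):
        print(d, l2_loss(d), smooth_l1_loss(d), smooth_ln_loss(d),
              smooth_ln_grad(d))
```

Under this assumed form the gradient grows logarithmically with the residual: it is never constant like smooth L1 for large errors, and it does not explode like L2 on outliers, which is one way to read the robustness and stability claim.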
Key Findings
| Algorithm | Recall (%) | Precision (%) | Hmean (%) |
|---|---|---|---|
| Baseline (SSD-VGGNet) | 25.48 | 63.25 | 36.326 |
| Proposed DMPNet | 68.22 | 73.23 | 70.64 |
| Megvii-Image++ [33] | 56.96 | 72.40 | 63.76 |
| CTPN [29] | 51.56 | 74.22 | 60.85 |
| MCLAB_FCN [14] | 43.09 | 70.81 | 53.58 |
| StardVision-2 [14] | 36.74 | 77.46 | 49.84 |
| StardVision-1 [14] | 46.27 | 53.39 | 49.57 |
| CASIA_USTB-Cascaded [14] | 39.53 | 61.68 | 48.18 |
| NJU_Text [14] | 35.82 | 72.73 | 48.00 |
| AJOU [16] | 46.94 | 47.26 | 47.10 |
| HUST_MCLAB [14] | 37.79 | 44.00 | 40.66 |
| Deep2Text-MO [36] | 32.11 | 49.59 | 38.98 |
| CNN Proposal [14] | 34.42 | 34.71 | 34.57 |
| TextCatcher-2 [14] | 34.81 | 24.91 | 29.04 |
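The Hmean column is the harmonic mean of recall and precision, that is, the F-measure quoted elsewhere in this summary. A small helper (the name `hmean` is illustrative) lets any row of the table be re-derived from its first two columns:

```python
def hmean(recall, precision):
    """Harmonic mean of recall and precision, both given in percent."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


if __name__ == "__main__":
    rows = {
        "Proposed DMPNet": (68.22, 73.23),
        "Megvii-Image++": (56.96, 72.40),
        "CTPN": (51.56, 74.22),
    }
    for name, (r, p) in rows.items():
        print(f"{name}: {hmean(r, p):.2f}")
```

The comparison shows that DMPNet's gain over Megvii-Image++ comes mainly from recall (68.22 vs. 56.96), while precision is nearly the same.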
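- Reaches an F-measure of 70.64% on ICDAR 2015 Challenge 4, surpassing the previous state of the art (63.76%).
- Compared with rectangular windows, quadrilateral sliding windows substantially improve recall and reduce background noise.
- The shared Monte-Carlo method gives a fast polygon overlap computation that is well suited to GPU parallelism.
- Sequential point ordering makes quadrilateral regression consistent and improves localization accuracy.
- The smooth Ln loss is more robust and stable than L2 loss and smooth L1 loss for boundary regression.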
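The need for a consistent point order can be illustrated with a simplified stand-in. The sketch below is not the paper's sequential protocol, whose exact rules are not given in this summary. It assumes a convex quadrilateral in image coordinates (y pointing down), sorts the vertices by angle around the centroid, and starts from the vertex with the smallest x (ties broken by smaller y). The function name is hypothetical.

```python
import math


def order_quad_points(points):
    """Return the four vertices in a canonical cyclic order.

    Simplified stand-in for a point-ordering protocol: sort by angle
    around the centroid, then rotate so the left-most vertex is first.
    Assumes the four points form a convex quadrilateral.
    """
    cx = sum(x for x, _ in points) / 4.0
    cy = sum(y for _, y in points) / 4.0
    by_angle = sorted(points,
                      key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # Pick the starting vertex: smallest x, then smallest y.
    start = min(range(4), key=lambda i: (by_angle[i][0], by_angle[i][1]))
    return by_angle[start:] + by_angle[:start]


if __name__ == "__main__":
    quad = [(4, 3), (0, 0), (0, 2), (4, 1)]
    print(order_quad_points(quad))
```

Whatever rule is used, the point is that any permutation of the same four corners maps to one ordering, so the regression target for each corner does not depend on annotation order.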
This summary was generated by AI and reviewed by human editors.