QUICK REVIEW

[Paper Digest] Text-Pass Filter: An Efficient Scene Text Detector

Chuang Yang, Haozhao Ma|arXiv (Cornell University)|Jan 26, 2026

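Handwritten Text Recognition Techniques | Cited by 0

One-Sentence Summary

The paper proposes Text-Pass Filter (TPF), which detects arbitrary-shaped scene text directly and efficiently, and introduces a Reinforcement Ensemble Unit (REU) and a Foreground Prior Unit (FPU) to improve text feature consistency and foreground discrimination.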

ABSTRACT

To pursue an efficient text assembling process, existing methods detect texts via the shrink-mask expansion strategy. However, the shrinking operation loses the visual features of text margins and confuses the foreground and background difference, which brings intrinsic limitations to recognize text features. We follow this issue and design Text-Pass Filter (TPF) for arbitrary-shaped text detection. It segments the whole text directly, which avoids the intrinsic limitations. It is noteworthy that different from previous whole text region-based methods, TPF can separate adhesive texts naturally without complex decoding or post-processing processes, which makes it possible for real-time text detection. Concretely, we find that the band-pass filter allows through components in a specified band of frequencies, called its passband but blocks components with frequencies above or below this band. It provides a natural idea for extracting whole texts separately. By simulating the band-pass filter, TPF constructs a unique feature-filter pair for each text. In the inference stage, every filter extracts the corresponding matched text by passing its pass-feature and blocking other features. Meanwhile, considering the large aspect ratio problem of ribbon-like texts makes it hard to recognize texts wholly, a Reinforcement Ensemble Unit (REU) is designed to enhance the feature consistency of the same text and to enlarge the filter's recognition field to help recognize whole texts. Furthermore, a Foreground Prior Unit (FPU) is introduced to encourage TPF to discriminate the difference between the foreground and background, which improves the feature-filter pair quality. Experiments demonstrate the effectiveness of REU and FPU while showing the TPF's superiority.

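Motivation and Goals

Support real-time detection of arbitrary-shaped scene text without the loss of text-margin features caused by mask shrinking.
Extract whole text regions directly, removing the complex decoding and post-processing that earlier whole-region methods require.
Introduce a mechanism inspired by the band-pass filter (TPF) that generates a text-specific feature and filter for each text.
Enlarge the filter's recognition field with the Reinforcement Ensemble Unit (REU) to improve detection of ribbon-like text.
Improve foreground-background discrimination with the Foreground Prior Unit (FPU), raising the quality of the feature-filter pairs.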

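Proposed Method

An end-to-end convolutional neural network framework that simulates a band-pass filter, extracting whole text regions with text-specific feature-filter pairs.
A center-point prediction head plus a feature-filter pair generator, which produce the feature and the filter for each text.
A Reinforcement Ensemble Unit (REU) that (1) strengthens feature consistency within the same text and (2) fuses multiple filters of the same text into a single reinforced filter.
A Foreground Prior Unit (FPU) that learns to separate foreground from background and improves center-point localization.
Filter-based post-processing that performs parallel, instance-specific text extraction and avoids heavy decoding.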
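
The filtering idea in the list above can be pictured with a minimal sketch. It rests on assumptions the digest does not confirm: each text's filter is taken to be a plain vector, the network is taken to output one feature vector per pixel, and a filter "passes" a pixel when the sigmoid of their dot product exceeds a threshold. The REU-style fusion is approximated here by simple averaging. The names `apply_filter` and `fuse_filters`, the 0.5 threshold, and the toy data are all hypothetical and not taken from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def apply_filter(features, text_filter, threshold=0.5):
    """Return a binary mask: 1 where the filter 'passes' the pixel feature.

    features: H x W grid of C-dimensional feature vectors (nested lists).
    text_filter: C-dimensional vector for one text instance (assumed form).
    """
    mask = []
    for row in features:
        mask_row = []
        for pixel in row:
            # Dot product between the pixel feature and the instance filter.
            response = sum(f * w for f, w in zip(pixel, text_filter))
            mask_row.append(1 if sigmoid(response) > threshold else 0)
        mask.append(mask_row)
    return mask

def fuse_filters(filters):
    """Average several filters of the same text into one (assumed fusion rule)."""
    n = len(filters)
    return [sum(vals) / n for vals in zip(*filters)]

# Toy 2x4 feature map with 2 channels: the left two columns belong to text A
# (features near [1, -1]), the right two columns to text B (features near [-1, 1]).
features = [
    [[1.0, -1.0], [0.9, -0.8], [-1.0, 1.0], [-0.9, 1.1]],
    [[1.1, -0.9], [0.8, -1.0], [-1.1, 0.9], [-1.0, 1.0]],
]
filter_a = fuse_filters([[2.0, -2.0], [1.0, -3.0]])  # two filters sampled for text A
filter_b = [-2.0, 2.0]

# Each filter is applied independently of the others, so the per-instance
# extractions could run in parallel.
mask_a = apply_filter(features, filter_a)
mask_b = apply_filter(features, filter_b)
print(mask_a)
print(mask_b)
```

In this toy setup each filter responds only to its own text's features, so two neighbouring texts end up in separate masks without any grouping or decoding step afterwards.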

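Experimental Results

Research Questions

RQ1: Can the band-pass-filter-inspired mechanism accurately segment whole text instances without shrink-mask expansion or heavy post-processing?
RQ2: Do REU and FPU improve feature consistency, detection recall, and precision for arbitrary-shaped text, including adhesive and ribbon-like text?
RQ3: How does TPF's inference performance compare with existing whole-region and shrink-mask-based methods?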

Key Findings

Method	REU	FPU	Precision (%)	Recall (%)	F1 (%)	FPS
baseline	✗	✗	87.9	79.2	83.3	33.6
baseline+	✓	✗	89.7	80.7	85.0	36.2
baseline+	✓	✓	89.9	82.8	86.2	37.7
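
The F1 column in the table above can be cross-checked from the precision and recall columns, since F1 is their harmonic mean. The short sketch below does that. The row labels in the dictionary are illustrative names chosen here, and the numbers are copied from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %, reported F1 %) copied from the ablation table.
rows = {
    "baseline": (87.9, 79.2, 83.3),
    "baseline+ (REU)": (89.7, 80.7, 85.0),
    "baseline+ (REU, FPU)": (89.9, 82.8, 86.2),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.1f}, reported = {reported}")
```

For all three rows the recomputed value rounds to the reported F1, so the table is internally consistent.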

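Adding REU and FPU gives TPF a clear gain over the baseline on MSRA-TD500: precision rises from 87.9 to 89.9, recall from 79.2 to 82.8, and F1 from 83.3 to 86.2.
REU improves feature consistency and enlarges the filter's recognition field, so whole-text segmentation can be carried out through parallel channel-feature recognition.
FPU improves foreground-background discrimination, which helps localize center points more precisely and separate text instances.
Because filtering processes texts in parallel and reduces post-processing, inference stays efficient.
On MSRA-TD500, FPS rises from 33.6 for the baseline to 37.7 with REU and FPU.
Compared with the baseline, TPF with REU and FPU shows a more favorable trade-off among parameter count, FLOPs, and time cost.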
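
To make the speed figures easier to read, the sketch below converts the reported FPS values into per-image latency and a relative speedup. It assumes that FPS here means images processed per second one at a time, so latency is simply 1000 / FPS milliseconds. The digest does not describe the measurement setup, and the helper name `latency_ms` is illustrative.

```python
def latency_ms(fps):
    """Per-image latency in milliseconds, assuming one image at a time."""
    return 1000.0 / fps

baseline_fps = 33.6  # baseline, from the ablation table
full_fps = 37.7      # baseline with REU and FPU, from the ablation table

baseline_latency = latency_ms(baseline_fps)
full_latency = latency_ms(full_fps)
speedup_percent = (full_fps / baseline_fps - 1.0) * 100.0

print(f"baseline: {baseline_latency:.1f} ms per image")
print(f"with REU and FPU: {full_latency:.1f} ms per image")
print(f"relative throughput gain: {speedup_percent:.1f}%")
```

Under that assumption, the gap between 33.6 and 37.7 FPS amounts to roughly 3 ms less per image.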


This digest was generated by AI and reviewed by a human editor.