QUICK REVIEW

[Paper Review] Robust Table Detection and Structure Recognition from Heterogeneous Document Images

Chixiang Ma, Weihong Lin|arXiv (Cornell University)|Mar 16, 2022

Handwritten Text Recognition Techniques | References: 88 | Cited by: 61

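One-Sentence Summary

RobusTabNet is a table detection and structure recognition framework. For detection, it uses CornerNet as the region proposal network; for structure recognition, it follows a split-and-merge paradigm with a spatial CNN based separation line prediction module and a Grid CNN based cell merging module. It achieves state-of-the-art performance on six public benchmarks and is robust to complex, distorted, and curved tables.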

ABSTRACT

We introduce a new table detection and structure recognition approach named RobusTabNet to detect the boundaries of tables and reconstruct the cellular structure of each table from heterogeneous document images. For table detection, we propose to use CornerNet as a new region proposal network to generate higher quality table proposals for Faster R-CNN, which has significantly improved the localization accuracy of Faster R-CNN for table detection. Consequently, our table detection approach achieves state-of-the-art performance on three public table detection benchmarks, namely cTDaR TrackA, PubLayNet and IIIT-AR-13K, by only using a lightweight ResNet-18 backbone network. Furthermore, we propose a new split-and-merge based table structure recognition approach, in which a novel spatial CNN based separation line prediction module is proposed to split each detected table into a grid of cells, and a Grid CNN based cell merging module is applied to recover the spanning cells. As the spatial CNN module can effectively propagate contextual information across the whole table image, our table structure recognizer can robustly recognize tables with large blank spaces and geometrically distorted (even curved) tables. Thanks to these two techniques, our table structure recognition approach achieves state-of-the-art performance on three public benchmarks, including SciTSR, PubTabNet and cTDaR TrackB2-Modern. Moreover, we have further demonstrated the advantages of our approach in recognizing tables with complex structures, large blank spaces, as well as geometrically distorted or even curved shapes on a more challenging in-house dataset.
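
The abstract credits the spatial CNN module with propagating contextual information across the whole table image. The sketch below illustrates the general idea of slice-by-slice spatial message passing on a single-channel feature map. It is not the paper's module: the function names, the one-dimensional kernel, and the single top-down direction are assumptions chosen to keep the example short.

```python
def relu(value):
    """Rectified linear unit for a single number."""
    return value if value > 0.0 else 0.0


def conv1d_same(row, kernel):
    """Slide an odd-length kernel along a row, zero-padding the borders."""
    half = len(kernel) // 2
    out = []
    for j in range(len(row)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = j + k - half
            if 0 <= idx < len(row):
                acc += weight * row[idx]
        out.append(acc)
    return out


def propagate_top_down(feature_map, kernel):
    """Each row adds a ReLU-activated, convolved message from the already-updated row above."""
    result = [list(feature_map[0])]
    for i in range(1, len(feature_map)):
        message = conv1d_same(result[i - 1], kernel)
        result.append([x + relu(m) for x, m in zip(feature_map[i], message)])
    return result


# Hypothetical input: evidence for a column separator appears only in the top row,
# and the rows below are blank (as in a table with large empty regions).
feature_map = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
identity_kernel = [0.0, 1.0, 0.0]  # assumed kernel that simply forwards the message
propagated = propagate_top_down(feature_map, identity_kernel)
print(propagated[2])  # the bottom row now carries the signal: [0.0, 1.0, 0.0]
```

Because every row is updated from the row above it after that row has itself been updated, information can travel the full height of the map in one pass. An ordinary convolution would need many stacked layers to do the same, which is one plausible reason this kind of propagation helps with large blank spaces.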

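Motivation and Objectives

Improve table localization accuracy with a lightweight backbone and a CornerNet-based region proposal network.
Achieve robust table structure recognition on tables with large blank spaces, complex hierarchical structures, and geometric distortion.
Develop a split-and-merge framework that handles spanning cells (cells that cross multiple rows or columns) and non-axis-aligned tables.
Validate performance on public benchmarks and on a challenging in-house dataset with real-world distortions.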

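Proposed Method

Use CornerNet as the region proposal network for Faster R-CNN, generating high-quality table proposals through keypoint detection.
Split each detected table into a grid with a spatial CNN based separation line prediction module, which uses global context to improve robustness.
Recover spanning cells with a Grid CNN based cell merging module, which models the table as a compact feature grid.
Combine the two steps into a split-and-merge pipeline: separation lines first split the table into cells, and cells are then merged based on their spatial relationships.
Use a lightweight ResNet-18 backbone to reach high performance at low computational cost.
Train end to end, using cross-entropy loss and IoU-based loss for the detection and structure recognition tasks, respectively.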
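
The split-and-merge steps listed above can be sketched on plain bounding boxes. This is a simplified illustration, not the authors' implementation: it assumes straight, axis-aligned separation lines (the paper also handles curved ones), and the merge decisions are hand-written booleans standing in for the output of a cell merging model. The names `split_into_grid`, `merge_cells`, `merge_right`, and `merge_down` are hypothetical.

```python
def split_into_grid(table_box, row_seps, col_seps):
    """Split a table box (x0, y0, x1, y1) into a grid of cell boxes."""
    x0, y0, x1, y1 = table_box
    ys = [y0] + sorted(row_seps) + [y1]
    xs = [x0] + sorted(col_seps) + [x1]
    return [
        [(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(len(xs) - 1)]
        for r in range(len(ys) - 1)
    ]


def merge_cells(grid, merge_right, merge_down):
    """Group grid cells with union-find and return one bounding box per group.

    merge_right[r][c] is True when cell (r, c) belongs with cell (r, c + 1);
    merge_down[r][c] is True when cell (r, c) belongs with cell (r + 1, c).
    """
    rows, cols = len(grid), len(grid[0])
    parent = {(r, c): (r, c) for r in range(rows) for c in range(cols)}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and merge_right[r][c]:
                union((r, c), (r, c + 1))
            if r + 1 < rows and merge_down[r][c]:
                union((r, c), (r + 1, c))

    groups = {}
    for r in range(rows):
        for c in range(cols):
            groups.setdefault(find((r, c)), []).append(grid[r][c])

    merged = [
        (min(b[0] for b in boxes), min(b[1] for b in boxes),
         max(b[2] for b in boxes), max(b[3] for b in boxes))
        for boxes in groups.values()
    ]
    return sorted(merged, key=lambda box: (box[1], box[0]))


# Hypothetical 3x3 table whose top-left header spans the first two columns.
grid = split_into_grid((0, 0, 90, 60), row_seps=[20, 40], col_seps=[30, 60])
merge_right = [[True, False], [False, False], [False, False]]
merge_down = [[False, False, False], [False, False, False]]
cells = merge_cells(grid, merge_right, merge_down)
print(len(cells))  # 8 cells remain after merging two of the nine
print(cells[0])    # (0, 0, 60, 20), the spanning header cell
```

Union-find is used here so that chains of pairwise merge decisions collapse into one spanning cell regardless of the order in which they are visited.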

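Experimental Results

Research Questions

RQ1: Can CornerNet-based region proposals improve table localization accuracy with a lightweight backbone?
RQ2: Can spatial CNN based propagation carry contextual information across the whole table well enough to handle large blank spaces and curved shapes?
RQ3: Does the Grid CNN based merging module outperform a relation network or a GCN at recovering spanning cells?
RQ4: How does the split-and-merge framework perform on geometrically distorted or curved tables that standard benchmarks do not cover?
RQ5: Does the method generalize to complex real-world document images, beyond scanned or PDF-derived datasets?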

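Key Findings

On the in-house dataset, RobusTabNet reaches an F1 score of 94.6% at IoU@0.9, outperforming the baselines and prior methods.
Spatial CNN based message passing reaches a WAvg. F1 of 94.6%, outperforming the projection network (93.0%) and Bi-GRU (93.1%), especially on complex tables.
Grid CNN based cell merging reaches a WAvg. F1 of 94.6% on the in-house dataset, outperforming the relation network (93.2%) and GCN (94.0%).
On public benchmarks, RobusTabNet achieves state-of-the-art results on cTDaR TrackA, PubLayNet, IIIT-AR-13K, SciTSR, PubTabNet, and cTDaR TrackB2-Modern.
The method is robust to curved and distorted tables; qualitative results show accurate splitting and merging even under extreme geometric distortion.
Ablation studies confirm that both the spatial CNN and Grid CNN components are essential: removing either one causes a significant drop in performance.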
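
To make the metrics quoted above concrete, the sketch below computes an F1 score at a fixed IoU threshold and a weighted average over several thresholds. It rests on assumptions: boxes are matched greedily one-to-one, boxes are scored directly (benchmark protocols for structure recognition may score cell adjacency relations instead), and the weighted average follows the cTDaR-style convention of weighting each F1 by its IoU threshold. The box coordinates and F1 values are made up for illustration and are not results from the paper.

```python
def box_area(box):
    """Area of an axis-aligned box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, inter_w) * max(0.0, inter_h)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(predictions, ground_truths, threshold):
    """Greedy one-to-one matching: a prediction counts if its best unmatched IoU >= threshold."""
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_index, best_iou = None, 0.0
        for index, truth in enumerate(ground_truths):
            if index in matched:
                continue
            overlap = iou(pred, truth)
            if overlap > best_iou:
                best_index, best_iou = index, overlap
        if best_index is not None and best_iou >= threshold:
            matched.add(best_index)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truths) if ground_truths else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def weighted_average_f1(f1_by_threshold):
    """Average F1 scores, weighting each one by its IoU threshold (assumed convention)."""
    total_weight = sum(f1_by_threshold)
    return sum(t * f1 for t, f1 in f1_by_threshold.items()) / total_weight


# Made-up boxes: the second prediction is slightly too short (IoU = 100 / 120).
predictions = [(0, 0, 10, 10), (20, 20, 30, 30)]
ground_truths = [(0, 0, 10, 10), (20, 20, 30, 32)]
scores = {t: f1_at_iou(predictions, ground_truths, t) for t in (0.6, 0.7, 0.8, 0.9)}
print(scores[0.8], scores[0.9])  # 1.0 0.5
print(weighted_average_f1(scores))
```

The example shows why IoU@0.9 is a demanding setting: a box that is off by two units passes at 0.8 but fails at 0.9, halving the F1 score.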

This review was generated by AI and reviewed by human editors.