QUICK REVIEW

[Paper Review] CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor

Xiaohui Zhao, Endi Niu | arXiv (Cornell University) | Mar 29, 2019

Topic Modeling | References: 10 | Citations: 46

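One-Sentence Summary

CUTIE arranges the text of a document on a grid and processes it with a CNN, jointly using semantic and spatial information to extract key information; it achieves state-of-the-art results with little training data and no pre-training.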

ABSTRACT

Extracting key information from documents, such as receipts or invoices, and preserving the interested texts to structured data is crucial in the document-intensive streamline processes of office automation in areas that includes but not limited to accounting, financial, and taxation areas. To avoid designing expert rules for each specific type of document, some published works attempt to tackle the problem by learning a model to explore the semantic context in text sequences based on the Named Entity Recognition (NER) method in the NLP field. In this paper, we propose to harness the effective information from both semantic meaning and spatial distribution of texts in documents. Specifically, our proposed model, Convolutional Universal Text Information Extractor (CUTIE), applies convolutional neural networks on gridded texts where texts are embedded as features with semantical connotations. We further explore the effect of employing different structures of convolutional neural network and propose a fast and portable structure. We demonstrate the effectiveness of the proposed method on a dataset with up to 4,484 labelled receipts, without any pre-training or post-processing, achieving state of the art performance that is much better than the NER based methods in terms of either speed and accuracy. Experimental results also demonstrate that the proposed CUTIE model being able to achieve good performance with a much smaller amount of training data.

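Motivation and Objectives

Achieve robust key information extraction across diverse document layouts without relying on hand-crafted templates.
Combine semantic word embeddings with the precise spatial relationships among the texts in a document.
Propose a grid positional mapping and two CNN architectures that capture multi-scale context.
Demonstrate that CUTIE performs well with limited training data and without pre-training or post-processing.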

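Proposed Method

Map text tokens onto a grid that preserves their relative spatial relationships, producing a grid representation of the document.
Embed the tokens with word embeddings and feed the grid to a CNN that predicts a grid of text labels.
Propose two CNN variants: CUTIE-A (high-resolution, multi-scale feature fusion) and CUTIE-B (atrous convolution with ASPP).
Train with a cross-entropy loss between the predicted grid and the ground-truth grid.
Evaluate on ICDAR 2019 SROIE and a self-built Spanish receipt dataset, using per-class and token-level metrics.
Compare against the NER-based baselines CloudScan and BERT to assess speed and accuracy.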
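The grid mapping and the training loss described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_grid` and `grid_cross_entropy`, the `(text, (x0, y0, x1, y1))` token format, and the rule of sliding a colliding token to the next free cell on its right are all assumptions made for illustration. The class probabilities are taken as given here, standing in for the softmax output of the CNN.

```python
import math


def build_grid(tokens, page_w, page_h, rows, cols, empty=""):
    """Place each token in the grid cell that contains its bounding-box centre.

    tokens: list of (text, (x0, y0, x1, y1)) in page coordinates (assumed format).
    """
    placed = []
    for text, (x0, y0, x1, y1) in tokens:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        r = min(rows - 1, int(cy / page_h * rows))
        c = min(cols - 1, int(cx / page_w * cols))
        placed.append((r, cx, c, text))

    grid = [[empty] * cols for _ in range(rows)]
    # Visit tokens row by row, left to right, so collisions resolve in reading order.
    for r, _cx, c, text in sorted(placed):
        # Assumed collision policy: slide right to the next free cell.
        while c < cols - 1 and grid[r][c] != empty:
            c += 1
        if grid[r][c] == empty:
            grid[r][c] = text
    return grid


def grid_cross_entropy(probs, labels):
    """Mean negative log-likelihood over all grid cells.

    probs[r][c] is a list of class probabilities for that cell;
    labels[r][c] is the index of the true class.
    """
    total, count = 0.0, 0
    for prob_row, label_row in zip(probs, labels):
        for cell_probs, true_class in zip(prob_row, label_row):
            total -= math.log(cell_probs[true_class])
            count += 1
    return total / count


# Hypothetical receipt fragment on a 100 x 100 page, mapped to a 4 x 4 grid.
tokens = [
    ("TOTAL", (10, 80, 40, 90)),
    ("12.50", (70, 80, 95, 90)),
    ("TAXI", (10, 10, 40, 20)),
]
grid = build_grid(tokens, page_w=100, page_h=100, rows=4, cols=4)
for row in grid:
    print(row)
```

Because "TOTAL" and "12.50" land in the same grid row, a convolution kernel sliding over the grid sees them as horizontal neighbours, which is the spatial cue a purely sequential NER model does not get directly.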

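Experimental Results

Research Questions

RQ1: Can CUTIE effectively fuse semantic information with spatial text features to achieve robust key information extraction across varied document layouts?
RQ2: Do grid augmentation and multi-scale CNN architectures improve extraction accuracy when training data is limited?
RQ3: How do CUTIE-A and CUTIE-B compare in accuracy, model size, and training efficiency on SROIE and the extended dataset?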

Key Findings

Method	Parameters	Taxi AP / softAP	ME AP / softAP	Hotel AP / softAP
CloudScan	-	82 / -	64 / -	60 / -
BERT for NER	110M	88.1 / -	80.1 / -	71.7 / -
CUTIE-A	67M	90.8 / 97.2	77.7 / 91.4	69.5 / 87.8
CUTIE-B	14M	94.0 / 97.3	81.5 / 89.7	74.6 / 87.0

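CUTIE-B achieves 94.0 AP and 97.3 softAP on taxi receipts, 81.5 AP and 89.7 softAP on ME (meals and entertainment) receipts, and 74.6 AP and 87.0 softAP on hotel receipts.
CUTIE-A achieves 90.8 AP and 97.2 softAP on taxi receipts, 77.7 AP and 91.4 softAP on ME receipts, and 69.5 AP and 87.8 softAP on hotel receipts.
CUTIE-B outperforms both CloudScan and BERT for NER in AP on all three document types while using far fewer parameters (14M) than BERT (110M). CUTIE-A beats CloudScan on all three types but trails BERT in AP on ME and hotel receipts. softAP is reported only for the CUTIE models.
Grid augmentation improves spatial understanding and yields higher AP/softAP than training without it.
CUTIE-B performs strongly with only about 21% of the training data, and it outperforms the BERT baseline with roughly one-eighth of the parameters (14M vs. 110M).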
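As a quick check on these comparisons, the AP margins and parameter ratios can be recomputed from the table. The sketch below copies the AP values and parameter counts from the table; the dictionary layout and the helper names `ap_margin` and `param_ratio` are hypothetical, chosen only for this illustration.

```python
# AP values and parameter counts (in millions) copied from the results table.
# CloudScan's parameter count is not reported, so it is left as None.
results = {
    "CloudScan": {"params_m": None, "ap": {"Taxi": 82.0, "ME": 64.0, "Hotel": 60.0}},
    "BERT for NER": {"params_m": 110, "ap": {"Taxi": 88.1, "ME": 80.1, "Hotel": 71.7}},
    "CUTIE-A": {"params_m": 67, "ap": {"Taxi": 90.8, "ME": 77.7, "Hotel": 69.5}},
    "CUTIE-B": {"params_m": 14, "ap": {"Taxi": 94.0, "ME": 81.5, "Hotel": 74.6}},
}


def ap_margin(model, baseline):
    """AP of `model` minus AP of `baseline`, per receipt type, in points."""
    model_ap = results[model]["ap"]
    base_ap = results[baseline]["ap"]
    return {kind: round(model_ap[kind] - base_ap[kind], 1) for kind in model_ap}


def param_ratio(model, baseline):
    """Parameter count of `model` as a fraction of `baseline`."""
    return results[model]["params_m"] / results[baseline]["params_m"]


for model in ("CUTIE-A", "CUTIE-B"):
    print(model, "vs BERT:", ap_margin(model, "BERT for NER"),
          "param ratio:", round(param_ratio(model, "BERT for NER"), 3))
```

Positive margins mean the CUTIE variant is ahead of the baseline; a negative margin marks a receipt type where the baseline has the higher AP.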


This review was generated by AI and checked by a human editor.