QUICK REVIEW

[Paper Review] CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor

Xiaohui Zhao, Endi Niu | arXiv (Cornell University) | March 29, 2019

Topic: Topic Modeling · References: 10 · Citations: 46

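One-Line Summary

CUTIE is a CNN-based approach that arranges a document's texts on a grid so that semantic and spatial information can be exploited jointly for key information extraction. It achieves state-of-the-art performance with little training data and without pre-training.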

ABSTRACT

Extracting key information from documents, such as receipts or invoices, and preserving the texts of interest as structured data is crucial in the document-intensive streamlined processes of office automation in areas including, but not limited to, accounting, finance, and taxation. To avoid designing expert rules for each specific type of document, some published works attempt to tackle the problem by learning a model to explore the semantic context in text sequences based on the Named Entity Recognition (NER) method in the NLP field. In this paper, we propose to harness the effective information from both the semantic meaning and the spatial distribution of texts in documents. Specifically, our proposed model, Convolutional Universal Text Information Extractor (CUTIE), applies convolutional neural networks on gridded texts where texts are embedded as features with semantic connotations. We further explore the effect of employing different structures of convolutional neural network and propose a fast and portable structure. We demonstrate the effectiveness of the proposed method on a dataset with up to 4,484 labelled receipts, without any pre-training or post-processing, achieving state-of-the-art performance that is much better than the NER-based methods in terms of both speed and accuracy. Experimental results also demonstrate that the proposed CUTIE model is able to achieve good performance with a much smaller amount of training data.
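The gridding step described in the abstract can be sketched in a few lines. This is a minimal sketch, not the authors' code: it assumes each OCR token comes with the pixel coordinates of its bounding-box center, and the `Token` class, the `map_to_grid` function, the grid size, and the shift-right collision rule are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str
    cx: float  # bounding-box center x in pixels (assumed OCR output)
    cy: float  # bounding-box center y in pixels (assumed OCR output)


def map_to_grid(tokens: List[Token], page_w: float, page_h: float,
                rows: int, cols: int) -> List[List[Optional[str]]]:
    """Place tokens on a rows x cols grid by scaling their box centers."""
    grid: List[List[Optional[str]]] = [[None] * cols for _ in range(rows)]
    # Compute each token's target cell from its scaled center coordinates.
    placed = []
    for tok in tokens:
        r = min(int(tok.cy * rows / page_h), rows - 1)
        c = min(int(tok.cx * cols / page_w), cols - 1)
        placed.append((r, c, tok.cx, tok.text))
    # Fill cells top-to-bottom, left-to-right so collisions keep reading order.
    for r, c, _, text in sorted(placed):
        # Hypothetical collision rule: shift right to the next free cell.
        while c < cols and grid[r][c] is not None:
            c += 1
        if c < cols:  # tokens that run off the row are dropped in this sketch
            grid[r][c] = text
    return grid


tokens = [Token("TAXI", 200, 50), Token("TOTAL", 50, 900), Token("12.50", 400, 900)]
grid = map_to_grid(tokens, page_w=500, page_h=1000, rows=10, cols=5)
print(grid[0][2], grid[9][0], grid[9][4])  # TAXI TOTAL 12.50
```

Relative layout survives the mapping: "TOTAL" and "12.50" share a row, with the amount to the right. In CUTIE each occupied cell would then carry the token's word embedding, so the CNN receives a rows × cols × embedding-dimension input; how empty cells are filled is left open here.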

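Motivation and Goals

Enable robust key information extraction across diverse document layouts without relying on hand-designed templates or expert rules.
Integrate the precise spatial relationships of texts in a document with semantic word embeddings.
Propose a grid-position mapping and two CNN architectures that capture multi-scale context and long-range dependencies.
Show that the proposed CUTIE achieves strong performance with limited training data and without pre-training or post-processing.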
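Atrous (dilated) convolution is a standard way to widen a CNN's receptive field for the long-range, multi-scale context mentioned above: a k×k kernel with dilation rate d spans k + (k − 1)(d − 1) cells per side with the same number of weights. The plain-Python sketch below shows a single-channel atrous convolution and an ASPP-style set of parallel branches. The function names, the rates, and the shared kernel are illustrative assumptions, not CUTIE-B's actual configuration.

```python
from typing import Dict, List

Grid = List[List[float]]


def effective_kernel(k: int, rate: int) -> int:
    """Side length covered by a k x k kernel with dilation `rate`."""
    return k + (k - 1) * (rate - 1)


def atrous_conv2d(x: Grid, w: Grid, rate: int) -> Grid:
    """Single-channel 2D atrous convolution with zero 'same' padding.

    Like deep-learning frameworks, this computes cross-correlation (no
    kernel flip). Taps are spaced `rate` cells apart, so a 3x3 kernel at
    rate 2 sees a 5x5 neighborhood while keeping only 9 weights.
    """
    h, wd = len(x), len(x[0])
    k = len(w)
    half = (k // 2) * rate  # offset from the center to the outermost tap
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    ii = i - half + a * rate
                    jj = j - half + b * rate
                    if 0 <= ii < h and 0 <= jj < wd:  # zero padding outside
                        acc += w[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def aspp(x: Grid, w: Grid, rates: List[int]) -> Dict[int, Grid]:
    """ASPP-style block: the same input through parallel atrous branches.

    Real ASPP learns separate weights per branch (plus image-level pooling)
    and concatenates the branch outputs as channels; this sketch shares one
    kernel and keeps the branches in a dict for inspection.
    """
    return {r: atrous_conv2d(x, w, r) for r in rates}


impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0  # a single non-zero cell in the middle of the grid
ones = [[1.0] * 3 for _ in range(3)]
for rate, out in aspp(impulse, ones, [1, 2, 3]).items():
    rows_hit = sorted({i for i, row in enumerate(out) for v in row if v})
    print(rate, effective_kernel(3, rate), rows_hit)
# 1 3 [2, 3, 4]
# 2 5 [1, 3, 5]
# 3 7 [0, 3, 6]
```

The same nine weights reach three, five, and seven rows away as the rate grows, which is why stacking several rates lets a shallow network relate, for example, a field name to a value far across the receipt.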

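Proposed Method

Build a grid representation by mapping the document's text tokens onto a grid that preserves their relative spatial relationships.
Embed the tokens as word embeddings and feed the grid into a CNN that predicts a grid of text labels.
Propose two CNN variants: CUTIE-A (high-resolution, multi-scale feature fusion) and CUTIE-B (atrous convolution and ASPP, atrous spatial pyramid pooling).
Train with a cross-entropy loss between the predicted grid and the ground-truth token-label grid.
Evaluate with per-class and token-level metrics on ICDAR 2019 SROIE and a self-built Spanish receipt dataset.
Compare against the NER baselines CloudScan and BERT in terms of speed and accuracy.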
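Training treats every grid cell as a classification problem: the CNN outputs per-class scores at each cell, and the loss compares them with the gold label grid. The following is a minimal plain-Python sketch of a per-cell cross-entropy and argmax decoding. Skipping empty cells and using an unweighted mean are assumptions, and the names `grid_cross_entropy` and `predict_labels` are hypothetical.

```python
import math
from typing import List, Optional

Scores = List[List[List[float]]]    # logits[r][c] = per-class scores for cell (r, c)
Labels = List[List[Optional[int]]]  # gold class index, or None for an empty cell


def log_softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def grid_cross_entropy(logits: Scores, labels: Labels) -> float:
    """Mean cross-entropy over grid cells that hold a token."""
    total, n = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for scores, gold in zip(row_logits, row_labels):
            if gold is None:  # empty cell: no token, no loss (assumption)
                continue
            total -= log_softmax(scores)[gold]
            n += 1
    return total / n if n else 0.0


def predict_labels(logits: Scores) -> List[List[int]]:
    """Argmax class per cell, i.e. the predicted text-label grid."""
    return [[max(range(len(s)), key=s.__getitem__) for s in row] for row in logits]


logits = [[[0.0, 1.0], [2.0, 0.0]], [[0.5, 0.5], [0.0, 0.0]]]
labels = [[1, 0], [None, None]]
print(predict_labels(logits))  # [[1, 0], [0, 0]]
print(grid_cross_entropy(logits, labels))
```

A framework implementation would compute the same quantity in batch, for example with an ignore index or a mask for cells that should not contribute to the loss.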

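Experimental Results

Research Questions

RQ1: Can CUTIE effectively fuse semantic and spatial text features to achieve robust key information extraction across diverse document layouts?
RQ2: Do grid augmentation and multi-scale CNN architectures improve extraction accuracy when training data is limited?
RQ3: How do CUTIE-A and CUTIE-B compare on SROIE and the extended dataset in terms of accuracy, model size, and training efficiency?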

Key Results

CUTIE-B achieves 94.0 AP and 97.3 softAP on taxi receipts, 81.5 AP and 89.7 softAP on ME (meals and entertainment) receipts, and 74.6 AP and 87.0 softAP on hotel receipts.
CUTIE-A achieves 90.8 AP and 97.2 softAP on taxi receipts, 77.7 AP and 91.4 softAP on ME receipts, and 69.5 AP and 87.8 softAP on hotel receipts.
Both CUTIE models outperform CloudScan and BERT-based NER in AP/softAP on all three document types, and CUTIE-B matches or exceeds BERT's accuracy with far fewer parameters (14M vs. 110M).
Grid augmentation improves spatial understanding and yields higher AP/softAP than training without it.
CUTIE-B maintains strong performance even when trained on as little as 21% of the training data, and it outperforms the baselines with far fewer parameters than BERT.
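The reported numbers can be compared directly. This short sketch only does arithmetic on the AP/softAP values and parameter counts listed above; the dictionary layout and the `gaps` helper are illustrative.

```python
# AP / softAP as reported in this review: document type -> (AP, softAP).
cutie_b = {"taxi": (94.0, 97.3), "ME": (81.5, 89.7), "hotel": (74.6, 87.0)}
cutie_a = {"taxi": (90.8, 97.2), "ME": (77.7, 91.4), "hotel": (69.5, 87.8)}

PARAMS_M = {"CUTIE-B": 14, "BERT": 110}  # parameter counts in millions


def gaps(b, a):
    """Per-document-type (AP, softAP) difference, first model minus second."""
    return {k: (round(b[k][0] - a[k][0], 1), round(b[k][1] - a[k][1], 1)) for k in b}


ratio = PARAMS_M["CUTIE-B"] / PARAMS_M["BERT"]

for doc, (d_ap, d_soft) in gaps(cutie_b, cutie_a).items():
    print(f"{doc}: AP {d_ap:+.1f}, softAP {d_soft:+.1f}")
print(f"CUTIE-B / BERT parameters: {ratio:.1%}")
```

With these numbers, CUTIE-B's AP is higher than CUTIE-A's on every document type (by 3.2, 3.8, and 5.1 points), while CUTIE-A's softAP is slightly higher on ME (by 1.7) and hotel receipts (by 0.8). CUTIE-B's 14M parameters are about 12.7% of BERT's 110M.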


This review was generated by AI and reviewed by a human editor.