QUICK REVIEW

[Paper Review] Fourier Contour Embedding for Arbitrary-Shaped Text Detection

Yiqin Zhu, Jianyong Chen|arXiv (Cornell University)|Apr 21, 2021

Handwritten Text Recognition Techniques | References: 37 | Cited by: 26

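One-Line Summary

This paper proposes Fourier Contour Embedding (FCE), which represents arbitrary-shaped text contours as compact Fourier signatures. It then builds FCENet, an end-to-end arbitrary-shaped text detector that predicts these signatures and reconstructs the contours with the Inverse Fourier Transform.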
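The Fourier embedding behind FCE can be written out as below. This is the standard complex Fourier series of a closed curve, written with the review's own symbols (c_k, K, N); t ∈ [0, 1] is the normalized position along the contour. Treat it as a sketch of the formulation rather than a verbatim copy of the paper's equations.

```latex
\begin{aligned}
f(t) &= x(t) + i\,y(t), \qquad t \in [0, 1] \\
f(t) &= \sum_{k=-\infty}^{\infty} c_k\, e^{2\pi i k t} \\
c_k &= \int_0^1 f(t)\, e^{-2\pi i k t}\, dt
      \;\approx\; \frac{1}{N} \sum_{n=0}^{N-1} f\!\left(\tfrac{n}{N}\right) e^{-2\pi i k n / N} \\
\hat{f}(t) &= \sum_{k=-K}^{K} c_k\, e^{2\pi i k t}
\end{aligned}
```

Keeping only |k| ≤ K gives a signature of 2K + 1 complex coefficients, so its size does not depend on how many points the original annotation polygon had.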

ABSTRACT

One of the main challenges for arbitrary-shaped text detection is to design a good text instance representation that allows networks to learn diverse text geometry variances. Most of existing methods model text instances in image spatial domain via masks or contour point sequences in the Cartesian or the polar coordinate system. However, the mask representation might lead to expensive post-processing, while the point sequence one may have limited capability to model texts with highly-curved shapes. To tackle these problems, we model text instances in the Fourier domain and propose one novel Fourier Contour Embedding (FCE) method to represent arbitrary shaped text contours as compact signatures. We further construct FCENet with a backbone, feature pyramid networks (FPN) and a simple post-processing with the Inverse Fourier Transformation (IFT) and Non-Maximum Suppression (NMS). Different from previous methods, FCENet first predicts compact Fourier signatures of text instances, and then reconstructs text contours via IFT and NMS during test. Extensive experiments demonstrate that FCE is accurate and robust to fit contours of scene texts even with highly-curved shapes, and also validate the effectiveness and the good generalization of FCENet for arbitrary-shaped text detection. Furthermore, experimental results show that our FCENet is superior to the state-of-the-art (SOTA) methods on CTW1500 and Total-Text, especially on challenging highly-curved text subset.

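Motivation and Objectives

Address the difficulty of representing highly curved text shapes in arbitrary-shaped text detection.
Propose a compact, flexible contour representation that generalizes across datasets whose ground-truth contours are annotated with different numbers of points.
Achieve end-to-end trainable detection by predicting Fourier signatures and reconstructing contours from them at inference time.
Show state-of-the-art or competitive results on the curved-text benchmarks CTW1500 and Total-Text.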

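Proposed Method

Represent a text contour in the Fourier domain as a complex-valued function, keeping only a fixed number of low-frequency components (k = -K, ..., K).
Resample the contour to a fixed number of points N (e.g., N = 400) using a unique start point, a clockwise sampling direction, and uniform sampling speed, so that the resulting Fourier signature is stable.
Embed the resampled points into Fourier coefficients c_k with the discrete Fourier transform, forming the compact Fourier signature vector [c_{-K}, ..., c_{K}].
Train FCENet, built on a ResNet50-DCN backbone and an FPN, to predict per-pixel Text Region (TR) and Text Center Region (TCR) masks together with Fourier signature vectors. At inference, contours are reconstructed with the Inverse Fourier Transform (IFT) and filtered with NMS.
The loss combines classification terms (TR and TCR) with a regression term that minimizes the L1 difference between the contours reconstructed by IFT from the predicted signatures and from the ground-truth signatures (Eq. 6).
Because every contour is resampled to the same N points, datasets annotated with different numbers of points, such as CTW1500 and Total-Text, are handled uniformly and their Fourier coefficients are directly comparable.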
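The resampling, encoding, decoding, and loss steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (normalize_contour, fce_encode, fce_decode, contour_l1_loss) are hypothetical, the unique start point is approximated as the resampled point nearest the horizontal ray pointing right from the centroid, and the loss is a plain unweighted L1 distance rather than whatever weighting or smooth-L1 variant a full training setup might use.

```python
import numpy as np


def signed_area(pts: np.ndarray) -> float:
    """Shoelace signed area of a closed polygon given as an (M, 2) array."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * float(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))


def normalize_contour(points: np.ndarray, n: int = 400) -> np.ndarray:
    """Resample a closed contour to n points with fixed orientation and start point.

    Assumption (hypothetical helper): orientation is fixed so the shoelace area is
    positive, which is clockwise on screen when the y axis points down.
    """
    pts = np.asarray(points, dtype=float)
    if signed_area(pts) < 0:
        pts = pts[::-1]
    closed = np.vstack([pts, pts[:1]])  # repeat the first vertex to close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])  # arc length at each vertex
    targets = np.linspace(0.0, cum[-1], n, endpoint=False)  # uniform speed
    res = np.stack([np.interp(targets, cum, closed[:, 0]),
                    np.interp(targets, cum, closed[:, 1])], axis=1)
    # Assumption: approximate the unique start point by the sample closest to the
    # horizontal ray that starts at the centroid and points right.
    center = res.mean(axis=0)
    right = np.where(res[:, 0] >= center[0])[0]
    start = right[np.argmin(np.abs(res[right, 1] - center[1]))]
    return np.roll(res, -start, axis=0)


def fce_encode(points: np.ndarray, k: int = 5, n: int = 400) -> np.ndarray:
    """Return the 2K+1 complex coefficients [c_{-K}, ..., c_{K}] of a contour."""
    pts = normalize_contour(points, n)
    z = pts[:, 0] + 1j * pts[:, 1]  # samples f(n/N) = x + i*y
    ks = np.arange(-k, k + 1)
    basis = np.exp(-2j * np.pi * np.outer(ks, np.arange(n)) / n)  # (2K+1, N)
    return basis @ z / n  # DFT restricted to |k| <= K


def fce_decode(coeffs: np.ndarray, n: int = 400) -> np.ndarray:
    """Inverse transform: evaluate the truncated series at t = 0, 1/n, ..., (n-1)/n."""
    k = (len(coeffs) - 1) // 2
    ks = np.arange(-k, k + 1)
    t = np.arange(n) / n
    z = np.exp(2j * np.pi * np.outer(t, ks)) @ coeffs  # (N,) complex points
    return np.stack([z.real, z.imag], axis=1)


def contour_l1_loss(pred: np.ndarray, gt: np.ndarray, n: int = 50) -> float:
    """Mean L1 distance between contours reconstructed from two signatures."""
    return float(np.mean(np.abs(fce_decode(pred, n) - fce_decode(gt, n))))
```

For example, a densely sampled circle of radius 20 centred at (50, 30) encodes to 11 complex numbers when k=5, with c_0 close to 50 + 30j (the centroid) and almost all remaining energy in a single first-order coefficient. Decoding those 11 numbers recovers the 400 resampled points almost exactly.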

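Experimental Results

Research Questions

RQ1: Can a Fourier-domain contour representation describe arbitrary-shaped text compactly and flexibly without heavy post-processing?
RQ2: Does predicting Fourier signatures end-to-end lead to accurate IFT-based contour reconstruction for highly curved text?
RQ3: How does FCENet compare with state-of-the-art methods on the curved-text benchmarks (CTW1500, Total-Text) and the multi-oriented dataset ICDAR2015?
RQ4: How much do components such as the Text Center Region loss and the proposed regression loss contribute to overall performance?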

Key Findings

Table 1. R = recall, P = precision, F = F-measure (all in %). Ext. = uses extra (external) training data (✓ = yes, × = no).
Method	Venue	Ext.	CTW1500 R	CTW1500 P	CTW1500 F	Total-Text R	Total-Text P	Total-Text F	ICDAR2015 R	ICDAR2015 P	ICDAR2015 F
TextSnake	ECCV’18	✓	85.3	67.9	75.6	74.5	82.7	78.4	80.4	84.9	82.6
SegLink++	PR’19	✓	79.8	82.8	81.3	80.9	82.1	81.5	80.3	83.7	82.0
SAEmbed	CVPR’19	✓	77.8	82.7	80.1	-	-	-	85.0	88.3	86.6
CRAFT	CVPR’19	✓	81.1	86.0	83.5	79.9	87.6	83.6	84.3	89.8	86.9
PAN (no Ext)	ICCV’19	×	77.7	84.6	81.0	79.4	88.0	83.5	77.8	82.9	80.3
PAN (with Ext)	ICCV’19	✓	81.2	86.4	83.7	81.0	89.3	85.0	81.9	84.0	82.9
PSENet	CVPR’19	×	75.6	80.6	78.0	75.1	81.8	78.3	79.7	81.5	80.6
PSENet	CVPR’19	✓	79.7	84.8	82.2	84.0	78.0	80.9	84.5	86.9	85.7
LOMO	CVPR’19	✓	76.5	85.7	80.8	79.3	87.6	83.3	83.5	91.3	87.2
DB	AAAI’20	✓	80.2	86.9	83.4	82.5	87.1	84.7	83.2	91.8	87.3
Boundary	AAAI’20	✓	-	-	-	83.5	85.2	84.3	88.1	82.2	85.0
DRRG	CVPR’20	✓	83.0	85.9	84.5	84.9	86.5	85.7	84.7	88.5	86.6
ContourNet	CVPR’20	×	84.1	83.7	83.9	83.9	86.9	85.4	86.1	87.6	86.9
TextRay	MM’20	✓	80.4	82.8	81.6	77.9	83.5	80.6	-	-	-
ABCNet	CVPR’20	✓	78.5	84.4	81.4	81.3	87.9	84.5	-	-	-
FCENet†	Ours	×	80.7	85.7	83.1	79.8	87.4	83.4	84.2	85.1	84.6
FCENet	Ours	×	83.4	87.6	85.5	82.5	89.3	85.8	82.6	90.1	86.2
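The F column is the F-measure, i.e. the harmonic mean of R and P, which is the usual summary metric on these benchmarks. As a quick arithmetic check, the sketch below recomputes F for the FCENet rows of the table from their R and P. Small deviations (within about 0.1) are expected because R and P are themselves rounded to one decimal; the function name f_measure is just an illustrative choice.

```python
def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (both given in %)."""
    return 2 * recall * precision / (recall + precision)


# FCENet rows copied from the table above: (R, P, reported F).
fcenet = {
    "CTW1500": (83.4, 87.6, 85.5),
    "Total-Text": (82.5, 89.3, 85.8),
    "ICDAR2015": (82.6, 90.1, 86.2),
}

for name, (r, p, f) in fcenet.items():
    print(f"{name}: computed F = {f_measure(r, p):.2f}, reported F = {f}")
```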

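FCE can approximate arbitrary closed curves with only a few low-frequency components (K = 5 is often sufficient).
FCENet achieves results competitive with or better than state-of-the-art methods on CTW1500 and Total-Text, with the highest F-measure among the methods in Table 1 on both (85.5 and 85.8), and performs particularly well on the highly curved text subset.
Ablation experiments show that the Text Center Region loss and the proposed contour-based regression loss substantially improve results, especially on CTW1500 and Total-Text.
FCENet maintains robust performance even when the amount of training data is reduced, indicating good generalization.
Without extra training data, FCENet achieves 83.4% R, 87.6% P, and 85.5% F on CTW1500 and 82.5% R, 89.3% P, and 85.8% F on Total-Text (see Table 1). The FCENet† variant reaches 80.7/85.7/83.1 (R/P/F) on CTW1500 and 79.8/87.4/83.4 on Total-Text.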
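The claim that a few low-frequency components suffice can be explored numerically. The sketch below builds a hypothetical curved text region (an annular sector, not data from the paper), resamples it to 400 points by arc length, and measures the RMS reconstruction error when only frequencies |k| ≤ K are kept. By Parseval's theorem this error can only shrink as K grows; the exact values depend on the made-up shape and are not the paper's numbers.

```python
import numpy as np


def resample_closed(points: np.ndarray, n: int) -> np.ndarray:
    """Resample a closed polygon to n points spaced uniformly by arc length."""
    closed = np.vstack([points, points[:1]])
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    t = np.linspace(0.0, cum[-1], n, endpoint=False)
    return np.stack([np.interp(t, cum, closed[:, 0]),
                     np.interp(t, cum, closed[:, 1])], axis=1)


def truncation_rms_error(points: np.ndarray, k: int) -> float:
    """RMS distance between a contour and its reconstruction from |freq| <= k."""
    z = points[:, 0] + 1j * points[:, 1]
    n = len(z)
    c = np.fft.fft(z) / n                    # Fourier coefficients c_k
    freqs = np.fft.fftfreq(n, d=1.0 / n)     # integer frequency of each entry
    recon = np.fft.ifft(np.where(np.abs(freqs) <= k, c, 0) * n)
    return float(np.sqrt(np.mean(np.abs(z - recon) ** 2)))


# Hypothetical curved region: outer arc (radius 100), then inner arc (radius 70).
angles = np.linspace(np.pi / 6, 5 * np.pi / 6, 50)
outer = np.stack([100 * np.cos(angles), 100 * np.sin(angles)], axis=1)
inner = np.stack([70 * np.cos(angles[::-1]), 70 * np.sin(angles[::-1])], axis=1)
contour = resample_closed(np.vstack([outer, inner]), 400)

errors = {k: truncation_rms_error(contour, k) for k in range(1, 11)}
for k, err in errors.items():
    print(f"K={k:2d}  RMS error = {err:.3f}")
```

Plotting or printing these errors for a given dataset is one simple way to sanity-check a choice of K before training.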


This review was generated by AI and checked by a human editor.