QUICK REVIEW

[Paper Review] Scene Text Synthesis for Efficient and Effective Deep Network Training

Changgong Zhang, Fangneng Zhan|arXiv (Cornell University)|Jan 26, 2019

Handwritten Text Recognition Techniques | References: 51 | Citations: 33

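One-Line Summary

This study synthesizes annotated scene text training images by embedding foreground objects into background images. The technique rests on two components, context-aware semantic coherence and harmonious appearance adaptation, and in scene text detection and recognition evaluations, networks trained on the synthesized images perform similarly to or better than networks trained on real images.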

ABSTRACT

A large amount of annotated training images is critical for training accurate and robust deep network models but the collection of a large amount of annotated training images is often time-consuming and costly. Image synthesis alleviates this constraint by generating annotated training images automatically by machines which has attracted increasing interest in the recent deep learning research. We develop an innovative image synthesis technique that composes annotated training images by realistically embedding foreground objects of interest (OOI) into background images. The proposed technique consists of two key components that in principle boost the usefulness of the synthesized images in deep network training. The first is context-aware semantic coherence which ensures that the OOI are placed around semantically coherent regions within the background image. The second is harmonious appearance adaptation which ensures that the embedded OOI are agreeable to the surrounding background from both geometry alignment and appearance realism. The proposed technique has been evaluated over two related but very different computer vision challenges, namely, scene text detection and scene text recognition. Experiments over a number of public datasets demonstrate the effectiveness of our proposed image synthesis technique - the use of our synthesized images in deep network training is capable of achieving similar or even better scene text detection and scene text recognition performance as compared with using real images.

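Motivation and Objectives

Reduce the annotation cost of training deep networks by generating annotated synthetic images.
Develop a synthesis pipeline that places foreground objects in semantically coherent contexts.
Ensure geometric and appearance realism so that the synthesized data is more useful for training.
Evaluate the technique on scene text detection and recognition benchmarks to compare it against training on real images.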

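Proposed Method

Embed foreground objects of interest (OOI) into background images while preserving semantic coherence.
Apply context-aware placement so that each OOI lands on a semantically coherent region of the background.
Apply harmonious appearance adaptation so that the OOI agrees with the surrounding background in both geometric alignment and appearance realism.
Produce annotated synthetic training images suitable for deep network training.
Evaluate the effect of the synthesis technique on the downstream scene text detection and recognition tasks.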
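The embedding steps above can be illustrated with a toy sketch. It is not the authors' implementation: the function names (`find_placement`, `embed`), the boolean `region_mask` standing in for a "semantically coherent region", and the mean-color shift standing in for appearance adaptation are all assumptions made for illustration. The sketch only shows the general shape of the idea: choose a location inside an allowed region, nudge the foreground's colors toward the local background, composite it, and emit the bounding box as a free annotation.

```python
import numpy as np


def find_placement(region_mask, patch_h, patch_w):
    """Return (top, left) of the first window fully inside region_mask, else None.

    region_mask is a hypothetical boolean map marking pixels where placing
    the foreground is considered semantically acceptable.
    """
    height, width = region_mask.shape
    # Integral image so each window sum costs O(1).
    integral = np.zeros((height + 1, width + 1), dtype=np.int64)
    integral[1:, 1:] = np.cumsum(
        np.cumsum(region_mask.astype(np.int64), axis=0), axis=1
    )
    for top in range(height - patch_h + 1):
        for left in range(width - patch_w + 1):
            bottom, right = top + patch_h, left + patch_w
            inside = (integral[bottom, right] - integral[top, right]
                      - integral[bottom, left] + integral[top, left])
            if inside == patch_h * patch_w:
                return top, left
    return None


def embed(background, patch, alpha, region_mask, strength=0.5):
    """Composite patch into background; return (image, box) or None.

    The appearance step is a crude stand-in (an assumption, not the paper's
    method): the patch's mean color is shifted toward the mean color of the
    background window by `strength`.
    """
    patch_h, patch_w = alpha.shape
    position = find_placement(region_mask, patch_h, patch_w)
    if position is None:
        return None
    top, left = position
    out = background.astype(np.float64)
    window = out[top:top + patch_h, left:left + patch_w]
    foreground = patch.astype(np.float64)
    shift = strength * (window.mean(axis=(0, 1)) - foreground.mean(axis=(0, 1)))
    adapted = np.clip(foreground + shift, 0, 255)
    weight = alpha[..., None].astype(np.float64)
    out[top:top + patch_h, left:left + patch_w] = (
        weight * adapted + (1.0 - weight) * window
    )
    # The box (x_min, y_min, x_max, y_max) is the automatically generated annotation.
    box = (left, top, left + patch_w, top + patch_h)
    return np.rint(out).astype(np.uint8), box


# Usage: a flat gray background, a bright 2x3 "text" patch, one allowed region.
background = np.full((8, 8, 3), 100, dtype=np.uint8)
patch = np.full((2, 3, 3), 200, dtype=np.uint8)
alpha = np.ones((2, 3))
region_mask = np.zeros((8, 8), dtype=bool)
region_mask[2:6, 3:8] = True
image, box = embed(background, patch, alpha, region_mask)
```

A real pipeline would presumably derive the allowed regions from a semantic segmentation of the background and use a far richer appearance model; the sketch fixes both to the simplest possible choice so the data flow stays visible.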

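Experimental Results

Research Questions

RQ1: Can networks trained on images synthesized by the proposed method match the text detection and recognition performance of networks trained on real images?
RQ2: Does context-aware semantic coherence improve training effectiveness on scene text tasks?
RQ3: Does harmonious appearance adaptation make the embedded objects more realistic and more useful for deep learning?
RQ4: How do synthetic images compare with real images for training robust scene text models?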

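Key Findings

Images synthesized with the proposed method are effective for training deep networks.
Compared with training on real images, they yield similar or better scene text detection and recognition performance.
Context-aware semantic coherence and appearance adaptation both contribute to the training usefulness of the synthesized data.
The experiments support the value of realistic foreground embedding for improving model robustness.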

This review was generated by AI and checked by a human editor.