QUICK REVIEW

[Paper Review] TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network

Ayushman Dash, John Cristian Borges Gamboa|arXiv (Cornell University)|Mar 19, 2017

Generative Adversarial Networks and Image Synthesis | References: 16 | Citations: 111

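One-Line Summary

TAC-GAN generates images from text descriptions using text embeddings and an auxiliary classifier in the discriminator, and it improves on the discriminability and diversity of earlier text-to-image models.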

ABSTRACT

In this work, we present the Text Conditioned Auxiliary Classifier Generative Adversarial Network, (TAC-GAN) a text to image Generative Adversarial Network (GAN) for synthesizing images from their text descriptions. Former approaches have tried to condition the generative process on the textual data; but allying it to the usage of class information, known to diversify the generated samples and improve their structural coherence, has not been explored. We trained the presented TAC-GAN model on the Oxford-102 dataset of flowers, and evaluated the discriminability of the generated images with Inception-Score, as well as their diversity using the Multi-Scale Structural Similarity Index (MS-SSIM). Our approach outperforms the state-of-the-art models, i.e., its inception score is 3.45, corresponding to a relative increase of 7.8% compared to the recently introduced StackGan. A comparison of the mean MS-SSIM scores of the training and generated samples per class shows that our approach is able to generate highly diverse images with an average MS-SSIM of 0.14 over all generated classes.

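Motivation and Objectives

Generate diverse and discriminable images from text descriptions.
Incorporate text embeddings into the GAN framework through an auxiliary classifier to improve the coherence of structure and content.
Evaluate synthesis quality and diversity on Oxford-102 Flowers using Inception Score and MS-SSIM.
Demonstrate controllable generation through interpolation that disentangles style from text-derived content.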

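Proposed Method

Extends AC-GAN by conditioning the generator on text embeddings (Skip-Thought) instead of class labels.
Represents the text t as a text embedding Ψ(t), learns a latent text representation l_g = L_G(Ψ(t)), and concatenates it with a noise vector z.
The generator G outputs a 128x128x3 image from z_c = [l_g; z] using transposed convolutions.
The discriminator D receives a triplet of real, fake, and wrong images together with the corresponding text embeddings and class labels, and produces the outputs D_S (real/fake) and D_C (class).
The discriminator is trained with the losses L_DS and L_DC and the generator with the losses L_GS and L_GC, which encourage realistic and correctly labeled outputs.
Optionally, a new discriminator output, DL_Y, and a corresponding loss can be added to integrate additional information.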
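
The conditioning and loss structure described in this section can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the review does not spell out the loss terms, so the target assignment below (real image scored as 1, fake and wrong images scored as 0, the wrong image classified under its own class) follows the usual AC-GAN pattern and is an assumption. All function names and the toy dimensions are hypothetical.

```python
import math

def linear(x, weights, bias):
    """Fully connected layer L_G: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def generator_input(text_embedding, weights, bias, z):
    """Build z_c = [l_g; z], where l_g = L_G(psi(t))."""
    l_g = linear(text_embedding, weights, bias)
    return l_g + z  # list concatenation

def bce(p, target):
    """Binary cross-entropy for a single probability p."""
    eps = 1e-12  # guards against log(0)
    return -(target * math.log(p + eps)
             + (1 - target) * math.log(1 - p + eps))

def ce(probs, label):
    """Categorical cross-entropy for one class-probability vector."""
    return -math.log(probs[label] + 1e-12)

def discriminator_loss(real, fake, wrong, c_real, c_wrong):
    """Each of real/fake/wrong is a pair (D_S output, D_C output).

    Assumed targets: real -> 1, fake -> 0, wrong -> 0 for D_S;
    real and fake -> c_real, wrong -> c_wrong for D_C.
    """
    l_ds = bce(real[0], 1) + bce(fake[0], 0) + bce(wrong[0], 0)
    l_dc = ce(real[1], c_real) + ce(fake[1], c_real) + ce(wrong[1], c_wrong)
    return l_ds + l_dc

def generator_loss(fake, c_real):
    """The generator wants its image scored as real and classified as c_real."""
    l_gs = bce(fake[0], 1)
    l_gc = ce(fake[1], c_real)
    return l_gs + l_gc

# Toy example: 2-d text embedding, 3-d latent text vector, 2-d noise.
z_c = generator_input([1.0, 2.0],
                      [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                      [0.0, 0.0, 0.5],
                      [0.1, 0.2])
print(len(z_c))  # 5
```

In a real model the discriminator outputs would come from a convolutional network and the generator would map z_c to an image with transposed convolutions; here the outputs are passed in directly so that only the loss bookkeeping is shown.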

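Experimental Results

Research Questions

RQ1: Can TAC-GAN generate discriminable images that are faithful to the text description?
RQ2: Does conditioning on text embeddings together with an auxiliary classifier improve image quality and diversity over earlier text-to-image methods?
RQ3: How does TAC-GAN compare with StackGAN and other baselines in terms of Inception Score and diversity metrics?
RQ4: Can interpolating text and style produce coherent variations of the generated images?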

Key Findings

Model	Inception Score
TAC-GAN	3.45±0.05
StackGAN	3.20±0.01
GAN-INT-CLS	2.66±0.03
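
The 7.8% relative increase quoted in the abstract can be checked against the mean scores in the table. The short sketch below uses only those reported means and ignores the ± terms; the helper name `relative_increase` is hypothetical.

```python
def relative_increase(new, baseline):
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

tac_gan, stackgan, gan_int_cls = 3.45, 3.20, 2.66

print(round(relative_increase(tac_gan, stackgan), 1))     # 7.8
print(round(relative_increase(tac_gan, gan_int_cls), 1))  # 29.7
```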

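TAC-GAN reaches an Inception Score of 3.45±0.05, above StackGAN at 3.20±0.01 and GAN-INT-CLS at 2.66±0.03.
TAC-GAN produces diverse samples: the mean MS-SSIM over all generated classes is 0.13±0.016, close to the training-data mean of 0.14±0.019 and indicating more diversity than several baselines.
The model demonstrates content/style disentanglement, shown by content-preserving interpolations across different noise vectors and text embeddings.
The comparison of mean MS-SSIM scores indicates that the generated samples are more diverse than the training data overall (a lower MS-SSIM means less similar image pairs), which supports the diversity claim.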
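
The interpolation experiments mentioned above can be illustrated with a small sketch. It assumes plain linear interpolation between two vectors, which this review does not confirm as the scheme used in the paper. The function names are hypothetical, and the vectors stand in for either noise vectors or text embeddings.

```python
def lerp(a, b, alpha):
    """Linear interpolation between vectors a and b for alpha in [0, 1]."""
    return [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]

def interpolation_path(a, b, steps):
    """Return `steps` evenly spaced vectors from a to b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(a, b, i / (steps - 1)) for i in range(steps)]

# Hold the noise fixed and move between two toy text embeddings.
path = interpolation_path([0.0, 2.0], [1.0, 0.0], 3)
print(path)  # [[0.0, 2.0], [0.5, 1.0], [1.0, 0.0]]
```

Feeding each vector on the path to the generator while holding the other input fixed would vary only the interpolated factor: the style when interpolating noise, the content when interpolating text embeddings.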


This review was generated by AI and checked by a human editor.