QUICK REVIEW

[Paper Review] AttGAN: Facial Attribute Editing by Only Changing What You Want

Zhenliang He, Wangmeng Zuo|arXiv (Cornell University)|Nov 29, 2017

Generative Adversarial Networks and Image Synthesis|Cited by 41

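One-Line Summary

AttGAN is a framework for facial attribute editing that leaves the latent representation unconstrained and instead applies an attribute classification constraint to the generated image, so that the desired attributes change correctly while everything else is preserved. By combining attribute classification, a reconstruction loss, and adversarial learning, it achieves state-of-the-art results on the CelebA dataset, with fine facial details well preserved and strong visual quality.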

ABSTRACT

Facial attribute editing aims to manipulate single or multiple attributes of a face image, i.e., to generate a new face with desired attributes while preserving other details. Recently, generative adversarial net (GAN) and encoder-decoder architecture are usually incorporated to handle this task with promising results. Based on the encoder-decoder architecture, facial attribute editing is achieved by decoding the latent representation of the given face conditioned on the desired attributes. Some existing methods attempt to establish an attribute-independent latent representation for further attribute editing. However, such attribute-independent constraint on the latent representation is excessive because it restricts the capacity of the latent representation and may result in information loss, leading to over-smooth and distorted generation. Instead of imposing constraints on the latent representation, in this work we apply an attribute classification constraint to the generated image to just guarantee the correct change of desired attributes, i.e., to "change what you want". Meanwhile, the reconstruction learning is introduced to preserve attribute-excluding details, in other words, to "only change what you want". Besides, the adversarial learning is employed for visually realistic editing. These three components cooperate with each other forming an effective framework for high quality facial attribute editing, referred as AttGAN. Furthermore, our method is also directly applicable for attribute intensity control and can be naturally extended for attribute style manipulation. Experiments on CelebA dataset show that our method outperforms the state-of-the-arts on realistic attribute editing with facial details well preserved.

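Motivation and Objectives

Address a limitation of existing methods, which impose an attribute-independence constraint on the latent representation. This constraint restricts representational capacity and causes information loss.
Develop a facial attribute editing method that changes only the desired attributes and preserves all other facial details (identity, lighting, background).
Improve editing quality and realism by integrating three complementary learning components: attribute classification, reconstruction, and adversarial learning.
Make the method directly applicable to attribute intensity control and naturally extensible to attribute style manipulation.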

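Proposed Method

The method uses an encoder-decoder architecture: the encoder maps a face image to a latent code, and the decoder generates a new image conditioned on the latent code and the desired attributes.
An attribute classification head is applied to the generated image to enforce correct attribute manipulation, which guarantees "change what you want".
A reconstruction loss between the input image and its reconstruction (the latent code decoded with the original attributes) preserves non-attribute details, which enforces "only change what you want".
Adversarial learning improves the visual realism and perceptual quality of the generated images.
The three components (attribute classification, reconstruction, and adversarial learning) are optimized jointly within a unified framework.
The model is trained end to end with a combined loss function that balances the three components.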
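
As a rough illustration of how the three terms listed above might be balanced, the following minimal Python sketch combines an L1 reconstruction term, a binary cross-entropy attribute classification term, and an adversarial term into a single generator-side objective. The function names, the WGAN-style adversarial term, and the weights `w_rec=100.0` and `w_cls=10.0` are assumptions made for this sketch and are not stated in this review. Images are flattened to plain lists of floats for brevity.

```python
import math


def l1_reconstruction(x, x_rec):
    """Mean absolute error between the input pixels and the reconstructed pixels."""
    return sum(abs(a - b) for a, b in zip(x, x_rec)) / len(x)


def attribute_bce(target_attrs, predicted_probs, eps=1e-7):
    """Mean binary cross-entropy between target attributes (0 or 1)
    and the classifier's predicted probabilities on the edited image."""
    total = 0.0
    for t, p in zip(target_attrs, predicted_probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(target_attrs)


def generator_objective(x, x_rec, target_attrs, predicted_probs,
                        critic_score_fake, w_rec=100.0, w_cls=10.0):
    """Weighted sum of the three terms (weights are assumed values).

    x, x_rec          -- input image and its reconstruction (original attributes)
    target_attrs      -- desired attributes used to decode the edited image
    predicted_probs   -- classifier output on the edited image
    critic_score_fake -- critic score of the edited image (assumed WGAN-style)
    """
    rec = l1_reconstruction(x, x_rec)
    cls = attribute_bce(target_attrs, predicted_probs)
    adv = -critic_score_fake  # the generator tries to raise the critic score
    return w_rec * rec + w_cls * cls + adv


if __name__ == "__main__":
    image = [0.0, 0.5, 1.0]
    reconstruction = [0.1, 0.5, 0.9]
    print(generator_objective(image, reconstruction, [1, 0], [0.8, 0.3], 0.2))
```

In this sketch the reconstruction term is weighted most heavily, which reflects the emphasis on preserving non-attribute details. A real implementation would compute these terms on tensors inside a deep learning framework and pair them with a separate discriminator and classifier objective.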

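Experimental Results

Research Questions

RQ1: Does imposing attribute independence on the latent representation restrict its capacity and degrade attribute editing performance?
RQ2: Can accurate attribute editing be achieved by classifying attributes on the generated image, without constraining the latent representation?
RQ3: How effectively does the combination of attribute classification, reconstruction, and adversarial learning preserve facial identity and details during editing?
RQ4: Can the proposed method be applied directly to attribute intensity control and extended to attribute style manipulation?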

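Key Findings

On the CelebA dataset, AttGAN outperformed state-of-the-art methods in editing accuracy, visual quality, and preservation of non-attribute details.
The ablation study showed that removing any one of the attribute classification, reconstruction, or adversarial losses degrades performance markedly, confirming that each component is necessary.
The reconstruction loss is essential for preserving identity and minimizing artifacts. Without it, facial identity changes noticeably and artifacts appear.
The attribute-independence constraint used in Fader Networks and IcGAN was shown to cause information loss and degrade results, making it unsuitable for high-quality editing.
Compared with the baselines, AttGAN achieved higher editing accuracy and lower editing error, particularly in preserving non-target attributes.
The method applies directly to attribute intensity control and shows potential for attribute style manipulation, although performance is limited for styles with large variation (e.g., paintings).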
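
The findings above note that the method applies directly to attribute intensity control. One plausible way to picture this, assuming the decoder accepts continuous attribute values at test time, is to sweep a single entry of the attribute vector between 0 and 1 and decode each resulting vector in turn. The helper below is a hypothetical illustration that only builds the attribute vectors. `intensity_sweep` and its parameters are names invented for this sketch, and no decoder is included.

```python
def intensity_sweep(attrs, index, steps=5):
    """Return copies of `attrs` in which attrs[index] varies linearly
    from 0.0 to 1.0 over `steps` values; all other entries are unchanged."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    sweep = []
    for k in range(steps):
        vector = list(attrs)  # copy so the caller's list is not modified
        vector[index] = k / (steps - 1)
        sweep.append(vector)
    return sweep


if __name__ == "__main__":
    # Hypothetical attribute order: [smiling, eyeglasses, bangs]
    for vector in intensity_sweep([1, 0, 0], index=1, steps=3):
        print(vector)  # each vector would be passed to the decoder
```

Each vector in the sweep would be decoded with the same latent code, so the only thing that changes between outputs is the strength of the selected attribute.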

This review was generated by AI and checked by a human editor.