QUICK REVIEW

[Paper Review] MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

Cheng‐Han Lee, Ziwei Liu|arXiv (Cornell University)|Jul 27, 2019

Face recognition and analysis | References: 47 | Cited by: 83

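One-Line Summary

MaskGAN enables diverse and interactive facial image manipulation by using semantic masks as an intermediate representation. It introduces the Dense Mapping Network and Editing Behavior Simulated Training, along with a new dataset, CelebAMask-HQ.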

ABSTRACT

Facial image manipulation has achieved great progress in recent years. However, previous methods either operate on a predefined set of face attributes or leave users little freedom to interactively manipulate images. To overcome these drawbacks, we propose a novel framework termed MaskGAN, enabling diverse and interactive face manipulation. Our key insight is that semantic masks serve as a suitable intermediate representation for flexible face manipulation with fidelity preservation. MaskGAN has two main components: 1) Dense Mapping Network (DMN) and 2) Editing Behavior Simulated Training (EBST). Specifically, DMN learns style mapping between a free-form user modified mask and a target image, enabling diverse generation results. EBST models the user editing behavior on the source mask, making the overall framework more robust to various manipulated inputs. Specifically, it introduces dual-editing consistency as the auxiliary supervision signal. To facilitate extensive studies, we construct a large-scale high-resolution face dataset with fine-grained mask annotations named CelebAMask-HQ. MaskGAN is comprehensively evaluated on two challenging tasks: attribute transfer and style copy, demonstrating superior performance over other state-of-the-art methods. The code, models, and dataset are available at https://github.com/switchablenorms/CelebAMask-HQ.
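The dual-editing consistency that the abstract uses as auxiliary supervision can be illustrated with a toy sketch. It assumes, for illustration only, that MaskVAE latent codes and images are flat lists of floats, that the two simulated edits sit symmetrically around the target mask's code z_t (one moved toward a reference mask's code, one moved away from it), and that the alpha map from the Alpha Blender is simply given. The names `inter_outer_codes`, `alpha_blend`, and `dual_editing_consistency` are hypothetical, and the DMN generator that would actually produce the two edited images is not modeled.

```python
# Toy sketch of EBST-style dual-editing consistency (illustrative assumptions,
# not the authors' code): latent codes and images are flat lists of floats.


def lerp(z_a, z_b, t):
    """Move from z_a toward z_b by fraction t; negative t moves away from z_b."""
    return [a + t * (b - a) for a, b in zip(z_a, z_b)]


def inter_outer_codes(z_t, z_ref, step):
    """Hypothetical helper: two simulated edits placed symmetrically around z_t.

    z_inter moves toward the reference mask code and z_outer moves away from it,
    so z_t is exactly their midpoint.
    """
    return lerp(z_t, z_ref, step), lerp(z_t, z_ref, -step)


def alpha_blend(img_a, img_b, alpha_map):
    """Per-pixel blend: alpha * a + (1 - alpha) * b."""
    return [w * a + (1.0 - w) * b for a, b, w in zip(img_a, img_b, alpha_map)]


def l1_loss(x, y):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def dual_editing_consistency(img_inter, img_outer, alpha_map, img_target):
    """Penalize the blend of the two edited images for drifting from the target."""
    return l1_loss(alpha_blend(img_inter, img_outer, alpha_map), img_target)


# Usage with made-up numbers.
z_inter, z_outer = inter_outer_codes([0.0, 1.0], [1.0, 3.0], 0.5)
print(z_inter, z_outer)  # [0.5, 2.0] [-0.5, 0.0]
loss = dual_editing_consistency(
    [0.3, 0.5, 0.5, 0.9], [0.1, 0.3, 0.7, 0.7], [0.5] * 4, [0.2, 0.4, 0.6, 0.8]
)
print(loss)  # close to 0: here the blend reproduces the target
```

In this sketch, because z_t is the midpoint of the two edited codes, blending the two edited images should ideally recover the unedited target, and the L1 term measures how far the blend falls short of that.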

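Motivation and Objectives

Enable diverse and interactive face manipulation by using semantic masks as the manipulation medium.
Learn a robust style mapping from a target image and its mask to a user-modified mask.
Model user editing behavior to improve robustness to mask variations at inference time.
Provide a large-scale, high-resolution dataset with mask annotations for facial editing research.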

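Proposed Method

Combines the Dense Mapping Network (DMN) with a Spatial-Aware Style Encoder that uses AdaIN to transfer spatially aware style from the target image and its mask to the generated output.
MaskVAE models the manifold of facial structure priors, enabling smooth interpolation between masks.
Alpha Blender learns alpha blending to keep manipulations consistent across multiple edited masks.
Editing Behavior Simulated Training (EBST) simulates user edits by creating inter and outer masks, then optimizes the DMN and the Alpha Blender for dual-editing consistency.
Multi-objective training combines an adversarial loss, a feature matching loss, and a perceptual loss to ensure realism and fidelity.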
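AdaIN, which the method above relies on for style transfer, computes AdaIN(x) = γ · (x - μ(x)) / σ(x) + β per channel, where μ and σ are the mean and standard deviation of the content feature x. The sketch below implements only that normalization step in plain Python; it assumes per-channel γ and β are already available and does not model the Spatial-Aware Style Encoder, the DMN generator, or any spatial variation in the parameters. The helper `style_stats` (a hypothetical name) uses the original AdaIN choice of taking γ and β from a style feature's own standard deviation and mean.

```python
import math
import statistics


def instance_norm(channel, eps=1e-5):
    """Normalize one channel to zero mean and (nearly) unit variance."""
    mu = statistics.fmean(channel)
    sigma = math.sqrt(statistics.pvariance(channel, mu) + eps)
    return [(v - mu) / sigma for v in channel]


def adain(content, gamma, beta, eps=1e-5):
    """AdaIN: re-scale each normalized content channel by gamma and shift by beta.

    content: list of channels, each a flat list of activations.
    gamma, beta: one scale and one bias per channel (in MaskGAN these would be
    derived from the target image and mask; here they are plain inputs).
    """
    return [
        [g * v + b for v in instance_norm(ch, eps)]
        for ch, g, b in zip(content, gamma, beta)
    ]


def style_stats(style):
    """Original AdaIN choice: style features' own std (gamma) and mean (beta)."""
    gamma = [statistics.pstdev(ch) for ch in style]
    beta = [statistics.fmean(ch) for ch in style]
    return gamma, beta


# Usage with made-up activations: one content channel, one style channel.
gamma, beta = style_stats([[10.0, 10.0, 20.0, 20.0]])
out = adain([[1.0, 2.0, 3.0, 4.0]], gamma, beta)
# Roughly 15.0 and 5.0: the style channel's mean and standard deviation.
print(round(statistics.fmean(out[0]), 6), round(statistics.pstdev(out[0]), 3))
```

The output channel takes on the style channel's mean and spread while keeping the ordering of the content activations, which is the sense in which AdaIN transfers style but not structure.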

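Experimental Results

Research Questions

RQ1: Can semantic masks serve as a flexible intermediate representation for diverse face manipulation while preserving identity?
RQ2: How can robust style transfer between a target image and a user-modified mask be learned to support interactive editing?
RQ3: Does simulating user editing behavior during training improve robustness to mask variations at inference time?
RQ4: What impact does the proposed CelebAMask-HQ dataset have on research into high-resolution, mask-based face editing?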

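Key Findings

MaskGAN achieves plausible attribute transfer and style copy, with segmentation accuracy and attribute preservation that are competitive with or better than the baselines.
The Spatial-Aware Style Encoder improves style transfer by conditioning on the target mask structure, which reduces bias from the user-modified mask.
EBST improves robustness to mask variations and strengthens identity preservation during interactive editing.
MaskGAN performs well on high-resolution (512x512) face editing tasks and benefits from the CelebAMask-HQ dataset.
Editing behavior simulation and the dual-editing consistency loss make mask-to-image manipulation more reliable under interactive inputs.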

This review was generated by AI and checked by a human editor.