QUICK REVIEW

[Paper Review] High Resolution Face Editing with Masked GAN Latent Code Optimization

Martin Pernuš, Vitomir Štruc|arXiv (Cornell University)|Mar 20, 2021

Face recognition and analysis | Citations: 8

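One-Sentence Summary

MaskFaceGAN is a high-resolution face editing method that optimizes the latent code of StyleGAN2 under spatial and semantic constraints supplied by a face parser and an attribute classifier. It produces photorealistic, artifact-free edits at 1024×1024 resolution and reduces attribute entanglement compared with earlier GAN-based methods.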

ABSTRACT

Face editing represents a popular research topic within the computer vision and image processing communities. While significant progress has been made recently in this area, existing solutions: (i) are still largely focused on low-resolution images, (ii) often generate editing results with visual artefacts, or (iii) lack fine-grained control and alter multiple (entangled) attributes at once, when trying to generate the desired facial semantics. In this paper, we aim to address these issues through a novel attribute editing approach called MaskFaceGAN that focuses on local attribute editing. The proposed approach is based on an optimization procedure that directly optimizes the latent code of a pre-trained (state-of-the-art) Generative Adversarial Network (i.e., StyleGAN2) with respect to several constraints that ensure: (i) preservation of relevant image content, (ii) generation of the targeted facial attributes, and (iii) spatially-selective treatment of local image areas. The constraints are enforced with the help of a (differentiable) attribute classifier and face parser that provide the necessary reference information for the optimization procedure. MaskFaceGAN is evaluated in extensive experiments on the CelebA-HQ, Helen and SiblingsDB-HQf datasets and in comparison with several state-of-the-art techniques from the literature, i.e., StarGAN, AttGAN, STGAN, and two versions of InterFaceGAN. Our experimental results show that the proposed approach is able to edit face images with respect to several local facial attributes with unprecedented image quality and at high resolution (1024x1024), while exhibiting considerably fewer problems with attribute entanglement than competing solutions. The source code is made freely available from: https://github.com/MartinPernus/MaskFaceGAN.
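
As a reading aid, the three constraints listed in the abstract can be written schematically as one objective over the latent code. This is only a sketch in notation chosen for this review, not the paper's own formulation: G is the pre-trained generator, C the attribute classifier, P the face parser, x the input image, m a binary mask of the region to edit, a the target attribute label, and the λ weights are assumed hyperparameters.

```latex
\hat{w} = \arg\min_{w}\;
    \lambda_{a}\,\mathcal{L}_{\mathrm{attr}}\big(C(G(w)),\, a\big)
  + \lambda_{s}\,\mathcal{L}_{\mathrm{spatial}}\big(P(G(w)),\, m\big)
  + \lambda_{r}\,\mathcal{L}_{\mathrm{rec}}\big((1-m)\odot G(w),\; (1-m)\odot x\big)
```

The first term asks for the targeted attribute, the second ties the edit to the intended region, and the third keeps content outside that region close to the input.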

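Motivation and Objectives

Address the limitations of earlier GAN-based face editing methods: visual artifacts, low resolution, and attribute entanglement.
Enable fine-grained, local editing of facial attributes such as hair color, makeup, and facial structure at high resolution.
Change specific facial attributes through constrained latent-space optimization while preserving the overall image structure and identity.
Provide a method that supports both local and global attribute edits with high perceptual fidelity and minimal semantic drift.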

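Proposed Method

Optimizes the latent code of a pre-trained StyleGAN2 generator with gradient-based optimization.
Enforces semantic constraints with a differentiable attribute classifier, so that the target attribute is present or absent as requested.
Applies spatial constraints with a pre-trained face parser, restricting each edit to the relevant region (for example, only the eyebrows or the lips).
Preserves the original image content during optimization through a blending strategy based on the union of face regions.
Integrates an LPIPS loss and multi-layer feature matching to maintain perceptual similarity to the input image.
Combines the attribute classification, spatial parsing, and perceptual reconstruction losses into a multi-objective loss for robust optimization.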
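
The optimization described above can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: a hand-written linear map stands in for StyleGAN2, a sigmoid over the masked pixels stands in for the attribute classifier, a fixed binary mask stands in for the face parser output, and a squared error outside the mask stands in for the perceptual terms. All names (`generate`, `attribute_score`, `loss_and_grad`, `optimize`, `blend`) and constants (`K`, `LAM`) are assumptions made for illustration.

```python
import math

# Toy linear "generator": 4 pixels from a 2-D latent code. Hypothetical
# stand-in for StyleGAN2, chosen so gradients can be written by hand.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
MASK = [1.0, 1.0, 0.0, 0.0]  # 1 = region to edit, 0 = region to preserve
K = 4.0    # sharpness of the toy attribute classifier (assumed)
LAM = 1.0  # weight of the preservation term (assumed)

def generate(w):
    """Map a latent code to an image (list of pixel values)."""
    return [sum(a * wj for a, wj in zip(row, w)) for row in A]

def attribute_score(img):
    """Toy classifier: sigmoid of the mean pixel value inside the mask."""
    n = sum(MASK)
    s = K * sum(m * p for m, p in zip(MASK, img)) / n
    return 1.0 / (1.0 + math.exp(-s))

def loss_and_grad(w, x):
    """Return the total loss and its gradient with respect to w."""
    img = generate(w)
    p = attribute_score(img)
    n = sum(MASK)
    attr_loss = -math.log(p)  # push the attribute towards "present"
    keep_loss = LAM * sum((1 - m) * (g - o) ** 2
                          for m, g, o in zip(MASK, img, x))
    # dL/d(pixel): classifier term inside the mask, preservation term outside.
    d_img = [-(1 - p) * K * m / n + 2 * LAM * (1 - m) * (g - o)
             for m, g, o in zip(MASK, img, x)]
    # Chain rule through the linear generator.
    grad = [sum(d * row[j] for d, row in zip(d_img, A)) for j in range(len(w))]
    return attr_loss + keep_loss, grad

def optimize(w_init, x, steps=200, lr=0.1):
    """Plain gradient descent on the latent code."""
    w = list(w_init)
    for _ in range(steps):
        _, grad = loss_and_grad(w, x)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def blend(generated, original, mask):
    """Keep generated pixels inside the mask and original pixels outside it."""
    return [m * g + (1 - m) * o for m, g, o in zip(mask, generated, original)]

w0 = [0.2, -0.1]
x = generate(w0)                  # toy "input image" with a known latent code
w_edit = optimize(w0, x)
edited = blend(generate(w_edit), x, MASK)
```

With these toy settings the attribute score rises during optimization, and the blended output stays identical to the input outside the mask. A real pipeline would obtain the gradients by automatic differentiation through the actual networks rather than by hand.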

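Experimental Results

Research Questions

RQ1: Can latent code optimization in StyleGAN2 achieve face editing at 1024×1024 resolution with minimal visual artifacts?
RQ2: To what extent do spatial constraints from the face parser reduce attribute entanglement in local edits?
RQ3: How much does integrating a differentiable attribute classifier improve semantic control over the target attribute?
RQ4: Does the proposed method preserve identity and background details better than existing GAN inversion-based methods?
RQ5: Can the method handle both local and global attribute edits while maintaining consistent perceptual quality?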

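Key Findings

MaskFaceGAN is visually superior to state-of-the-art methods and produces artifact-free edits at 1024×1024 resolution.
Attribute entanglement is markedly reduced, especially for local attributes such as eyebrows, hair color, and lipstick.
In the user study, it outperforms competing methods in both perceptual quality and attribute control. For the "Narrow Eyes" attribute, however, it tends to close the eyes, which makes it unsuitable for that edit.
Compared with similar methods such as InterFaceGAN, the optimization converges faster and needs fewer steps per image.
Even for global attributes such as "Young" and "Male", the output stays closely aligned with the face and background of the input image.
A limitation is that wrong predictions from the attribute classifier or the face parser can lead to unintended edits in some cases.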

This review was generated by AI and checked by a human editor.