QUICK REVIEW

[Paper Review] High Resolution Face Completion with Multiple Controllable Attributes via Fully End-to-End Progressive Generative Adversarial Networks

Zeyuan Chen, Shaoliang Nie|arXiv (Cornell University)|Jan 23, 2018

Advanced Image Processing Techniques | Cited by 30

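One-Sentence Summary

This paper proposes a fully end-to-end progressive GAN for high-resolution face completion that is trained from low to high resolution and uses conditional vectors to control attributes. It produces sharp, realistic results at 1024×1024 resolution in a single forward pass, with a mean inference time of 0.007 seconds.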

ABSTRACT

We present a deep learning approach for high resolution face completion with multiple controllable attributes (e.g., male and smiling) under arbitrary masks. Face completion entails understanding both structural meaningfulness and appearance consistency locally and globally to fill in "holes" whose content do not appear elsewhere in an input image. It is a challenging task with the difficulty level increasing significantly with respect to high resolution, the complexity of "holes" and the controllable attributes of filled-in fragments. Our system addresses the challenges by learning a fully end-to-end framework that trains generative adversarial networks (GANs) progressively from low resolution to high resolution with conditional vectors encoding controllable attributes. We design novel network architectures to exploit information across multiple scales effectively and efficiently. We introduce new loss functions encouraging sharp completion. We show that our system can complete faces with large structural and appearance variations using a single feed-forward pass of computation with mean inference time of 0.007 seconds for images at 1024 x 1024 resolution. We also perform a pilot human study that shows our approach outperforms state-of-the-art face completion methods in terms of rank analysis. The code will be released upon publication.

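Motivation and Objectives

Address the challenge of keeping structure and appearance consistent in high-resolution face completion.
Enable control over multiple attributes (e.g., gender, expression) in the generated face regions.
Remove the need for post-processing and iterative inference by designing a fully end-to-end framework.
Overcome the limitations of earlier methods, which struggle with large occlusions, are restricted to low resolution, or lack attribute control.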

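Proposed Method

Trains a progressive GAN architecture that grows from low to high resolution, learning face structure from coarse to fine.
Feeds conditional vectors into the generator to explicitly control attributes such as "male" or "smiling" during generation.
Designs a multi-scale discriminator and completion network that exploit cross-scale features to improve realism and detail.
Introduces new loss functions that emphasize sharpness and perceptual quality to improve texture fidelity.
Adopts a fully end-to-end training paradigm that needs no post-processing and supports single-pass inference.
Uses conditional noise injection and skip connections to preserve identity and symmetry in the generated faces.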
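
The coarse-to-fine growth and the attribute conditioning described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the starting resolution of 4, the linear fade-in blend (borrowed from the general progressive-GAN recipe), and the idea of broadcasting each attribute to a constant input plane are all assumptions made for illustration, and every function name here is hypothetical.

```python
def resolution_schedule(start=4, final=1024):
    """Resolutions visited when doubling from `start` up to `final` (start=4 is an assumption)."""
    sizes = []
    res = start
    while res <= final:
        sizes.append(res)
        res *= 2
    return sizes


def upsample2x(plane):
    """Nearest-neighbour 2x upsampling of one channel (a list of rows)."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def fade_in(low_res, high_res, alpha):
    """Blend the upsampled low-res output with the new high-res output.

    alpha ramps from 0 to 1 while a newly added resolution stage is phased in.
    """
    up = upsample2x(low_res)
    return [[(1.0 - alpha) * u + alpha * h for u, h in zip(u_row, h_row)]
            for u_row, h_row in zip(up, high_res)]


def generator_input(masked_channels, mask, attributes):
    """Stack image channels, the hole mask, and one constant plane per attribute.

    Assumed conditioning scheme: each attribute (e.g. male=1, smiling=0)
    becomes a plane with the same spatial size as the mask.
    """
    size = len(mask)
    attr_planes = [[[float(a)] * size for _ in range(size)] for a in attributes]
    return masked_channels + [mask] + attr_planes


if __name__ == "__main__":
    print(resolution_schedule())
    x = generator_input([[[0.5, 0.0], [0.5, 0.5]]], [[0, 1], [0, 0]], attributes=[1, 0])
    print(len(x))  # 1 image channel + 1 mask + 2 attribute planes
```

In a real system these operations would run on tensors inside a deep learning framework; the nested lists only make the data flow visible.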

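Experimental Results

Research Questions

RQ1: Can a progressive GAN framework achieve high-resolution face completion at 1024×1024 that is consistent in both structure and appearance?
RQ2: Can attribute control (e.g., gender, expression) be integrated into a face completion GAN without compromising realism?
RQ3: Does a fully end-to-end, single-pass approach outperform methods that require post-processing or iterative optimization?
RQ4: How does the model perform when the occluded region is very large or complex and no similar patches exist in the context or in external datasets?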

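Key Findings

The proposed method has a mean inference time of only 0.007 seconds on 1024×1024 face images, which supports real-time completion.
A pilot user study with 32 participants shows that the method significantly outperforms the CE and GL baselines in realism (p < 0.001).
Compared with previous state-of-the-art methods, the model generates sharper images with richer detail (e.g., facial texture, wrinkles).
The system can effectively control attributes such as "male" and "smiling" through conditional vectors, producing consistent and plausible attribute-specific outputs.
Despite its strong performance, the model occasionally struggles to capture low-level skin texture (e.g., wrinkles, pores) and may produce asymmetric features (e.g., mismatched eye colors).
The model outperforms GL and CE in perceptual quality and sharpness, although some users found CE's blurry results more appealing in certain cases.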
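
To put the reported 0.007-second mean inference time in context, the sketch below converts a mean latency into throughput and shows one generic way such a mean could be measured. The timing procedure (warm-up runs followed by averaged runs) is an assumption, not the protocol used in the paper, and the function names are hypothetical.

```python
import time


def mean_inference_time(run_once, n_runs=100, warmup=10):
    """Average wall-clock seconds per call of `run_once` after a few warm-up calls."""
    for _ in range(warmup):
        run_once()                      # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    return (time.perf_counter() - start) / n_runs


def throughput(mean_seconds):
    """Images per second implied by a mean per-image latency."""
    return 1.0 / mean_seconds


if __name__ == "__main__":
    # 0.007 s per image corresponds to roughly 143 images per second.
    print(round(throughput(0.007)))
```

When timing a model on a GPU, the device usually has to be synchronized before reading the clock, otherwise asynchronous kernel launches make the measured time too small.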

This review was generated by AI and reviewed by human editors.