QUICK REVIEW

[Paper Review] Style and Pose Control for Image Synthesis of Humans from a Single Monocular View

Kripasindhu Sarkar, Vladislav Golyanik | arXiv (Cornell University) | Feb 22, 2021

Generative Adversarial Networks and Image Synthesis | References: 69 | Cited by: 44

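ONE-SENTENCE SUMMARY

StylePoseGAN generates photo-realistic human images from a single image with explicit, disentangled control over pose and per-body-part appearance, achieving state-of-the-art fidelity and supporting a variety of applications.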

ABSTRACT

Photo-realistic re-rendering of a human from a single image with explicit control over body pose, shape and appearance enables a wide range of applications, such as human appearance transfer, virtual try-on, motion imitation, and novel view synthesis. While significant progress has been made in this direction using learning-based image generation tools, such as GANs, existing approaches yield noticeable artefacts such as blurring of fine details, unrealistic distortions of the body parts and garments as well as severe changes of the textures. We, therefore, propose a new method for synthesising photo-realistic human images with explicit control over pose and part-based appearance, i.e., StylePoseGAN, where we extend a non-controllable generator to accept conditioning of pose and appearance separately. Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts, and it significantly outperforms existing single image re-rendering methods. Our disentangled representation opens up further applications such as garment transfer, motion transfer, virtual try-on, head (identity) swap and appearance interpolation. StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics compared to the current best-performing methods and convinces in a comprehensive user study.

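MOTIVATION AND OBJECTIVES

Synthesize photo-realistic human images from a single monocular view, with explicit control over pose, shape, and appearance.
Disentangle pose, appearance, and body parts to enable applications such as pose transfer, garment transfer, and identity swap.
Achieve high-fidelity rendering with a StyleGAN2-based generator conditioned on a spatial pose encoding and an appearance vector.
Train in a supervised manner on paired images to reconstruct the target pose and appearance.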
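
Paired supervision of this kind needs (source, target) examples that show the same person. The following is a minimal sketch of how such pairs might be enumerated, assuming the images are already grouped by a person identifier. The function name, the grouping dictionary, and the file names are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def same_person_pairs(images_by_person: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return ordered (source, target) pairs of distinct images of one person.

    Both orderings are kept because source and target play different roles:
    appearance is taken from the source, pose from the target.
    """
    pairs: list[tuple[str, str]] = []
    for person_id in sorted(images_by_person):
        # People with a single image contribute no pairs.
        pairs.extend(permutations(images_by_person[person_id], 2))
    return pairs


# Hypothetical file names, for illustration only.
example_groups = {
    "person_01": ["front.jpg", "side.jpg"],
    "person_02": ["front.jpg"],
}
example_pairs = same_person_pairs(example_groups)
print(example_pairs)
```

In this toy input only `person_01` has two images, so the result holds exactly two ordered pairs.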

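PROPOSED METHOD

Feed the pose as a spatial conditioning tensor into a StyleGAN2-based generator (GNet).
Encode the pose into a spatial tensor E with PNet, and encode the appearance into a latent vector z with ANet.
Extract the pose with DensePose and the appearance from a partial SMPL texture map, yielding a pose-independent appearance A.
Train on source-target image pairs of the same person (I_s to I_t), using reconstruction and adversarial losses to disentangle pose from appearance.
Optimize a total loss that combines reconstruction, perceptual, face-identity, GAN, and patch co-occurrence terms to achieve high-fidelity synthesis.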
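
The combined objective in the last item can be illustrated as a weighted sum of its five terms. This is only a sketch under stated assumptions: this review does not report the weights or the exact form of each term, so the `LossWeights` defaults and the per-term values below are placeholders, and the names are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LossWeights:
    # Placeholder weights: this review does not report the values used in the paper.
    reconstruction: float = 1.0
    perceptual: float = 1.0
    face_identity: float = 1.0
    gan: float = 1.0
    patch_cooccurrence: float = 1.0


def total_loss(terms: dict[str, float], weights: LossWeights = LossWeights()) -> float:
    """Return the weighted sum of the five named loss terms."""
    weight_map = asdict(weights)
    if set(terms) != set(weight_map):
        raise ValueError(f"expected terms {sorted(weight_map)}, got {sorted(terms)}")
    return sum(weight_map[name] * value for name, value in terms.items())


# Made-up per-term values for one training step, for illustration only.
example_terms = {
    "reconstruction": 0.5,
    "perceptual": 0.25,
    "face_identity": 0.125,
    "gan": 1.0,
    "patch_cooccurrence": 2.0,
}
print(total_loss(example_terms))
```

Keeping the weights in one small structure makes it easy to switch a term off (for example `LossWeights(gan=0.0)`) when checking how much each term contributes.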

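EXPERIMENTAL RESULTS

Research Questions

RQ1: Does explicit conditioning on pose and part-based appearance improve photo-realistic re-rendering of humans from a single image?
RQ2: Does disentangling pose and appearance enable reliable pose transfer, garment transfer, and identity swap while preserving fine texture details?
RQ3: How does StylePoseGAN compare with state-of-the-art methods on perceptual metrics and user evaluation for single-image human re-rendering?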

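Key Findings

StylePoseGAN achieves state-of-the-art SSIM and LPIPS on pose transfer (SSIM 0.788, LPIPS 0.133), outperforming all compared methods.
Compared with the previous best method (NHRR), StylePoseGAN lowers LPIPS by about 19% (0.133 vs. 0.164).
In the DeepFashion-based evaluation, StylePoseGAN preserves more high-frequency texture detail and garment patterns than competing methods.
In the user study, StylePoseGAN is preferred over CBI and NHRR for identity similarity, realism, and detail preservation (over 95% preference on the reported questions).
The model supports fine-grained texture control as well as applications such as garment transfer, head/identity swap, and appearance interpolation.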
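
The roughly 19% LPIPS reduction quoted above can be checked with a short calculation. The sketch below is only an arithmetic illustration: the helper name `relative_reduction` is hypothetical, and the two scores are the ones reported in this review.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# LPIPS values quoted in this review (lower is better).
nhrr_lpips = 0.164
styleposegan_lpips = 0.133

reduction = relative_reduction(nhrr_lpips, styleposegan_lpips)
print(f"LPIPS reduction: {reduction:.1%}")  # prints "LPIPS reduction: 18.9%"
```

The result, 0.031 / 0.164, rounds to the "about 19%" figure stated in the findings.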


This review was generated by AI and reviewed by human editors.