QUICK REVIEW

[Paper Review] Style and Pose Control for Image Synthesis of Humans from a Single Monocular View

Kripasindhu Sarkar, Vladislav Golyanik | arXiv (Cornell University) | 2021-02-22

Generative Adversarial Networks and Image Synthesis | References: 69 | Citations: 44

One-Line Summary

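StylePoseGAN provides explicit, disentangled control over pose and part-based appearance from a single image, enabling photo-realistic human image synthesis with state-of-the-art fidelity and a versatile range of applications.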

ABSTRACT

Photo-realistic re-rendering of a human from a single image with explicit control over body pose, shape and appearance enables a wide range of applications, such as human appearance transfer, virtual try-on, motion imitation, and novel view synthesis. While significant progress has been made in this direction using learning-based image generation tools, such as GANs, existing approaches yield noticeable artefacts such as blurring of fine details, unrealistic distortions of the body parts and garments as well as severe changes of the textures. We, therefore, propose a new method for synthesising photo-realistic human images with explicit control over pose and part-based appearance, i.e., StylePoseGAN, where we extend a non-controllable generator to accept conditioning of pose and appearance separately. Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts, and it significantly outperforms existing single image re-rendering methods. Our disentangled representation opens up further applications such as garment transfer, motion transfer, virtual try-on, head (identity) swap and appearance interpolation. StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics compared to the current best-performing methods and convinces in a comprehensive user study.

Motivation and Objectives

Aim to synthesize photo-realistic human images from a single monocular view with explicit control over pose, shape, and appearance.
Disentangle pose, appearance, and body parts to enable applications like pose transfer, garment transfer, and identity swap.
Leverage a StyleGAN2-based generator conditioned by spatial pose encodings and appearance vectors for high-fidelity rendering.
Train in a supervised manner using paired images to reconstruct target poses and appearances.

Proposed Method

Represent pose as a spatially-conditioned tensor input to a StyleGAN2-based generator (GNet).
Encode pose with a PNet into a spatial tensor E and appearance with an ANet into a latent vector z.
Extract pose via DensePose and appearance via a partial SMPL texture map to create a pose-independent appearance A.
Train with paired source-target images of the same person (I_s to I_t) to disentangle pose and appearance, using reconstruction and adversarial losses.
Optimize a total loss combining reconstruction, perceptual, face identity, GAN, and patch co-occurrence terms for high-fidelity synthesis.
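
The steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `render` and `total_loss`, the `ASSUMED_WEIGHTS` table, and the equal weighting are all hypothetical, since this summary does not state the actual loss weights or network interfaces. Only the names PNet, ANet, GNet, E, and z come from the description above.

```python
from typing import Any, Callable, Dict

# ASSUMPTION: placeholder weights. The summary names the five loss terms
# but does not give their weights, so equal weighting is used here.
ASSUMED_WEIGHTS: Dict[str, float] = {
    "reconstruction": 1.0,
    "perceptual": 1.0,
    "face_identity": 1.0,
    "gan": 1.0,
    "patch_cooccurrence": 1.0,
}


def render(
    pnet: Callable[[Any], Any],
    anet: Callable[[Any], Any],
    gnet: Callable[[Any, Any], Any],
    target_pose: Any,
    source_appearance: Any,
) -> Any:
    """Forward pass: pose -> spatial tensor E, appearance -> vector z, (E, z) -> image."""
    E = pnet(target_pose)        # spatial pose encoding (e.g. from a DensePose map)
    z = anet(source_appearance)  # appearance latent (e.g. from a partial texture map)
    return gnet(E, z)            # generator conditioned on both


def total_loss(
    terms: Dict[str, float],
    weights: Dict[str, float] = ASSUMED_WEIGHTS,
) -> float:
    """Weighted sum of the named loss terms; every weighted term must be supplied."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)
```

In a paired training step, `render` would be called with the target pose and the source appearance of the same person, and the individual loss terms would be computed between the output and the target image I_t before being passed to `total_loss`.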

Experimental Results

Research Questions

RQ1: Can explicit conditioning on pose and part-based appearance improve photo-realistic re-rendering of humans from a single image?
RQ2: Does disentangling pose and appearance enable reliable pose transfer, garment transfer, and identity swapping while preserving fine textures?
RQ3: How does StylePoseGAN compare to state-of-the-art methods on perceptual metrics and user judgments for single-image re-rendering of humans?

Key Results

StylePoseGAN achieves state-of-the-art SSIM and LPIPS on pose transfer, with SSIM 0.788 and LPIPS 0.133, outperforming all compared methods.
Compared to the best previous method (NHRR), StylePoseGAN reduces LPIPS, where lower is better, by about 19% (0.133 vs 0.164).
On the DeepFashion-based evaluation, StylePoseGAN preserves high-frequency texture details and garment patterns better than competitors.
A user study shows StylePoseGAN is preferred over CBI and NHRR in identity similarity, realism, and fine-detail preservation (preferred in over 95% of responses on the reported questions).
The model supports fine-grained texture control and applications such as garment transfer, head/identity swap, and appearance interpolation.
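
The "about 19%" figure quoted above can be checked from the two LPIPS values in this summary. The helper name `relative_reduction` is illustrative only.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of `new` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - new) / baseline


lpips_nhrr = 0.164          # best previous method, as reported above
lpips_styleposegan = 0.133  # StylePoseGAN, as reported above

print(f"{relative_reduction(lpips_nhrr, lpips_styleposegan):.1%}")  # prints 18.9%
```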

This review was generated by AI and reviewed by a human editor.