QUICK REVIEW

[Paper Review] Style and Pose Control for Image Synthesis of Humans from a Single Monocular View

Kripasindhu Sarkar, Vladislav Golyanik|arXiv (Cornell University)|Feb 22, 2021

Generative Adversarial Networks and Image Synthesis | References: 69 | Citations: 44

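One-Sentence Summary

StylePoseGAN re-renders photo-realistic human images from a single image with explicit, disentangled control over pose and per-body-part appearance, achieving state-of-the-art fidelity and enabling a range of applications.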

ABSTRACT

Photo-realistic re-rendering of a human from a single image with explicit control over body pose, shape and appearance enables a wide range of applications, such as human appearance transfer, virtual try-on, motion imitation, and novel view synthesis. While significant progress has been made in this direction using learning-based image generation tools, such as GANs, existing approaches yield noticeable artefacts such as blurring of fine details, unrealistic distortions of the body parts and garments as well as severe changes of the textures. We, therefore, propose a new method for synthesising photo-realistic human images with explicit control over pose and part-based appearance, i.e., StylePoseGAN, where we extend a non-controllable generator to accept conditioning of pose and appearance separately. Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts, and it significantly outperforms existing single image re-rendering methods. Our disentangled representation opens up further applications such as garment transfer, motion transfer, virtual try-on, head (identity) swap and appearance interpolation. StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics compared to the current best-performing methods and convinces in a comprehensive user study.

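Motivation and Objectives

Synthesize photo-realistic human images from a single monocular view, with explicit control over pose, shape, and appearance.
Disentangle pose, appearance, and body parts to enable applications such as pose transfer, garment transfer, and identity swap.
Use a StyleGAN2-based generator, conditioned on a spatial pose encoding and an appearance vector, to render with high fidelity.
Train with supervision on image pairs so that the network reconstructs the target pose and appearance.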
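The paired supervision described above relies on source-target pairs that show the same person in different poses. As a minimal sketch of how such pairs might be enumerated, assuming a dataset already grouped by person ID (the dictionary layout, file names, and the `make_training_pairs` helper are hypothetical and not taken from the paper):

```python
from itertools import permutations


def make_training_pairs(images_by_person):
    """Return ordered (source, target) pairs of images of the same person.

    images_by_person: dict mapping a person ID to a list of image names
    (hypothetical layout, assumed for illustration only).
    """
    pairs = []
    for _person_id, images in sorted(images_by_person.items()):
        # Every image can serve as the appearance source for every
        # other image of the same person, which supplies the target pose.
        pairs.extend(permutations(images, 2))
    return pairs


# Toy data with made-up file names.
dataset = {
    "person_a": ["a_front.jpg", "a_side.jpg", "a_back.jpg"],
    "person_b": ["b_front.jpg", "b_side.jpg"],
}
pairs = make_training_pairs(dataset)
print(len(pairs))  # 3*2 + 2*1 = 8 ordered pairs
```

Pairs never mix two different people, so the target image always provides ground truth for the source's appearance in a new pose.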

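Proposed Method

Represent pose as a spatial conditioning tensor that is fed to a StyleGAN2-based generator (GNet).
Encode pose into a spatial tensor E with PNet, and appearance into a latent vector z with ANet.
Extract pose with DensePose and appearance from a partial SMPL texture map, yielding a pose-independent appearance A.
Train on source-target image pairs of the same person (I_s to I_t) with reconstruction and adversarial losses, so that pose and appearance are disentangled.
Optimize a total loss that combines reconstruction, perceptual, face identity, GAN, and patch co-occurrence terms to achieve high-fidelity synthesis.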
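A total loss of this kind is typically a weighted sum of its individual terms. The sketch below illustrates only that combination step; the per-term values and the uniform weights are placeholders chosen for illustration, since this review does not report the weights used in the paper, and `total_loss` is a hypothetical helper.

```python
def total_loss(terms, weights):
    """Weighted sum of named loss terms.

    terms:   dict mapping a loss name to its scalar value for one batch.
    weights: dict mapping the same names to scalar weights.
    """
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight for loss term(s): {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


# Placeholder values for the five terms named in the review.
loss_terms = {
    "reconstruction": 0.40,
    "perceptual": 0.25,
    "face_identity": 0.10,
    "gan": 0.70,
    "patch_cooccurrence": 0.30,
}
# Uniform weights are an assumption, not the paper's setting.
loss_weights = {name: 1.0 for name in loss_terms}

loss = total_loss(loss_terms, loss_weights)
```

In a real training loop each term would be a differentiable tensor rather than a float, but the combination logic stays the same.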

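Experimental Results

Research Questions

RQ1: Does explicit conditioning on pose and part-based appearance improve photo-realistic re-rendering of humans from a single image?
RQ2: Does disentangling pose and appearance enable reliable pose transfer, garment transfer, and identity swap while preserving fine textures?
RQ3: How does StylePoseGAN compare with state-of-the-art methods on perceptual metrics and in user judgments for single-image human re-rendering?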

Key Findings

Method	SSIM ↑	LPIPS ↓
DPT (Neverova et al., 2018)	0.759	0.206
VUnet (Esser et al., 2018)	0.739	0.202
DSC (Siarohin et al., 2019)	0.750	0.214
CBI (Grigor’ev et al., 2019)	0.766	0.178
NHRR (Sarkar et al., 2020)	0.768	0.164
StylePoseGAN (ours)	0.788	0.133
GT	1.0	0.0

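StylePoseGAN achieves state-of-the-art SSIM and LPIPS on pose transfer (SSIM 0.788, LPIPS 0.133), outperforming every compared method.
Compared with NHRR, the best of the prior methods, it reduces LPIPS by about 19% (0.133 vs. 0.164).
In the DeepFashion-based evaluation, StylePoseGAN preserves high-frequency texture detail and garment patterns better than competing methods.
In the user study, StylePoseGAN is preferred over CBI and NHRR for identity similarity, realism, and preservation of fine details (rated above 95% on the reported questions).
The model supports fine-grained texture control as well as applications such as garment transfer, head/identity swap, and appearance interpolation.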
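The relative LPIPS improvement over NHRR quoted above can be checked directly from the table values. The short calculation below uses only the numbers reported in the table; the `relative_change` helper is introduced here for illustration.

```python
def relative_change(baseline, new):
    """Signed change of `new` relative to `baseline`."""
    return (new - baseline) / baseline


# Values copied from the results table.
lpips_nhrr, lpips_ours = 0.164, 0.133
ssim_nhrr, ssim_ours = 0.768, 0.788

# LPIPS is lower-is-better, so report the reduction as a positive number.
lpips_reduction = -relative_change(lpips_nhrr, lpips_ours)
# SSIM is higher-is-better, so report the gain.
ssim_gain = relative_change(ssim_nhrr, ssim_ours)

print(f"LPIPS reduction vs. NHRR: {lpips_reduction:.1%}")  # 18.9%
print(f"SSIM gain vs. NHRR: {ssim_gain:.1%}")              # 2.6%
```

The LPIPS gap is far larger in relative terms than the SSIM gap, which is consistent with the review's emphasis on perceptual quality and texture detail.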

This review was generated by AI and checked by a human editor.