QUICK REVIEW

[Paper Review] MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images

Shaofei Wang, Marko Mihajlović|arXiv (Cornell University)|Jun 22, 2021

3D Shape Modeling and Analysis|References: 71|Citations: 48

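ONE-LINE SUMMARY

MetaAvatar meta-learns a hypernetwork that predicts dynamic neural SDFs from a few monocular depth images, yielding fast, controllable avatars of clothed humans with pose-dependent cloth deformation.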

ABSTRACT

In this paper, we aim to create generalizable and controllable neural signed distance fields (SDFs) that represent clothed humans from monocular depth observations. Recent advances in deep learning, especially neural implicit representations, have enabled human shape reconstruction and controllable avatar generation from different sensor inputs. However, to generate realistic cloth deformations from novel input poses, watertight meshes or dense full-body scans are usually needed as inputs. Furthermore, due to the difficulty of effectively modeling pose-dependent cloth deformations for diverse body shapes and cloth types, existing approaches resort to per-subject/cloth-type optimization from scratch, which is computationally expensive. In contrast, we propose an approach that can quickly generate realistic clothed human avatars, represented as controllable neural SDFs, given only monocular depth images. We achieve this by using meta-learning to learn an initialization of a hypernetwork that predicts the parameters of neural SDFs. The hypernetwork is conditioned on human poses and represents a clothed neural avatar that deforms non-rigidly according to the input poses. Meanwhile, it is meta-learned to effectively incorporate priors of diverse body shapes and cloth types and thus can be much faster to fine-tune, compared to models trained from scratch. We qualitatively and quantitatively show that our approach outperforms state-of-the-art approaches that require complete meshes as inputs while our approach requires only depth frames as inputs and runs orders of magnitudes faster. Furthermore, we demonstrate that our meta-learned hypernetwork is very robust, being the first to generate avatars with realistic dynamic cloth deformations given as few as 8 monocular depth frames.

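Motivation and Objectives

Advance generalizable modeling of clothed humans that supports pose-dependent cloth deformation.
Develop a meta-learning framework that initializes a hypernetwork producing dynamic neural SDFs conditioned on body pose.
Enable fast fine-tuning from monocular depth images to create subject-specific, animatable avatars.
Compare against baselines that take meshes as input, and demonstrate robustness with very limited data.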

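Proposed Method

Use a meta-learned hypernetwork g_ψ to predict the parameters of a neural SDF that represents the clothed human in canonical space.
Meta-learn a static neural SDF f_φ through few-shot meta-learning (with the IGR loss) so that it captures priors across diverse body shapes and clothing.
Train the hypernetwork to predict residuals to the static meta-SDF parameters, conditioned on the bone transformations {B_b}.
Canonicalize the input depth points with an inverse skinning network, then animate the result with a forward skinning network.
At test time, optimize the hypernetwork on a small fine-tuning set of depth frames and their SMPL fits.
Inside g_ψ, encode the bone transformations with a hierarchical MLP encoder.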
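
The residual-prediction step above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the SDF is a tiny one-hidden-layer MLP, the hypernetwork is a single linear map rather than a hierarchical MLP encoder, and every name (`sdf_mlp`, `hypernet_residual`, `dynamic_sdf`, `HIDDEN`, the 2-bone pose layout) is hypothetical. It only shows how pose-conditioned residuals are added to static parameters before the SDF is queried.

```python
import math
import random

HIDDEN = 8  # hidden width of the toy SDF MLP (illustrative choice)
N_PARAMS = 3 * HIDDEN + HIDDEN + HIDDEN + 1  # W1, b1, W2, b2 flattened

def sdf_mlp(params, point):
    """Evaluate a one-hidden-layer tanh MLP at a 3D canonical-space point."""
    w1, b1 = params[:3 * HIDDEN], params[3 * HIDDEN:4 * HIDDEN]
    w2, b2 = params[4 * HIDDEN:5 * HIDDEN], params[5 * HIDDEN]
    hidden = [
        math.tanh(b1[j] + sum(w1[3 * j + i] * point[i] for i in range(3)))
        for j in range(HIDDEN)
    ]
    return b2 + sum(w2[j] * hidden[j] for j in range(HIDDEN))

def hypernet_residual(psi, pose):
    """Linear stand-in for g_psi: map a flattened pose code to a parameter residual."""
    return [sum(w * p for w, p in zip(row, pose)) for row in psi]

def dynamic_sdf(phi_static, psi, pose, point):
    """Pose-conditioned SDF: static parameters plus the hypernetwork's residual."""
    residual = hypernet_residual(psi, pose)
    params = [a + r for a, r in zip(phi_static, residual)]
    return sdf_mlp(params, point)

rng = random.Random(0)
pose_dim = 2 * 12  # two bones, each a flattened 3x4 transform (hypothetical layout)
phi_static = [rng.uniform(-0.5, 0.5) for _ in range(N_PARAMS)]
psi = [[rng.uniform(-0.05, 0.05) for _ in range(pose_dim)] for _ in range(N_PARAMS)]

zero_code = [0.0] * pose_dim  # a pose code for which this linear map gives no residual
bent_code = [rng.uniform(-1, 1) for _ in range(pose_dim)]
query = [0.1, 0.2, -0.3]

print(dynamic_sdf(phi_static, psi, zero_code, query))  # same value as the static SDF
print(dynamic_sdf(phi_static, psi, bent_code, query))  # value shifted by the pose residual
```

Predicting a residual rather than the full parameter vector means the static prior is recovered whenever the residual is zero, so the hypernetwork only has to model the pose-dependent change.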

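Experimental Results

Research Questions

RQ1: Does a meta-learned prior help generate animatable clothed human avatars from depth images quickly and from few shots?
RQ2: With limited input data, how well does a hypernetwork-based dynamic SDF model generalize across subjects, clothing types, and poses?
RQ3: What are the benefits of meta-learning a dynamic hypernetwork for pose-dependent cloth deformation versus meta-learning a static SDF prior?
RQ4: When only depth frames are available, how well does MetaAvatar perform compared with mesh-based and full-input baselines?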

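Key Findings

MetaAvatar produces a controllable, dynamic neural SDF avatar from as few as 8 depth frames with about 2 minutes of fine-tuning.
The method outperforms baselines that require complete meshes (such as NASA and SCANimate) and comes close to LEAP, recovering cloth detail better even with depth-only input.
The two-stage meta-learning strategy, a static meta-SDF initialization followed by hypernetwork residuals, stabilizes training and enables rapid adaptation.
In the CAPE experiments, the learned deformation prior is robust to unseen subjects and poses, and to at least one unseen clothing type.
Even when the fine-tuning data is reduced to less than 1%, MetaAvatar keeps perceptual quality competitive and shows reasonable interpolation and extrapolation performance.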
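
Why a meta-learned initialization adapts quickly can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the paper's training code: it uses a Reptile-style first-order update (this review does not name the meta-learning algorithm), a two-parameter quadratic loss in place of an SDF fitting loss, and a hypothetical task distribution (`sample_task`). It compares few-step fine-tuning from the meta-learned initialization with the same number of steps from scratch.

```python
import random

def loss(theta, target):
    """Squared distance to a task's optimum (toy stand-in for a fitting loss)."""
    return sum((p - t) ** 2 for p, t in zip(theta, target))

def adapt(theta, target, steps, lr=0.1):
    """Run a few plain gradient-descent steps on one task, starting from theta."""
    theta = list(theta)
    for _ in range(steps):
        theta = [p - lr * 2.0 * (p - t) for p, t in zip(theta, target)]
    return theta

def sample_task(rng):
    """Hypothetical task distribution: optima clustered around (5, -3)."""
    return [5.0 + rng.uniform(-1, 1), -3.0 + rng.uniform(-1, 1)]

def reptile(meta_iters=500, inner_steps=5, meta_lr=0.1, seed=0):
    """Reptile-style loop: nudge the initialization toward each task's adapted parameters."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(meta_iters):
        adapted = adapt(theta, sample_task(rng), inner_steps)
        theta = [p + meta_lr * (a - p) for p, a in zip(theta, adapted)]
    return theta

meta_init = reptile()
new_task = [5.5, -2.5]  # an unseen task drawn from the same region

# Same three-step fine-tuning budget, two different starting points.
few_shot_meta = loss(adapt(meta_init, new_task, steps=3), new_task)
few_shot_scratch = loss(adapt([0.0, 0.0], new_task, steps=3), new_task)

print("meta-learned init:", few_shot_meta)
print("from scratch:     ", few_shot_scratch)
```

With the same fine-tuning budget, the meta-learned start ends closer to the new task's optimum because it already sits inside the region where the tasks cluster. This is the intuition behind fine-tuning a shared prior instead of optimizing each subject from scratch.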

This review was generated by AI and checked by a human editor.