QUICK REVIEW

[Paper Review] Generating 3D faces using Convolutional Mesh Autoencoders

Anurag Ranjan, Timo Bolkart|arXiv (Cornell University)|Jul 26, 2018

Face recognition and analysis|References: 40|Cited by: 27

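One-sentence summary

The paper proposes the Convolutional Mesh Autoencoder (CoMA), a non-linear 3D face representation built on spectral convolutions over the mesh surface, with hierarchical mesh sampling to capture shape and expression variation at multiple scales. Trained on 20,466 high-resolution face meshes that include extreme expressions, CoMA achieves 50% lower reconstruction error than a state-of-the-art PCA-based model while using 75% fewer parameters, and it further improves reconstruction accuracy when it replaces the expression space of the FLAME model.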

ABSTRACT

Learned 3D representations of human faces are useful for computer vision problems such as 3D face tracking and reconstruction from images, as well as graphics applications such as character generation and animation. Traditional models learn a latent representation of a face using linear subspaces or higher-order tensor generalizations. Due to this linearity, they cannot capture extreme deformations and non-linear expressions. To address this, we introduce a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface. We introduce mesh sampling operations that enable a hierarchical mesh representation that captures non-linear variations in shape and expression at multiple scales within the model. In a variational setting, our model samples diverse realistic 3D faces from a multivariate Gaussian distribution. Our training data consists of 20,466 meshes of extreme expressions captured over 12 different subjects. Despite limited training data, our trained model outperforms state-of-the-art face models with 50% lower reconstruction error, while using 75% fewer parameters. We also show that replacing the expression space of an existing state-of-the-art face model with our autoencoder achieves a lower reconstruction error. Our data, model and code are available at http://github.com/anuragranj/coma

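Motivation and goals

Address the limits of linear models in capturing non-linear facial deformations, especially extreme expressions.
Develop a deep-learning-based 3D face representation that generalizes well to unseen expressions and is memory-efficient.
Model face shape and expression hierarchically and at multiple scales through new mesh sampling and convolution operations.
Build a compact, trainable model that generates diverse, realistic 3D faces through variational sampling.
Release a large dataset of 20,466 high-resolution 3D face meshes covering extreme expressions for research use.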

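Proposed method

Fast Chebyshev filters defined on the mesh Laplacian give localized, parameter-efficient convolutions on a non-Euclidean surface.
New mesh down-sampling and up-sampling operations preserve the topological structure across scales.
A variational autoencoder framework with a multivariate Gaussian prior generates diverse 3D faces from the latent space.
Convolution is formulated in the spectral domain of the mesh Laplacian (a discrete Laplace-Beltrami operator); the Chebyshev approximation avoids an explicit eigendecomposition, which keeps convolution on high-resolution meshes memory-efficient.
The model is trained end to end on data from 12 subjects performing 12 complex, asymmetric expressions with significant soft-tissue deformation.
Convolution filters are shared across the mesh surface, which reduces the parameter count while preserving local invariance.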
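
The Chebyshev filtering described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `scaled_laplacian` and `chebyshev_conv`, the dense-matrix representation, and the use of a symmetric normalized graph Laplacian are all assumptions made for readability.

```python
import numpy as np

def scaled_laplacian(adjacency):
    """Normalized graph Laplacian, rescaled so its eigenvalues lie in [-1, 1].

    Assumption: a dense, symmetric 0/1 adjacency matrix built from mesh edges.
    """
    n = len(adjacency)
    degree = adjacency.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    laplacian = np.eye(n) - d_inv_sqrt[:, None] * adjacency * d_inv_sqrt[None, :]
    # eigvalsh is used here only to find the largest eigenvalue for rescaling;
    # the filter itself never needs the eigenvectors.
    lambda_max = np.linalg.eigvalsh(laplacian).max()
    return 2.0 * laplacian / lambda_max - np.eye(n)

def chebyshev_conv(x, laplacian_scaled, theta):
    """Apply a Chebyshev polynomial filter to per-vertex features.

    x:     (n_vertices, f_in) input features
    theta: (K, f_in, f_out) weights, one matrix per polynomial order 0..K-1
    """
    t_prev = x                       # T_0(L) x = x
    out = t_prev @ theta[0]
    if theta.shape[0] > 1:
        t_curr = laplacian_scaled @ x  # T_1(L) x = L x
        out = out + t_curr @ theta[1]
        for k in range(2, theta.shape[0]):
            # Recurrence: T_k = 2 L T_{k-1} - T_{k-2}
            t_prev, t_curr = t_curr, 2.0 * laplacian_scaled @ t_curr - t_prev
            out = out + t_curr @ theta[k]
    return out

# Usage on a 5-vertex path graph (a stand-in for real mesh connectivity).
path = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
features = np.random.default_rng(0).standard_normal((5, 3))
weights = np.random.default_rng(1).standard_normal((3, 3, 2))
filtered = chebyshev_conv(features, scaled_laplacian(path), weights)
```

Because a filter with K terms is a polynomial of degree K-1 in the Laplacian, each output vertex depends only on vertices at most K-1 edges away, and the same `theta` is applied at every vertex. That is where the locality and the parameter sharing come from.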

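Experimental results

Research questions

RQ1: Can a non-linear deep learning model outperform linear PCA-based models at reconstructing 3D face shape, particularly under extreme expressions?
RQ2: Can hierarchical mesh convolutions combined with the new sampling operations capture face shape and expression variation at multiple scales?
RQ3: With limited data, can a compact, parameter-efficient model generalize to unseen expressions better than existing state-of-the-art models?
RQ4: Can the learned latent space generate diverse, realistic 3D face meshes through variational sampling?
RQ5: How much does reconstruction accuracy improve when the expression space of a state-of-the-art model such as FLAME is replaced with CoMA?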
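
RQ2 depends on the mesh sampling operations, which this summary does not spell out. One plausible reading is that both are sparse matrix products: down-sampling selects a subset of vertices, and up-sampling restores each dropped vertex as a barycentric combination of coarse vertices. The sketch below assumes that reading. The kept-vertex indices and the barycentric weights are hypothetical inputs that a mesh simplification step would supply; that step is not shown, and dense matrices are used only for brevity.

```python
import numpy as np

def downsample_matrix(n_vertices, kept):
    """Binary selection matrix of shape (len(kept), n_vertices)."""
    q_down = np.zeros((len(kept), n_vertices))
    q_down[np.arange(len(kept)), kept] = 1.0
    return q_down

def upsample_matrix(n_vertices, kept, dropped_weights):
    """Up-sampling matrix of shape (n_vertices, len(kept)).

    dropped_weights (hypothetical input) maps each dropped vertex index to
    {coarse vertex index: barycentric weight}, with weights summing to 1.
    """
    q_up = np.zeros((n_vertices, len(kept)))
    # Kept vertices are copied straight back to their original positions.
    q_up[kept, np.arange(len(kept))] = 1.0
    # Dropped vertices are interpolated from the coarse mesh.
    for vertex, weights in dropped_weights.items():
        for coarse_index, weight in weights.items():
            q_up[vertex, coarse_index] = weight
    return q_up

# Usage: vertex 3 lies inside the triangle formed by vertices 0, 1, 2.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.25, 0.25, 0.0]])
kept = [0, 1, 2]
q_down = downsample_matrix(4, kept)
q_up = upsample_matrix(4, kept, {3: {0: 0.5, 1: 0.25, 2: 0.25}})
coarse = q_down @ vertices
restored = q_up @ coarse
```

In this toy case the dropped vertex lies exactly in the plane of its triangle, so `restored` matches `vertices`. On a real face mesh a dropped vertex generally sits off the coarse surface, so up-sampling alone is lossy and the network has to recover that detail.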

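Key findings

In the interpolation experiment, CoMA achieves 50% lower reconstruction error than the PCA-based model, even with limited training data.
CoMA uses 75% fewer parameters than the linear PCA model while achieving better reconstruction.
Replacing the expression space of FLAME with CoMA lowers the median reconstruction error at every tested latent dimensionality, with the largest improvement at 12 latent dimensions (0.139 mm vs. 0.172 mm).
In the variational setting, CoMA samples diverse, realistic 3D face meshes from a standard Gaussian distribution over the latent space.
The model generalizes well to unseen facial expressions and captures non-linear deformations better than linear models.
The dataset of 20,466 high-resolution 3D face meshes (including extreme expressions) is released together with code and trained models for research use.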
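
The variational sampling finding rests on the standard variational autoencoder recipe, which can be sketched as below. This is a generic illustration, not the CoMA code: the function names are hypothetical, the latent size of 8 is an illustrative choice, and the mesh decoder that would turn each latent vector into vertex positions is omitted.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), so sampling stays differentiable in mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def sample_latents(n_samples, latent_dim, rng):
    """At generation time, latent codes are drawn directly from the prior N(0, I)."""
    return rng.standard_normal((n_samples, latent_dim))

# Usage: during training the encoder would output mu and log_var per mesh;
# here they are placeholders for a batch of 4 meshes.
rng = np.random.default_rng(0)
mu = np.zeros((4, 8))
log_var = np.zeros((4, 8))
z_train = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
z_new = sample_latents(16, 8, rng)  # each row would be passed to the decoder
```

The KL term pulls the encoder's output distribution toward the standard Gaussian prior, which is what makes codes drawn from that prior decode to plausible faces rather than arbitrary meshes.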

This review was generated by AI and checked by a human editor.