QUICK REVIEW

[Paper Review] Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression

Aaron S. Jackson, Adrian Bulat | Research Repository (Kingston University London) | Mar 22, 2017

Face recognition and analysis | References: 22 | Cited by: 44

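One-Sentence Summary

This paper proposes a method for reconstructing 3D facial geometry directly from a single 2D image: a volumetric convolutional neural network (volumetric CNN) regresses the face geometry end to end, bypassing 3D Morphable Model (3DMM) fitting. By combining spatially aligned volumetric regression with integrated 3D landmark guidance, the method achieves state-of-the-art performance across a wide range of poses and expressions.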

ABSTRACT

3D face reconstruction is a fundamental Computer Vision problem of extraordinary difficulty. Current systems often assume the availability of multiple facial images (sometimes from the same subject) as input, and must address a number of methodological challenges such as establishing dense correspondences across large facial poses, expressions, and non-uniform illumination. In general these methods require complex and inefficient pipelines for model building and fitting. In this work, we propose to address many of these limitations by training a Convolutional Neural Network (CNN) on an appropriate dataset consisting of 2D images and 3D facial models or scans. Our CNN works with just a single 2D facial image, does not require accurate alignment nor establishes dense correspondence between images, works for arbitrary facial poses and expressions, and can be used to reconstruct the whole 3D facial geometry (including the non-visible parts of the face) bypassing the construction (during training) and fitting (during testing) of a 3D Morphable Model. We achieve this via a simple CNN architecture that performs direct regression of a volumetric representation of the 3D facial geometry from a single 2D image. We also demonstrate how the related task of facial landmark localization can be incorporated into the proposed framework and help improve reconstruction quality, especially for the cases of large poses and facial expressions. Testing code will be made available online, along with pre-trained models http://aaronsplace.co.uk/papers/jackson2017recon

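Motivation and Goals

Address the limitations of existing 3D face reconstruction methods, which depend on complex pipelines, 3DMM fitting, and dense correspondence estimation.
Reconstruct a 3D face from a single image without precise alignment, 3DMM construction, or iterative optimization.
Achieve robust reconstruction under arbitrary facial poses, expressions, and occlusions through an end-to-end deep learning approach.
Integrate 3D facial landmark localization into the framework to improve reconstruction quality, especially under challenging conditions.
Demonstrate performance superior to state-of-the-art methods on both controlled data and unconstrained images collected from the web.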

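Proposed Method

Trains a volumetric CNN to regress 3D facial geometry directly from a single 2D image, using a dataset of paired 2D images and 3D face scans.
Uses a spatially aligned volumetric representation: voxels are regressed in a fixed 3D coordinate frame aligned with the input image.
Proposes a guided variant (VRN-Guided) that brings in 3D landmark predictions, encoded as Gaussian heatmaps, as a guidance signal to improve spatial consistency.
Trains the network end to end with a regression loss between the predicted and ground-truth voxels.
Applies data augmentation and normalization to improve generalization across pose, expression, and illumination changes.
Uses a simple CNN architecture that trains and runs efficiently, with no complex optimization loop at test time.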
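
The volumetric target and the voxel-wise loss described in the list above can be illustrated with a small NumPy sketch. Everything in it is an assumption made for illustration: the 32×32×32 grid, the ellipsoid standing in for a face scan, the helper names, and the choice of per-voxel sigmoid cross-entropy as the "regression loss" (this summary does not say which loss the authors use).

```python
import numpy as np


def toy_target_volume(width=32, height=32, depth=32):
    """Binary occupancy grid holding a filled ellipsoid (hypothetical stand-in for a scan).

    Axes 0 and 1 play the role of image x/y, so the volume is spatially
    aligned with a width x height input image; axis 2 is depth.
    """
    x, y, z = np.meshgrid(
        np.arange(width), np.arange(height), np.arange(depth), indexing="ij"
    )
    cx, cy, cz = (width - 1) / 2, (height - 1) / 2, (depth - 1) / 2
    inside = (
        ((x - cx) / (0.40 * width)) ** 2
        + ((y - cy) / (0.45 * height)) ** 2
        + ((z - cz) / (0.30 * depth)) ** 2
    ) <= 1.0
    return inside.astype(np.float32)


def voxel_bce_loss(logits, target):
    """Mean per-voxel sigmoid cross-entropy (assumed loss), in a numerically stable form."""
    # Equivalent to -[t * log(s) + (1 - t) * log(1 - s)] with s = sigmoid(logits).
    loss = np.maximum(logits, 0) - logits * target + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())


def occupancy_from_logits(logits, threshold=0.5):
    """Turn network outputs into a binary volume by thresholding the sigmoid."""
    prob = 1.0 / (1.0 + np.exp(-logits))
    return prob >= threshold


target = toy_target_volume()
confident_logits = np.where(target > 0, 10.0, -10.0)  # a near-perfect prediction
uninformed_logits = np.zeros(target.shape)            # predicts 0.5 everywhere

print(voxel_bce_loss(confident_logits, target))
print(voxel_bce_loss(uninformed_logits, target))
```

A surface mesh could then be extracted from the thresholded volume with an iso-surface algorithm such as marching cubes; that step is left out here to keep the sketch dependency-free.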

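Experimental Results

Research Questions

RQ1: Can a CNN regress 3D facial geometry directly from a single 2D image without relying on a 3DMM or iterative fitting?
RQ2: Does spatially aligning the 3D volume during regression improve reconstruction accuracy, especially at large poses?
RQ3: How much does integrating 3D landmark supervision improve reconstruction quality under extreme poses and expressions?
RQ4: How does the method compare with state-of-the-art 3D face reconstruction techniques on controlled and unconstrained data?
RQ5: How do network design choices, such as landmark guidance and Gaussian kernel size, affect reconstruction robustness and accuracy?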

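Key Findings

The proposed method achieves state-of-the-art performance on all three benchmark datasets, clearly outperforming earlier single-image 3D face reconstruction methods.
The VRN-Guided model substantially reduces mean 3D reconstruction error relative to the unguided baseline and to existing state-of-the-art methods such as 3DDFA and EOS.
Performance degrades slightly as the yaw angle increases, but error stays low even at extreme poses.
Facial expression has very little effect on reconstruction error, indicating robustness even to extreme expressions with limited training data.
Using larger Gaussian heatmaps (σ=2) for landmark guidance causes only a slight drop in performance, confirming that guidance is effective as long as the heatmap size is reasonable.
Removing spatial alignment (i.e., regressing a fixed frontal-view volume) produces poor reconstructions that are nearly identical across inputs, confirming that spatial alignment is essential for accurate reconstruction.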
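
To make the σ finding concrete, the following sketch renders landmark positions as Gaussian heatmaps and compares two kernel widths. It is illustrative only: the function name `landmark_heatmaps`, the 64×64 resolution, the example coordinates, and the unnormalized peak-at-1 Gaussian are assumptions, not details taken from the paper.

```python
import numpy as np


def landmark_heatmaps(landmarks_xy, height, width, sigma=1.0):
    """Return an array of shape (num_landmarks, height, width).

    Each channel holds one unnormalized 2D Gaussian (peak value 1)
    centred on the corresponding (x, y) landmark position in pixels.
    """
    landmarks_xy = np.asarray(landmarks_xy, dtype=np.float64)
    ys, xs = np.mgrid[0:height, 0:width]
    dx = xs[None, :, :] - landmarks_xy[:, 0, None, None]
    dy = ys[None, :, :] - landmarks_xy[:, 1, None, None]
    return np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))


# Two hypothetical landmarks given as (x, y) pixel coordinates.
points = [(10, 20), (40, 33)]
narrow = landmark_heatmaps(points, height=64, width=64, sigma=1.0)
wide = landmark_heatmaps(points, height=64, width=64, sigma=2.0)

# A larger sigma spreads each landmark over more pixels, so the
# guidance is less sharply localized.
print(narrow.shape, narrow[0].sum(), wide[0].sum())
```

In a guided setup these channels would be stacked with the RGB image as extra input to the reconstruction network; a wider Gaussian carries the same landmark location with lower spatial precision, which is consistent with the slight degradation reported for σ=2.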

This review was generated by AI and reviewed by a human editor.