QUICK REVIEW

[Paper Review] Fine-Grained 3D Facial Reconstruction for Micro-Expressions

Che Sun, Xinjie Zhang|arXiv (Cornell University)|Mar 7, 2026
Face recognition and analysis | Cited by 0
One-Sentence Summary

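Proposes a coarse-to-fine 3D facial reconstruction method that refines the 3D geometry of micro-expressions in monocular high-frame-rate video by combining global dynamic features with locally enriched multimodal cues.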

ABSTRACT

Recent advances in 3D facial expression reconstruction have demonstrated remarkable performance in capturing macro-expressions, yet the reconstruction of micro-expressions remains unexplored. This novel task is particularly challenging due to the subtle, transient, and low-intensity nature of micro-expressions, which complicate the extraction of stable and discriminative features essential for accurate reconstruction. In this paper, we propose a fine-grained micro-expression reconstruction method that integrates a global dynamic feature capturing stable facial motion patterns with a locally-enriched feature incorporating multiple informative cues from 2D motions, facial priors and 3D facial geometry. Specifically, we devise a plug-and-play dynamic-encoded module to extract micro-expression features for global facial action, allowing it to leverage prior knowledge from abundant macro-expression data to mitigate the scarcity of micro-expression data. Subsequently, a dynamic-guided mesh deformation module is designed for extracting aggregated local features from dense optical flow, sparse landmark cues and facial mesh geometry, which adaptively refines fine-grained facial micro-expression without compromising global 3D geometry. Extensive experiments on micro-expression datasets demonstrate that our method consistently outperforms state-of-the-art methods in both geometric accuracy and perceptual detail.

Research Motivation and Goals

  • Enable accurate 3D reconstruction of subtle micro-expressions, which methods designed for macro-expressions often lose.
  • Develop a coarse-to-fine framework that fuses global dynamic features with locally enriched cues from 2D motion, 3D geometry, and facial priors.
  • Leverage macro-expression data to mitigate micro-expression data scarcity through a dynamic-encoded module.
  • Refine initialized meshes with a dynamic-guided mesh deformation module that preserves global structure while capturing fine-grained details.

Proposed Method

  • Introduce a plug-and-play dynamic-encoded module that applies a static encoder to onset frames and a motion encoder to optical flow, then produces micro-expression-enhanced parameters via residual fusion and an N-ODE (neural ordinary differential equation) based evolution.
  • Apply a dynamic-guided mesh deformation module that fuses multi-modal local features (3D geometry, facial landmarks, and dense optical-flow-based motion) and refines the mesh through a graph convolutional network with motion attention.
  • Use region-based pixel-vertex correspondence to efficiently map optical-flow cues to 3D mesh regions, reducing computational load while preserving discriminability.
  • Combine reconstruction fidelity losses (photometric, perceptual, landmarks, expression regularization, emotion, expression consistency, identity) with geometry regularization losses (Laplacian smoothness, normal consistency, flow-guided refinement) for training.
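The dynamic-encoded module bullet above mentions residual fusion followed by an N-ODE evolution. The sketch below illustrates only those two generic ideas under loud assumptions: `residual_fuse`, `evolve_euler`, and `decay_dynamics` are hypothetical names, fusion is assumed to be element-wise addition, and the learned ODE dynamics are replaced by a fixed linear decay so the example runs without a trained network. It is not the authors' implementation.

```python
import math
from typing import Callable, List

Vector = List[float]


def residual_fuse(static_params: Vector, motion_residual: Vector) -> Vector:
    """Hypothetical residual fusion: add a motion-derived correction to the
    expression parameters predicted from the onset frame (element-wise)."""
    if len(static_params) != len(motion_residual):
        raise ValueError("parameter vectors must have the same length")
    return [s + m for s, m in zip(static_params, motion_residual)]


def evolve_euler(z0: Vector, f: Callable[[float, Vector], Vector],
                 t0: float, t1: float, steps: int) -> Vector:
    """Integrate dz/dt = f(t, z) from t0 to t1 with fixed-step explicit Euler,
    the simplest solver a neural-ODE forward pass could use."""
    h = (t1 - t0) / steps
    z, t = list(z0), t0
    for _ in range(steps):
        dz = f(t, z)
        z = [zi + h * dzi for zi, dzi in zip(z, dz)]
        t += h
    return z


def decay_dynamics(t: float, z: Vector) -> Vector:
    """Assumed stand-in for a learned network: linear decay dz/dt = -z."""
    return [-zi for zi in z]


fused = residual_fuse([1.0, -0.5], [0.2, 0.1])  # approximately [1.2, -0.4]
evolved = evolve_euler(fused, decay_dynamics, 0.0, 1.0, steps=1000)
# For dz/dt = -z the exact solution is z(1) = z(0) * exp(-1); Euler is close.
print(evolved, [v * math.exp(-1.0) for v in fused])
```

In a real neural ODE, `decay_dynamics` would be a trained network and the solver would typically be adaptive; fixed-step Euler is used here only because it is short and easy to verify.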
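The mesh-deformation bullet above describes a graph convolutional network with motion attention. One plausible reading, sketched here as an assumption rather than the paper's design, is a graph convolution whose neighbor weights come from a softmax over per-vertex motion magnitude, so strongly moving vertices dominate the aggregation. The function `motion_attentive_gcn_layer` and its `temperature` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def motion_attentive_gcn_layer(features: List[List[float]],
                               neighbors: List[List[int]],
                               motion: List[float],
                               weight: List[List[float]],
                               temperature: float = 1.0) -> List[List[float]]:
    """One hypothetical graph-conv layer.

    features: V x C vertex features; neighbors: adjacency lists;
    motion: per-vertex motion magnitude (e.g. from optical flow);
    weight: C_out x C linear map applied after aggregation.
    """
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + list(nbrs)  # include a self-loop
        # Attention over the neighborhood, driven by motion magnitude.
        alpha = softmax([motion[j] / temperature for j in group])
        dim = len(features[i])
        agg = [sum(a * features[j][c] for a, j in zip(alpha, group))
               for c in range(dim)]
        # Linear map followed by ReLU.
        out.append([max(0.0, sum(w * x for w, x in zip(row, agg)))
                    for row in weight])
    return out


feats = [[1.0], [0.0], [0.0]]
triangle = [[1, 2], [0, 2], [0, 1]]
print(motion_attentive_gcn_layer(feats, triangle, [10.0, 0.0, 0.0], [[1.0]]))
```

With zero motion everywhere this reduces to plain mean aggregation; with high motion on vertex 0, its neighbors copy almost all of vertex 0's feature.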
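The region-based pixel-vertex correspondence bullet above trades per-pixel lookups for per-region averages. A minimal sketch, assuming each image pixel already carries a facial-region label and each mesh vertex is assigned to one region (how the paper defines those regions is not stated here): average the optical flow inside each region, then give every vertex its region's mean flow. `region_mean_flow` and `vertex_flow` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Flow = Tuple[float, float]


def region_mean_flow(flow: List[List[Flow]],
                     labels: List[List[int]]) -> Dict[int, Flow]:
    """Average the (dx, dy) flow of all pixels sharing a region label."""
    sums: DefaultDict[int, list] = defaultdict(lambda: [0.0, 0.0, 0])
    for flow_row, label_row in zip(flow, labels):
        for (dx, dy), region in zip(flow_row, label_row):
            acc = sums[region]
            acc[0] += dx
            acc[1] += dy
            acc[2] += 1
    return {r: (sx / n, sy / n) for r, (sx, sy, n) in sums.items()}


def vertex_flow(vertex_regions: List[int],
                region_flow: Dict[int, Flow]) -> List[Flow]:
    """Give each vertex its region's mean flow; regions without any
    pixels (e.g. occluded) fall back to zero motion."""
    return [region_flow.get(r, (0.0, 0.0)) for r in vertex_regions]


flow = [[(1.0, 0.0), (3.0, 0.0)],
        [(0.0, 2.0), (0.0, 4.0)]]
labels = [[0, 0],
          [1, 1]]
per_region = region_mean_flow(flow, labels)  # {0: (2.0, 0.0), 1: (0.0, 3.0)}
print(vertex_flow([0, 1, 1, 2], per_region))
```

The saving comes from computing one statistic per region instead of one lookup per vertex; the cost is that motion within a region is smoothed to its mean.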
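The training objective above is a combination of fidelity and geometry terms. As an illustration only, the sketch below implements one common form of the Laplacian smoothness term (squared distance of each vertex from the centroid of its neighbors, averaged over vertices) and a weighted sum of named loss terms. The exact Laplacian variant and all weights used in the paper are not given here; the names and numbers below are assumptions.

```python
from typing import Dict, List, Sequence

Vec3 = Sequence[float]


def uniform_laplacian_loss(verts: List[Vec3],
                           neighbors: List[List[int]]) -> float:
    """Mean squared norm of the uniform Laplacian v_i - mean(neighbors of v_i).

    Vertices with no neighbors contribute zero."""
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            continue
        centroid = [sum(verts[j][k] for j in nbrs) / len(nbrs) for k in range(3)]
        total += sum((verts[i][k] - centroid[k]) ** 2 for k in range(3))
    return total / len(verts)


def total_loss(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of named loss terms; a missing weight raises KeyError
    so that no term is silently dropped."""
    return sum(weights[name] * value for name, value in terms.items())


# A unit square where each corner neighbors its two adjacent corners.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
ring = [[1, 3], [0, 2], [1, 3], [0, 2]]
lap = uniform_laplacian_loss(square, ring)  # each corner is 0.5 from its centroid
print(lap, total_loss({"photometric": 0.2, "laplacian": lap},
                      {"photometric": 1.0, "laplacian": 0.1}))
```

Requiring an explicit weight for every term keeps the combination auditable when many terms (photometric, perceptual, landmark, identity, and so on) are summed.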

Experimental Results

Research Questions

  • RQ1 Can global dynamic facial features learned from macro-expressions improve the reconstruction of subtle micro-expressions in 3D?
  • RQ2 Do multi-modal local cues (3D geometry, landmarks, and 2D motion) provide complementary information that enables accurate micro-expression refinement on 3D meshes?
  • RQ3 Is a coarse-to-fine framework effective for preserving global facial structure while capturing fine-grained micro-expressions from monocular video?
  • RQ4 How does region-based motion mapping and motion-attentive refinement affect reconstruction fidelity and perceptual realism?

Key Findings

Method   | CASME II Acc (%) | CASME Acc (%) | SAMM Acc (%) | Avg. Acc (%) | L1 Loss | VGG Loss | FID
EMOCA    | 40.00            | 38.93         | 31.37        | 36.77        | 0.085   | 1.578    | 112.37
EMICA    | 42.50            | 28.81         | 29.41        | 33.57        | 0.083   | 1.501    | 100.04
SMIRK    | 35.00            | 44.07         | 45.10        | 41.39        | 0.085   | 1.032    | 52.26
SMIRK-FT | 46.25            | 42.37         | 50.98        | 46.53        | 0.050   | 0.745    | 33.80
Ours     | 53.75            | 44.70         | 56.86        | 51.77        | 0.041   | 0.700    | 30.41
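The Avg. Acc column appears to be the unweighted arithmetic mean of the three per-dataset accuracies; that is an inference from the numbers, not something the review states. A small check, with all values copied from the table above (`RESULTS`, `mean_accuracy`, and `check_averages` are illustrative names):

```python
from typing import Dict, Sequence, Tuple

# (CASME II, CASME, SAMM) accuracies and the reported Avg. Acc, from the table.
RESULTS: Dict[str, Tuple[Tuple[float, float, float], float]] = {
    "EMOCA": ((40.00, 38.93, 31.37), 36.77),
    "EMICA": ((42.50, 28.81, 29.41), 33.57),
    "SMIRK": ((35.00, 44.07, 45.10), 41.39),
    "SMIRK-FT": ((46.25, 42.37, 50.98), 46.53),
    "Ours": ((53.75, 44.70, 56.86), 51.77),
}


def mean_accuracy(accs: Sequence[float]) -> float:
    """Unweighted mean over datasets (assumed averaging rule)."""
    return sum(accs) / len(accs)


def check_averages(results, tol: float = 0.005):
    """Return {method: (computed mean, reported avg, matches within tol)}."""
    report = {}
    for method, (accs, reported) in results.items():
        computed = mean_accuracy(accs)
        report[method] = (computed, reported, abs(computed - reported) <= tol)
    return report


for method, (computed, reported, ok) in check_averages(RESULTS).items():
    print(f"{method:9s} computed={computed:.4f} reported={reported:.2f} match={ok}")
```

Every reported average matches the unweighted mean to within rounding. On those averages, Ours leads SMIRK-FT, the strongest baseline, by 51.77 − 46.53 = 5.24 percentage points.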
  • The proposed method (Ours) achieves the highest micro-expression recognition accuracy on CASME II, CASME, and SAMM (53.75%, 44.70%, and 56.86%, respectively; average 51.77%) compared with EMOCA, EMICA, SMIRK, and SMIRK-FT.
  • Our method also yields the best average WF1 score (45.52%; WF1 is not included in the table above), outperforming the baselines especially on CASME II and SAMM.
  • Reconstruction quality also improves: our method achieves the lowest L1 loss (0.041), VGG loss (0.700), and Fréchet Inception Distance (FID, 30.41) of all compared methods; lower is better for all three metrics.
  • Ablation studies identify the dynamic-encoded module (DEM) as the component with the largest impact on accuracy: removing either DEM or the dynamic-guided mesh deformation module (DGMD) causes significant drops. The ablations also show that the multi-modal features and every loss term contribute.
  • Region-based motion mapping combined with motion-attentive refinement contributes substantially to capturing discriminative micro-expression detail while maintaining global geometry.


This review was generated by AI and checked by human editors.