QUICK REVIEW

[Paper Review] Recognizing Semantic Features in Faces using Deep Learning

Amogh Gudi|arXiv (Cornell University)|Dec 2, 2015

Face recognition and analysis|References: 16|Cited by: 35

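ONE-SENTENCE SUMMARY

This work proposes a deep learning framework based on convolutional neural networks (CNNs) that recognizes semantic facial features (such as emotion, age, gender, and ethnicity) directly from 2D face images, with no manual feature engineering. Trained end to end, the approach reaches near-human performance, shows that multiple features can be classified jointly with minimal loss in accuracy, and introduces a novel way to use a deep network to generate 3D Active Appearance Models (AAMs) from 2D images.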

ABSTRACT

The human face constantly conveys information, both consciously and subconsciously. However, as basic as it is for humans to visually interpret this information, it is quite a big challenge for machines. Conventional semantic facial feature recognition and analysis techniques are already in use and are based on physiological heuristics, but they suffer from lack of robustness and high computation time. This thesis aims to explore ways for machines to learn to interpret semantic information available in faces in an automated manner without requiring manual design of feature detectors, using the approach of Deep Learning. This thesis provides a study of the effects of various factors and hyper-parameters of deep neural networks in the process of determining an optimal network configuration for the task of semantic facial feature recognition. This thesis explores the effectiveness of the system to recognize the various semantic features (like emotions, age, gender, ethnicity etc.) present in faces. Furthermore, the relation between the effect of high-level concepts on low level features is explored through an analysis of the similarities in low-level descriptors of different semantic features. This thesis also demonstrates a novel idea of using a deep network to generate 3-D Active Appearance Models of faces from real-world 2-D images. For a more detailed report on this work, please see [arXiv:1512.00743v1].

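RESEARCH MOTIVATION AND OBJECTIVES

Develop a deep-learning-based system that automatically recognizes semantic facial features such as emotion, age, gender, and ethnicity without manually designed features.
Study how network hyper-parameters, input preprocessing, and scale affect classification accuracy for semantic facial features.
Explore the relationship between the high-level semantic concepts learned by a deep network (such as emotion) and low-level visual descriptors (such as edges and textures).
Assess the feasibility of using a deep network to generate 3D Active Appearance Models (AAMs) from 2D face images.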

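PROPOSED METHOD

Train deep convolutional neural networks (CNNs) on preprocessed, aligned 2D face images to classify semantic features such as emotion, age, gender, and ethnicity.
Apply deterministic preprocessing and image alignment to improve network performance and generalization.
Compare the low-level feature representations of different semantic tasks through a cosine-similarity analysis of the first-layer convolutional filters.
Design a joint classification network that predicts multiple non-mutually-exclusive facial attributes at once through a unified set of 37 class labels.
Propose a novel deep-learning-based method that generates 3D Active Appearance Models (AAMs) from 2D images by learning a compressed, structured representation.
Evaluate network performance with standard metrics and compare the accuracy of joint classification against single-task networks.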
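The filter comparison described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the thesis's actual procedure: the function names are hypothetical, and the best-match pairing of filters is an assumption made here because filter order is arbitrary across independently trained networks. The source only states that cosine similarity of first-layer filters was used.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two arrays, each flattened to a vector."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)

def best_match_similarity(filters_a, filters_b):
    """Average, over filters in A, of the similarity to the closest filter in B.

    Assumption (not from the source): filters are paired by best match,
    since two separately trained networks need not order filters alike.
    Both inputs have shape (num_filters, height, width).
    """
    fa = np.asarray(filters_a, dtype=float).reshape(len(filters_a), -1)
    fb = np.asarray(filters_b, dtype=float).reshape(len(filters_b), -1)
    # Normalize each filter to unit length so dot products are cosines.
    fa = fa / np.linalg.norm(fa, axis=1, keepdims=True)
    fb = fb / np.linalg.norm(fb, axis=1, keepdims=True)
    sims = fa @ fb.T  # sims[i, j] = cosine between A's filter i and B's filter j
    return float(sims.max(axis=1).mean())

if __name__ == "__main__":
    # Synthetic stand-ins for two networks' first-layer weights.
    rng = np.random.default_rng(0)
    task_a = rng.normal(size=(8, 5, 5))
    task_b = task_a[rng.permutation(8)] + 0.1 * rng.normal(size=(8, 5, 5))
    print(best_match_similarity(task_a, task_b))
```

A score near 1 would suggest the two networks learned closely related low-level filters; the random arrays in the demo are placeholders, not data from the thesis.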

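EXPERIMENTAL RESULTS

Research Questions

RQ1: How can deep learning be adapted to recognize semantic facial features (such as emotion, age, gender, and ethnicity) end to end?
RQ2: How do hyper-parameters, input preprocessing, and network architecture affect a deep network's performance on semantic facial feature recognition?
RQ3: Within a deep neural network, how do high-level semantic concepts (such as emotion) relate to low-level visual descriptors (such as edges and textures)?
RQ4: Can a single deep network jointly classify multiple non-mutually-exclusive facial attributes with minimal loss in performance?
RQ5: Is it feasible to train a deep network to generate 3D Active Appearance Models from 2D face images?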

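Key Findings

The deep learning models reached near-human performance in recognizing semantic facial features such as emotion, age, gender, and ethnicity.
Preprocessing and image alignment significantly improved classification accuracy, underlining the importance of input quality and consistency.
The first-layer weights of related tasks (such as age, gender, and facial hair) showed high cosine similarity, suggesting shared visual patterns.
The joint classification network's average accuracy was 1.84% lower than that of the single-task networks (range: 0.91% to 4.71%), showing that multi-task learning is effective at a small cost in performance.
The network successfully generated 3D Active Appearance Models from 2D images, with mean errors on the X/Y axes of 2.05°/1.56° for real faces and 2.23°/1.66° for synthetic faces, indicating high fidelity in shape and pose reconstruction.
The study demonstrates for the first time that a deep network can learn to predict a compressed, structured 3D representation (AAMs) directly from 2D images.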
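To make the joint-classification finding more concrete, the sketch below shows one common way to handle non-mutually-exclusive labels and to summarize an accuracy gap as a mean and range. It rests on assumptions: the summary does not say how the joint network's outputs are decoded, so the independent sigmoid outputs and the 0.5 threshold are illustrative choices, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def predict_labels(logits, threshold=0.5):
    """Decide each label independently, so several can be active at once.

    Assumption (not from the source): one sigmoid output per label,
    thresholded at 0.5. Input shape: (num_samples, num_labels).
    """
    return sigmoid(logits) >= threshold

def per_label_accuracy(pred, target):
    """Fraction of samples on which each label is predicted correctly."""
    return (np.asarray(pred) == np.asarray(target)).mean(axis=0)

def accuracy_gap(single_task_acc, joint_acc):
    """Mean, minimum, and maximum of (single-task minus joint) accuracy."""
    gap = np.asarray(single_task_acc, dtype=float) - np.asarray(joint_acc, dtype=float)
    return float(gap.mean()), float(gap.min()), float(gap.max())

if __name__ == "__main__":
    # Toy logits for two samples and three labels; not data from the thesis.
    logits = np.array([[2.0, -1.0, 0.5], [-3.0, 4.0, -0.2]])
    target = np.array([[True, False, False], [False, True, False]])
    pred = predict_labels(logits)
    print(per_label_accuracy(pred, target))
```

With 37 labels, `logits` would have 37 columns. Passing per-attribute accuracies of the single-task and joint networks to `accuracy_gap` yields the kind of mean-and-range summary reported above.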


This review was generated by AI and checked by a human editor.