QUICK REVIEW

[Paper Review] Convolutional Neural Networks for Facial Expression Recognition

Shima Alizadeh, Azar Fazel | arXiv (Cornell University) | Apr 22, 2017

Face and Expression Recognition | References: 10 | Cited by: 79

One-Sentence Summary

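The paper trains shallow and deep CNNs on 48x48 grayscale face images to classify them into seven emotions, compares raw-pixel CNNs with a hybrid CNN+HOG approach, and analyzes performance, visualizations, and methods for mitigating overfitting.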

ABSTRACT

We have developed convolutional neural networks (CNN) for a facial expression recognition task. The goal is to classify each facial image into one of the seven facial emotion categories considered in this study. We trained CNN models with different depth using gray-scale images. We developed our models in Torch and exploited Graphics Processing Unit (GPU) computation in order to expedite the training process. In addition to the networks performing based on raw pixel data, we employed a hybrid feature strategy by which we trained a novel CNN model with the combination of raw pixel data and Histogram of Oriented Gradients (HOG) features. To reduce the overfitting of the models, we utilized different techniques including dropout and batch normalization in addition to L2 regularization. We applied cross validation to determine the optimal hyper-parameters and evaluated the performance of the developed models by looking at their training histories. We also present the visualization of different layers of a network to show what features of a face can be learned by CNN models.
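The hybrid feature strategy described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Torch code: it computes a simplified HOG descriptor in plain Python (unsigned gradient orientations accumulated per cell, with the block normalization of full HOG omitted) and concatenates it with a CNN feature vector. The cell size of 8, the 9 orientation bins, and the names `hog_features` and `hybrid_features` are hypothetical; `cnn_features` stands in for the flattened output of the convolutional layers.

```python
import math

def hog_features(img, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation.

    img is a 2-D list of grayscale values. Cell size and bin count are
    assumptions for illustration, not the paper's settings; the block
    normalization used in full HOG is omitted.
    """
    h, w = len(img), len(img[0])
    n_cy, n_cx = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(n_cx)] for _ in range(n_cy)]
    for y in range(n_cy * cell):
        for x in range(n_cx * cell):
            # Central differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned, [0, 180)
            b = int(ang // (180.0 / bins)) % bins
            hist[y // cell][x // cell][b] += mag  # magnitude-weighted vote
    # Flatten cell by cell: [cell0 bins..., cell1 bins..., ...]
    return [v for row in hist for cell_hist in row for v in cell_hist]

def hybrid_features(cnn_features, img):
    """Concatenate CNN features with HOG features before the FC layers."""
    return list(cnn_features) + hog_features(img)
```

For a 48x48 input this yields 6 x 6 cells x 9 bins = 324 HOG values, appended to whatever the convolutional stack produces before the fully connected layers.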

Motivation and Objectives

Develop CNN architectures of varying depth that automatically classify facial expressions into seven emotion categories.
Evaluate the impact of depth, regularization, and data augmentation on accuracy and generalization.
Investigate whether combining HOG features with CNN features improves performance.
Visualize learned features and activation maps to interpret what CNNs capture about facial expressions.

Proposed Method

Implement shallow and deep CNN architectures with configurable Conv/FC layers, batch normalization, dropout, and max-pooling in Torch.
Train the models on ~29k 48x48 grayscale training images using a three-way (training/validation/test) split; normalize the inputs and apply horizontal flipping for data augmentation.
Experiment with a hybrid feature approach by concatenating HOG features with CNN features before the FC layers.
Evaluate via validation and test accuracy; analyze confusion matrices and per-expression accuracy.
Visualize activation maps and first-layer weights; apply DeepDream to the best model to discover learned patterns.
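The preprocessing and augmentation step can be sketched as below. The review does not say which normalization the authors used, so per-image zero-mean, unit-variance scaling is an assumption here, as are the helper names and the flip probability `p=0.5`. Images are plain nested lists of grayscale values to keep the example dependency-free.

```python
import random

def normalize(img):
    """Scale one grayscale image to zero mean and unit variance (assumed scheme)."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = var ** 0.5 or 1.0  # avoid dividing by zero on a constant image
    return [[(p - mean) / std for p in row] for row in img]

def hflip(img):
    """Mirror the image left to right; the expression label is unchanged."""
    return [row[::-1] for row in img]

def augment(img, rng, p=0.5):
    """Flip with probability p, then normalize."""
    if rng.random() < p:
        img = hflip(img)
    return normalize(img)

# Usage on a tiny 2x2 "image"; real inputs would be 48x48.
rng = random.Random(0)
sample = [[1, 2], [3, 4]]
print(hflip(sample))  # [[2, 1], [4, 3]]
print(augment(sample, rng))
```

Flipping is a label-preserving augmentation for this task because a mirrored face still shows the same expression.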

Experimental Results

Research Questions

RQ1: What is the impact of increasing CNN depth on facial expression recognition accuracy on the Kaggle seven-emotion dataset?
RQ2: Do deeper CNNs improve per-expression classification accuracy compared with shallow networks?
RQ3: Does integrating HOG features with CNN features improve recognition performance beyond using raw pixels alone?
RQ4: What can activation maps, weight visualizations, and DeepDream reveal about the learned facial features?

Key Findings

Expression	Shallow Model Accuracy	Deep Model Accuracy
Angry	41%	53%
Disgust	32%	70%
Fear	54%	46%
Happy	75%	80.5%
Sad	32%	63%
Surprise	67.5%	62.5%
Neutral	39.9%	51.5%
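The per-expression numbers can be summarized with simple arithmetic. The sketch below uses only the values in the table; the variable names are illustrative. The unweighted (macro) mean of per-class accuracy is a different quantity from the overall validation and test accuracies quoted in the findings, which weight each class by how many images it has, so the two need not agree.

```python
# Per-expression accuracies (%) copied from the table above.
shallow = {"Angry": 41, "Disgust": 32, "Fear": 54, "Happy": 75,
           "Sad": 32, "Surprise": 67.5, "Neutral": 39.9}
deep = {"Angry": 53, "Disgust": 70, "Fear": 46, "Happy": 80.5,
        "Sad": 63, "Surprise": 62.5, "Neutral": 51.5}

# Change in percentage points from the shallow to the deep model.
delta = {k: round(deep[k] - shallow[k], 1) for k in shallow}

# Unweighted (macro) mean of per-class accuracy for each model.
macro_shallow = sum(shallow.values()) / len(shallow)
macro_deep = sum(deep.values()) / len(deep)

improved = sorted(k for k, d in delta.items() if d > 0)
declined = sorted(k for k, d in delta.items() if d < 0)
print(delta)
print(f"macro mean: shallow {macro_shallow:.1f}%, deep {macro_deep:.1f}%")
print("improved:", improved, "declined:", declined)
```

With these values, five expressions improve with depth (largest gain: Disgust, +38 points) and two decline (Fear, -8; Surprise, -5), and the macro means are about 48.8% (shallow) and 60.9% (deep).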

The deep CNN substantially outperforms the shallow network (validation accuracy 65% vs. 55%; test accuracy 64% vs. 54%).
The deep network overfits less and benefits from regularization techniques (dropout, batch normalization, and L2 regularization).
Per-expression accuracy improves with depth for five of the seven classes, with the largest gains for Disgust (32% to 70%) and Sad (32% to 63%) and smaller gains for Angry, Neutral, and Happy; Fear (54% to 46%) and Surprise (67.5% to 62.5%) decline.
Hybrid CNN+HOG features do not outperform raw-pixel CNNs: for both the shallow and deep models, their accuracy is very close to that of the plain CNN.
Confusion matrices show common misclassifications in both models (e.g., Angry confused with Fear or Sad); the deep CNN makes more correct predictions for most labels.
Activation maps become sparser and more localized as training progresses; first-layer filters appear smooth; DeepDream applied to the best model reveals expression-specific patterns.
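Per-expression accuracy and the confusion matrices mentioned above are related by a simple computation: with rows indexing the true label, each class's accuracy is its diagonal entry divided by its row sum. The sketch below assumes the label order of the table and uses hypothetical toy predictions; the function names are illustrative, and this is not the authors' evaluation code.

```python
import math

LABELS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def confusion_matrix(y_true, y_pred, n=len(LABELS)):
    """Count predictions; rows index the true label, columns the predicted label."""
    cm = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_accuracy(cm):
    """Diagonal / row sum for each true class; NaN if the class never occurs."""
    return [row[i] / sum(row) if sum(row) else math.nan
            for i, row in enumerate(cm)]

def overall_accuracy(cm):
    """Correct predictions over all predictions (weights classes by frequency)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)

# Hypothetical toy labels (indices into LABELS), not data from the paper.
y_true = [0, 0, 0, 2, 3, 3]
y_pred = [0, 2, 4, 2, 3, 3]  # one Angry predicted as Fear, one as Sad
cm = confusion_matrix(y_true, y_pred)
for name, acc in zip(LABELS, per_class_accuracy(cm)):
    print(f"{name:8s} {acc:.2f}")
print(f"overall  {overall_accuracy(cm):.2f}")
```

Off-diagonal entries in a row (here `cm[0][2]` and `cm[0][4]`) are exactly the kind of Angry-to-Fear and Angry-to-Sad confusions the findings describe.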

This review was generated by AI and reviewed by human editors.