QUICK REVIEW

[Paper Review] Convolutional Neural Networks for Facial Expression Recognition

Shima Alizadeh, Azar Fazel | arXiv (Cornell University) | April 22, 2017

Topic: Face and Expression Recognition | References: 10 | Citations: 79

One-Line Summary

This paper trains shallow and deep CNNs to classify 48x48 grayscale face images into seven emotions. It compares raw-pixel CNNs with a hybrid CNN+HOG approach and analyzes performance, feature visualizations, and overfitting mitigation.
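The input format above can be made concrete with a minimal loader sketch. It assumes the data follows the public Kaggle FER2013 CSV layout: columns `emotion`, `pixels`, `Usage`, with 2,304 space-separated grayscale values per row. It also assumes that release's label order. The helpers `parse_row` and `load_fer_csv` are hypothetical, since the review does not describe the authors' own loader.

```python
import csv
import io

IMG_SIZE = 48
# Assumed label order of the Kaggle FER2013 release (index = "emotion" column value).
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def parse_row(row):
    """Turn one CSV row into (emotion name, 48x48 pixel grid, split name)."""
    values = [int(v) for v in row["pixels"].split()]
    if len(values) != IMG_SIZE * IMG_SIZE:
        raise ValueError(f"expected {IMG_SIZE * IMG_SIZE} pixels, got {len(values)}")
    # Row-major reshape: value k goes to row k // 48, column k % 48.
    image = [values[r * IMG_SIZE:(r + 1) * IMG_SIZE] for r in range(IMG_SIZE)]
    return EMOTIONS[int(row["emotion"])], image, row["Usage"]


def load_fer_csv(text):
    """Group parsed samples by split (e.g. Training / PublicTest / PrivateTest)."""
    splits = {}
    for row in csv.DictReader(io.StringIO(text)):
        label, image, usage = parse_row(row)
        splits.setdefault(usage, []).append((label, image))
    return splits


if __name__ == "__main__":
    # Synthetic one-row CSV with a uniform gray image, for illustration only.
    demo = "emotion,pixels,Usage\n3," + " ".join(["128"] * 2304) + ",Training\n"
    label, image = load_fer_csv(demo)["Training"][0]
    print(label, len(image), len(image[0]))  # Happy 48 48
```

The `Usage` column is what a three-way train/validation/test split would key on under this assumed layout.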

ABSTRACT

We have developed convolutional neural networks (CNN) for a facial expression recognition task. The goal is to classify each facial image into one of the seven facial emotion categories considered in this study. We trained CNN models with different depth using gray-scale images. We developed our models in Torch and exploited Graphics Processing Unit (GPU) computation in order to expedite the training process. In addition to the networks performing based on raw pixel data, we employed a hybrid feature strategy by which we trained a novel CNN model with the combination of raw pixel data and Histogram of Oriented Gradients (HOG) features. To reduce the overfitting of the models, we utilized different techniques including dropout and batch normalization in addition to L2 regularization. We applied cross validation to determine the optimal hyper-parameters and evaluated the performance of the developed models by looking at their training histories. We also present the visualization of different layers of a network to show what features of a face can be learned by CNN models.
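The three overfitting countermeasures named in the abstract (L2 regularization, dropout, and batch normalization) can be sketched on plain Python lists. This is a minimal illustration, not the authors' Torch code. Several details are assumptions chosen for clarity: the `lam / 2` convention for the L2 term, the inverted-dropout formulation, and a training-mode-only batch norm with no running statistics for inference.

```python
import math
import random


def l2_penalty(weights, lam):
    """L2 term added to the data loss: (lam / 2) * sum of squared weights."""
    return 0.5 * lam * sum(w * w for w in weights)


def inverted_dropout(activations, p_drop, rng, training=True):
    """Zero each unit with probability p_drop and scale survivors by 1 / (1 - p_drop),
    so the layer can be used unchanged at test time."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch norm for one feature: standardize over the mini-batch,
    then apply the learnable scale (gamma) and shift (beta)."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


if __name__ == "__main__":
    rng = random.Random(0)
    data_loss = 1.2  # hypothetical cross-entropy value
    total_loss = data_loss + l2_penalty([0.3, -0.5, 0.8], lam=1e-3)
    hidden = batch_norm([0.5, 1.5, 2.5, 3.5])
    print(total_loss, inverted_dropout(hidden, p_drop=0.5, rng=rng))
```

Inverted dropout is used here because it keeps inference a plain forward pass. The review does not state which dropout variant the authors' framework applied.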

Motivation and Objectives

Develop CNN architectures of varying depth for automatic facial expression recognition into seven emotion categories.
Evaluate the impact of depth, regularization, and data augmentation on accuracy and generalization.
Investigate whether combining HOG features with CNN outputs improves performance.
Visualize learned features and activation maps to interpret what CNNs capture about facial expressions.

Proposed Method

Implement shallow and deep CNN architectures with configurable Conv/FC layers, batch normalization, dropout, and max-pooling in Torch.
Train models on ~29k training images (48x48 grayscale) with a three-way train/validation/test split; normalize the inputs and use horizontal flipping for data augmentation.
Experiment with a hybrid feature approach by concatenating HOG features with CNN features before the FC layers.
Evaluate via validation and test accuracy; analyze confusion matrices and per-expression accuracy.
Visualize activation maps and first-layer weights; apply DeepDream to the best model to discover learned patterns.
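The preprocessing step above (normalization plus horizontal flipping) can be sketched as follows. The review does not say which normalization the authors used. Per-image standardization here is an assumption; subtracting a dataset-wide mean is an equally common choice. The 0.5 flip probability is also hypothetical. A horizontal flip suits faces because a mirrored expression keeps its label.

```python
import random


def horizontal_flip(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def normalize(image):
    """Per-image standardization to zero mean and unit variance (an assumed scheme)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    std = (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5
    if std == 0.0:
        std = 1.0  # constant image: avoid division by zero
    return [[(p - mean) / std for p in row] for row in image]


def augment(image, rng, flip_prob=0.5):
    """Training-time transform: random horizontal flip, then normalization."""
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    return normalize(image)


if __name__ == "__main__":
    img = [[0, 50, 100], [150, 200, 250]]  # toy 2x3 stand-in for a 48x48 face
    print(horizontal_flip(img))  # [[100, 50, 0], [250, 200, 150]]
    print(augment(img, random.Random(42)))
```

At validation and test time only `normalize` would be applied, so evaluation stays deterministic.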
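The hybrid approach concatenates HOG descriptors with CNN features before the fully connected layers. A minimal sketch of that step follows. It assumes 8x8-pixel cells and 9 orientation bins (the Dalal-Triggs defaults, not settings reported in the review) and a hypothetical 256-dimensional CNN feature vector. The HOG part is deliberately simplified, so it illustrates where the concatenation happens rather than reimplementing the paper's extractor. In practice a library routine such as scikit-image's `skimage.feature.hog` would normally be used.

```python
import math


def hog_cells(image, cell=8, bins=9):
    """Simplified HOG: one histogram of unsigned gradient orientations (0-180 degrees)
    per cell x cell block of pixels, with votes weighted by gradient magnitude.
    Full HOG also interpolates votes and normalizes overlapping blocks; here a single
    global L2 normalization stands in for block normalization."""
    h, w = len(image), len(image[0])
    feats = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # Central differences, clamped at the image border.
                    gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
                    gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    hist[int(angle // (180.0 / bins)) % bins] += math.hypot(gx, gy)
            feats.extend(hist)
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]


def hybrid_features(cnn_features, image):
    """Concatenate learned CNN features with hand-crafted HOG features; the joint
    vector would then feed the fully connected layers."""
    return list(cnn_features) + hog_cells(image)


if __name__ == "__main__":
    ramp = [[float(x) for x in range(48)] for _ in range(48)]  # horizontal gradient
    cnn_out = [0.1] * 256  # hypothetical flattened conv output
    joint = hybrid_features(cnn_out, ramp)
    print(len(hog_cells(ramp)), len(joint))  # 324 580
```

For a 48x48 face these settings give 6 x 6 cells x 9 bins = 324 HOG values. This is small next to typical conv feature maps, which may partly explain why the review finds the hybrid adds little over raw pixels.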

Experimental Results

Research Questions

RQ1: What is the impact of increasing CNN depth on facial expression recognition accuracy on the Kaggle 7-emotion dataset?
RQ2: Do deeper CNNs improve per-expression classification accuracy compared with shallow networks?
RQ3: Does integrating HOG features with CNN features improve recognition performance beyond using raw pixels alone?
RQ4: What can activation maps, weight visualizations, and DeepDream reveal about the facial features a CNN learns?

Key Findings

Deep CNNs significantly improve accuracy over shallow networks (validation: 65% vs 55%; test: 64% vs 54%).
Deeper networks reduce overfitting and benefit from regularization techniques (dropout, batch norm, L2).
Per-expression accuracy generally improves with depth, most notably for Happy (75% shallow vs. 80.5% deep); Neutral and Angry also improve, while some expressions such as Surprise and Fear may not benefit from added depth.
Hybrid CNN+HOG features do not outperform raw-pixel CNNs (accuracy very close to vanilla CNNs for both shallow and deep models).
Confusion matrices show common misclassifications in both models (e.g., Angry confused with Fear or Sad); the deep CNN produces more correct predictions for most labels.
Activation maps become sparser and more localized over training; first-layer filters appear smooth; DeepDream applied to the best model reveals expression-specific patterns.
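The confusion-matrix and per-expression figures above can be computed along these lines. This is a minimal sketch that assumes "per-expression accuracy" means the diagonal entry divided by its row total, i.e. per-class recall. The review does not define it explicitly. The label order and the toy predictions are illustrative, not the paper's data.

```python
# Assumed label order (FER2013 convention); indices in y_true / y_pred refer to it.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def confusion_matrix(y_true, y_pred, n_classes=len(EMOTIONS)):
    """cm[i][j] counts samples of true class i that were predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_accuracy(cm):
    """Diagonal entry over row total, i.e. the share of each true class predicted
    correctly (per-class recall); NaN for classes absent from y_true."""
    return [row[i] / sum(row) if sum(row) else float("nan") for i, row in enumerate(cm)]


def overall_accuracy(cm):
    """Trace of the matrix over the total sample count."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else float("nan")


def top_confusions(cm, k=3):
    """Largest off-diagonal cells as (true name, predicted name, count)."""
    n = len(cm)
    pairs = [(cm[i][j], i, j) for i in range(n) for j in range(n) if i != j and cm[i][j]]
    pairs.sort(reverse=True)
    return [(EMOTIONS[i], EMOTIONS[j], c) for c, i, j in pairs[:k]]


if __name__ == "__main__":
    # Toy labels (not from the paper): indices follow EMOTIONS.
    y_true = [0, 0, 0, 0, 3, 3, 4, 4]
    y_pred = [0, 2, 4, 4, 3, 3, 4, 0]
    cm = confusion_matrix(y_true, y_pred)
    print(per_class_accuracy(cm)[0], overall_accuracy(cm))  # 0.25 0.5
    print(top_confusions(cm))  # [('Angry', 'Sad', 2), ('Sad', 'Angry', 1), ('Angry', 'Fear', 1)]
```

Reading the largest off-diagonal cells is how patterns such as "Angry mistaken for Fear or Sad" surface from a confusion matrix.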

This review was generated by AI and checked by a human editor.