QUICK REVIEW

[Paper Review] Real-time Convolutional Neural Networks for Emotion and Gender Classification

Octavio Arriaga, Matías Valdenegro-Toro|arXiv (Cornell University)|Oct 20, 2017

Face recognition and analysis | Cited by 154

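One-Sentence Summary

This paper proposes CNN architectures for a real-time system that performs face detection, gender classification and emotion classification simultaneously, achieving high accuracy with significantly fewer parameters and running in real time on a robot platform.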

ABSTRACT

In this paper we propose and implement a general convolutional neural network (CNN) building framework for designing real-time CNNs. We validate our models by creating a real-time vision system which accomplishes the tasks of face detection, gender classification and emotion classification simultaneously in one blended step using our proposed CNN architecture. After presenting the details of the training procedure setup we proceed to evaluate on standard benchmark sets. We report accuracies of 96% in the IMDB gender dataset and 66% in the FER-2013 emotion dataset. Along with this we also introduced the very recent real-time enabled guided back-propagation visualization technique. Guided back-propagation uncovers the dynamics of the weight changes and evaluates the learned features. We argue that the careful implementation of modern CNN architectures, the use of the current regularization methods and the visualization of previously hidden features are necessary in order to reduce the gap between slow performances and real-time architectures. Our system has been validated by its deployment on a Care-O-bot 3 robot used during RoboCup@Home competitions. All our code, demos and pre-trained architectures have been released under an open-source license in our public repository.
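
The abstract describes a single pass that detects faces and then classifies each one for gender and emotion. The sketch below shows only that control flow, under loud assumptions: `detect_faces`, `gender_model` and `emotion_model` are hypothetical callables supplied by the caller (the paper's actual detector and trained CNNs are not reproduced), images are plain nested lists, and the gender label order is assumed. The seven emotion labels follow the usual FER-2013 class order.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumed representations: a grayscale image is a 2-D list of floats and a
# face box is (x, y, width, height) in pixel coordinates.
Image = List[List[float]]
Box = Tuple[int, int, int, int]

# FER-2013 emotion classes in their usual index order.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
# Assumed (hypothetical) label order for the binary gender classifier.
GENDERS = ["woman", "man"]


@dataclass
class FaceResult:
    box: Box
    gender: str
    emotion: str


def crop(image: Image, box: Box) -> Image:
    """Cut the face region out of the full frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest class score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def run_pipeline(
    image: Image,
    detect_faces: Callable[[Image], List[Box]],        # hypothetical detector
    gender_model: Callable[[Image], Sequence[float]],   # hypothetical classifier
    emotion_model: Callable[[Image], Sequence[float]],  # hypothetical classifier
) -> List[FaceResult]:
    """Detect faces once, then run both classifiers on every face crop."""
    results = []
    for box in detect_faces(image):
        face = crop(image, box)
        gender = GENDERS[argmax(gender_model(face))]
        emotion = EMOTIONS[argmax(emotion_model(face))]
        results.append(FaceResult(box, gender, emotion))
    return results


# Stub models standing in for the trained networks.
frame = [[0.0] * 8 for _ in range(8)]
found = run_pipeline(
    frame,
    detect_faces=lambda im: [(1, 1, 4, 4)],
    gender_model=lambda face: [0.2, 0.8],
    emotion_model=lambda face: [0.0, 0.0, 0.0, 0.9, 0.05, 0.05, 0.0],
)
print(found)  # [FaceResult(box=(1, 1, 4, 4), gender='man', emotion='happy')]
```

Passing the models in as callables keeps the sketch independent of any particular framework; in a real deployment each face crop would also be resized and normalized to the input shape the trained networks expect.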

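Motivation and Objectives

Develop a general real-time CNN framework suitable for robots and embedded systems.
Build an architecture that performs face detection, gender classification and emotion classification in a single pipeline.
Reduce model size and computation while maintaining accuracy.
Provide real-time visualization to interpret the learned features and model behavior.
Demonstrate deployment on a mobile robot platform and release open-source resources.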

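Proposed Method

Two CNN designs are proposed: a sequential fully convolutional network that removes the fully connected layers by using global average pooling, and a mini-Xception that uses depthwise separable convolutions and residual modules.
Training uses the Adam optimizer.
Fully connected layers are eliminated to reduce the parameter count, and depthwise separable convolutions are applied to shrink the model further.
Global average pooling followed by a softmax classifier in the final layer produces the multi-class output.
Face detection, gender classification and emotion classification are integrated into a single real-time pipeline.
Guided back-propagation visualization is introduced to interpret the learned features.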
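
To see why depthwise separable convolutions shrink the model, compare parameter counts. A standard k*k convolution from C_in to C_out channels has k*k*C_in*C_out weights; a depthwise separable convolution replaces it with one k*k depthwise filter per input channel (k*k*C_in weights) followed by a 1*1 pointwise convolution (C_in*C_out weights). Without biases, the ratio of the two is exactly 1/C_out + 1/k^2. The layer size below is illustrative and not taken from the paper; the function names are hypothetical.

```python
def standard_conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Weights of a standard k x k convolution mapping c_in -> c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def separable_conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise + (c_out if bias else 0)


# Illustrative layer size (not from the paper): 3 x 3 kernel, 64 -> 128 channels.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)   # 3*3*64*128 = 73,728
sep = separable_conv_params(k, c_in, c_out)  # 3*3*64 + 64*128 = 8,768
ratio = sep / std
print(std, sep, round(ratio, 4))             # 73728 8768 0.1189
# Without biases the ratio equals 1/c_out + 1/k**2 exactly.
print(round(1 / c_out + 1 / k**2, 4))        # 0.1189
```

At this size the separable layer needs roughly 12% of the weights of the standard one, which illustrates how a design built from such layers can reach a much smaller parameter count; the paper's exact layer configuration is not reproduced here.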
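
The global-average-pooling head described above can be sketched as follows, assuming, as a design without dense layers implies, that the last convolution outputs one feature map per class. Global average pooling reduces each map to a single score, and softmax turns those scores into class probabilities. Feature maps are plain nested lists here, and the values are made up for illustration.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # one H x W channel of the last conv layer


def global_average_pool(channels: List[FeatureMap]) -> List[float]:
    """Average every channel over its spatial positions: one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in channels]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up 2 x 2 maps for 7 emotion classes; channel c is filled with the value c.
maps = [[[float(c), float(c)], [float(c), float(c)]] for c in range(7)]
scores = global_average_pool(maps)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
probs = softmax(scores)
print(scores)
print(max(range(len(probs)), key=probs.__getitem__))  # 6
```

Because the pooled vector feeds softmax directly, this head adds no trainable weights of its own, which is how removing the fully connected layers saves parameters.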

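Experimental Results

Research Questions

RQ1: Can real-time CNN architectures achieve competitive gender and emotion classification accuracy with significantly fewer parameters?
RQ2: Is it feasible to run face detection, gender classification and emotion classification in a single real-time pipeline on constrained hardware?
RQ3: Do depthwise separable convolutions and residual connections reduce the parameter count while preserving accuracy on these tasks?
RQ4: How interpretable are the features learned for the emotion and gender tasks when visualized with guided back-propagation?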

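Key Findings

A fully convolutional model reaches 96% accuracy on the IMDB gender dataset with roughly 600,000 parameters.
The same sequential fully convolutional network reaches 66% accuracy on the FER-2013 emotion dataset.
The mini-Xception architecture achieves 95% gender accuracy and 66% emotion accuracy with about 60,000 parameters.
The complete pipeline (face detection, gender and emotion classification) runs in 0.22 ms on an i5-4210M CPU, a speedup over the original architecture.
The model weights can be stored in about 855 KB.
Guided back-propagation visualizations reveal interpretable features such as frown lines, teeth visibility and eyebrow shape, and expose biases related to glasses or Western facial features.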
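
Guided back-propagation differs from ordinary back-propagation only at the ReLU nonlinearities: on the backward pass, a gradient is passed through a ReLU only where the forward input was positive and the incoming gradient is also positive. Applying this rule at every ReLU while back-propagating a class score down to the input image yields the kind of saliency maps described above. The sketch below shows just that per-element rule on plain lists; it is an illustration of the technique, not the authors' implementation.

```python
from typing import List


def relu_backward(x: List[float], grad_out: List[float]) -> List[float]:
    """Ordinary backprop through ReLU: keep the gradient where the input was positive."""
    return [g if v > 0 else 0.0 for v, g in zip(x, grad_out)]


def guided_relu_backward(x: List[float], grad_out: List[float]) -> List[float]:
    """Guided backprop: additionally zero out negative incoming gradients."""
    return [g if (v > 0 and g > 0) else 0.0 for v, g in zip(x, grad_out)]


x = [-1.0, 2.0, 3.0, 0.5]          # forward inputs to the ReLU
grad_out = [0.7, -0.4, 0.9, 0.2]   # gradient arriving from the layer above
print(relu_backward(x, grad_out))         # [0.0, -0.4, 0.9, 0.2]
print(guided_relu_backward(x, grad_out))  # [0.0, 0.0, 0.9, 0.2]
```

Dropping negative gradients keeps only signals that increase the class score, which is why guided maps tend to highlight class-relevant structure rather than a mix of supporting and opposing evidence.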

This review was generated by AI and reviewed by human editors.