QUICK REVIEW

[Paper Review] Suppressing Uncertainties for Large-Scale Facial Expression Recognition

Kai Wang, Xiaojiang Peng|arXiv (Cornell University)|Feb 24, 2020

Emotion and Mood Recognition|References: 41|Cited by: 53

One-Sentence Summary

Introduces Self-Cure Network (SCN) to suppress uncertainties in large-scale FER using self-attention weighting, rank regularization, and relabeling; achieves state-of-the-art on RAF-DB, AffectNet, and FERPlus.

ABSTRACT

Annotating a qualitative large-scale facial expression dataset is extremely difficult due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectiveness of annotators. These uncertainties lead to a key challenge of large-scale Facial Expression Recognition (FER) in deep learning era. To address this problem, this paper proposes a simple yet efficient Self-Cure Network (SCN) which suppresses the uncertainties efficiently and prevents deep networks from over-fitting uncertain facial images. Specifically, SCN suppresses the uncertainty from two different aspects: 1) a self-attention mechanism over mini-batch to weight each training sample with a ranking regularization, and 2) a careful relabeling mechanism to modify the labels of these samples in the lowest-ranked group. Experiments on synthetic FER datasets and our collected WebEmotion dataset validate the effectiveness of our method. Results on public benchmarks demonstrate that our SCN outperforms current state-of-the-art methods with 88.14% on RAF-DB, 60.23% on AffectNet, and 89.35% on FERPlus. The code will be available at https://github.com/kaiwang960112/Self-Cure-Network.

Research Motivation and Objectives

Address the uncertainties in large-scale FER that arise from ambiguous expressions, low-quality images, and annotator subjectivity.
Propose a simple yet effective framework (SCN) to suppress uncertainties during training of deep FER models.
Design three modules—self-attention weighting, rank regularization, and relabeling—to reduce the impact of uncertain samples.
Demonstrate SCN’s effectiveness on synthetic noisy data, a real-world uncertain WebEmotion dataset, and public FER benchmarks.
Provide ablation studies to quantify contributions of each module and loss components.

Proposed Method

Extract facial features with a backbone CNN and apply a self-attention importance weighting module to assign an importance weight to each sample.
Compute a logit-weighted cross-entropy loss (WCE-Loss) using the sample weights to emphasize reliable samples.
Regularize the learned weights with a Rank Regularization loss (RR-Loss) by ranking weights, splitting into high/low groups, and enforcing a margin between their means.
Optionally relabel uncertain samples in the low-importance group: a sample is reassigned to the top predicted class when the maximum predicted probability exceeds the probability of its given label by more than a margin threshold (delta2).
Train end-to-end with a combination of RR-Loss and WCE-Loss, using a two-stage strategy in which relabeling is switched on only after epoch 10.
Implementation specifics: ResNet-18 backbone, MTCNN for face detection, batch size 1024, beta=0.7 as the high-importance group ratio, delta1=0.15, delta2=0.2.
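
The steps above can be sketched in plain Python. This is a minimal illustration based only on the descriptions in this list, not the authors' implementation. The function names (wce_loss, rr_loss, relabel, scn_loss) are hypothetical. The importance weights are assumed to be already computed per sample and to lie in (0, 1), and the mixing weight gamma between the two losses is an assumption, since this summary only says the losses are combined.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_index(n, beta):
    """Size of the high-importance group, clamped so both groups are non-empty."""
    if n < 2:
        raise ValueError("need at least two samples to form two groups")
    return min(max(1, int(beta * n)), n - 1)


def wce_loss(logits, labels, alphas):
    """Logit-weighted cross-entropy: scale each sample's logits by its weight."""
    losses = []
    for z, y, a in zip(logits, labels, alphas):
        probs = softmax([a * v for v in z])
        losses.append(-math.log(probs[y]))
    return sum(losses) / len(losses)


def rr_loss(alphas, beta=0.7, delta1=0.15):
    """Hinge on the gap between mean weights of the high and low groups."""
    ranked = sorted(alphas, reverse=True)
    m = split_index(len(ranked), beta)
    mean_high = sum(ranked[:m]) / m
    mean_low = sum(ranked[m:]) / (len(ranked) - m)
    return max(0.0, delta1 - (mean_high - mean_low))


def relabel(probs, labels, alphas, beta=0.7, delta2=0.2):
    """Relabel low-group samples whose top probability beats the given
    label's probability by more than delta2."""
    n = len(alphas)
    m = split_index(n, beta)
    order = sorted(range(n), key=lambda i: alphas[i], reverse=True)
    new_labels = list(labels)
    for i in order[m:]:  # only the low-importance group is eligible
        p = probs[i]
        top = max(range(len(p)), key=p.__getitem__)
        if p[top] - p[labels[i]] > delta2:
            new_labels[i] = top
    return new_labels


def scn_loss(logits, labels, alphas, gamma=0.5, beta=0.7, delta1=0.15):
    """Combined objective; gamma is an assumed mixing weight."""
    return (gamma * rr_loss(alphas, beta, delta1)
            + (1.0 - gamma) * wce_loss(logits, labels, alphas))


# Toy batch: four samples, two classes, all annotated as class 0.
logits = [[2.0, 0.0], [1.5, 0.0], [0.5, 0.0], [0.0, 2.0]]
labels = [0, 0, 0, 0]
alphas = [0.9, 0.8, 0.7, 0.1]
probs = [softmax(z) for z in logits]
print(scn_loss(logits, labels, alphas))
print(relabel(probs, labels, alphas))
```

In this toy batch the last sample has the lowest weight and a confident prediction for class 1, so it is the only candidate the relabeling rule can change. Samples in the high-importance group keep their given labels regardless of the prediction.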

Experimental Results

Research Questions

RQ1: How can the effect of training with uncertain annotations be mitigated in large-scale FER?
RQ2: Can a lightweight, end-to-end module (SCN) improve robustness to noisy labels and low-quality data without extra inference cost?
RQ3: What is the relative contribution of self-attention weighting, rank regularization, and relabeling to FER performance under uncertainty?
RQ4: Does pretraining with SCN on noisy real-world data (WebEmotion) improve performance on clean FER benchmarks?
RQ5: How do SCN components perform under synthetic label noise and real-world uncertain annotations?

Main Findings

SCN consistently improves baselines under synthetic label noise on RAF-DB, FERPlus, and AffectNet, with larger gains at higher noise levels.
Self-attention weighting (WCE-Loss) provides the strongest performance boost among SCN components.
Rank Regularization (RR-Loss) and Relabeling provide additional gains on top of WCE-Loss in ablations.
Pretraining on WebEmotion with SCN further improves RAF-DB, AffectNet, and FERPlus after fine-tuning on target datasets.
SCN achieves state-of-the-art results: 88.14% on RAF-DB, 60.23% on AffectNet, and 89.35% on FERPlus (with IR50).
SCN-enabled pretraining on WebEmotion yields higher downstream performance than pretraining without SCN.

This review was generated by AI and reviewed by a human editor.