[Paper Summary] Suppressing Uncertainties for Large-Scale Facial Expression Recognition
Introduces Self-Cure Network (SCN) to suppress uncertainties in large-scale FER using self-attention weighting, rank regularization, and relabeling; achieves state-of-the-art on RAF-DB, AffectNet, and FERPlus.
Annotating a high-quality large-scale facial expression dataset is extremely difficult due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectiveness of annotators. These uncertainties pose a key challenge for large-scale Facial Expression Recognition (FER) in the deep learning era. To address this problem, this paper proposes a simple yet efficient Self-Cure Network (SCN) that suppresses these uncertainties and prevents deep networks from over-fitting uncertain facial images. Specifically, SCN suppresses the uncertainty from two aspects: 1) a self-attention mechanism over each mini-batch that weights every training sample, combined with a ranking regularization, and 2) a careful relabeling mechanism that modifies the labels of samples in the lowest-ranked group. Experiments on synthetic FER datasets and our collected WebEmotion dataset validate the effectiveness of the method. Results on public benchmarks show that SCN outperforms current state-of-the-art methods, with 88.14% on RAF-DB, 60.23% on AffectNet, and 89.35% on FERPlus. The code will be available at https://github.com/kaiwang960112/Self-Cure-Network.
Research Motivation and Objectives
- Identify and address the uncertainties in large-scale FER that arise from ambiguous expressions, low-quality images, and annotator subjectivity.
- Propose a simple yet effective framework (SCN) to suppress uncertainties during training of deep FER models.
- Design three modules (self-attention importance weighting, rank regularization, and relabeling) to reduce the impact of uncertain samples.
- Demonstrate SCN’s effectiveness on synthetic noisy data, a real-world uncertain WebEmotion dataset, and public FER benchmarks.
- Provide ablation studies that quantify the contribution of each module and loss component.
Proposed Method
- Extract facial features with a backbone CNN and apply a self-attention importance weighting module to assign an importance weight to each sample.
- Compute a logit-weighted cross-entropy loss (WCE-Loss) in which each sample's logits are scaled by its importance weight, so that reliable samples contribute more to training.
- Regularize the learned weights with a Rank Regularization loss (RR-Loss): rank the weights in descending order, split them into high- and low-importance groups, and enforce a margin (delta1) between the two group means.
- Optionally relabel uncertain samples in the low-importance group: a sample is reassigned to its predicted class when the maximum predicted probability exceeds the probability of the given label by more than a margin threshold (delta2).
- Train end-to-end with a combination of RR-Loss and WCE-Loss, using a two-stage schedule in which relabeling starts after epoch 10.
- Implementation details: ResNet-18 backbone, MTCNN for face detection, batch size 1024, high-importance group ratio beta = 0.7, and margins delta1 = 0.15 and delta2 = 0.2.
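The importance weighting and WCE-Loss steps can be sketched in pure Python as below. This is a minimal illustration, not the authors' code: it assumes the importance head is a single linear layer followed by a sigmoid, and the names `importance_weight` and `weighted_cross_entropy` are hypothetical. A real implementation would operate on batched tensors in a deep learning framework. Each sample's logits are multiplied by its weight alpha_i before the softmax, so a small alpha_i flattens that sample's predicted distribution and damps its influence on the classifier.

```python
import math
from typing import List, Sequence


def importance_weight(feature: Sequence[float], w: Sequence[float], b: float = 0.0) -> float:
    """Hypothetical attention head: a linear projection followed by a sigmoid, giving alpha in (0, 1)."""
    z = sum(f * wi for f, wi in zip(feature, w)) + b
    return 1.0 / (1.0 + math.exp(-z))


def weighted_cross_entropy(
    logits: List[List[float]], labels: List[int], weights: List[float]
) -> float:
    """Mean cross-entropy after scaling each sample's logits by its importance weight alpha_i."""
    total = 0.0
    for z, y, alpha in zip(logits, labels, weights):
        scaled = [alpha * v for v in z]
        m = max(scaled)  # log-sum-exp shift for numerical stability
        log_norm = m + math.log(sum(math.exp(v - m) for v in scaled))
        total += log_norm - scaled[y]  # equals -log softmax(scaled)[y]
    return total / len(logits)


if __name__ == "__main__":
    alpha = importance_weight([0.2, -0.1], [1.0, 2.0])
    print(alpha, weighted_cross_entropy([[5.0, 0.0]], [0], [alpha]))
```

With alpha_i = 1 for every sample, the loss reduces to ordinary cross-entropy.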
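The rank regularization step can be sketched as follows. This assumes the high-importance group holds the top `round(beta * N)` weights of the mini-batch; the rounding and the clamp that keeps both groups non-empty are assumptions, and `rank_regularization_loss` is a hypothetical name. The margin between group means is written here as a hinge, max(0, delta1 - (mean_high - mean_low)).

```python
from typing import Sequence


def rank_regularization_loss(
    weights: Sequence[float], beta: float = 0.7, delta1: float = 0.15
) -> float:
    """Hinge on the gap between the mean weights of the high- and low-importance groups."""
    ranked = sorted(weights, reverse=True)  # rank the mini-batch by importance weight
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two samples to form two groups")
    # High-group size M = beta * N; rounding and clamping to keep both groups non-empty are assumptions.
    m = min(max(round(beta * n), 1), n - 1)
    high, low = ranked[:m], ranked[m:]
    mean_high = sum(high) / len(high)
    mean_low = sum(low) / len(low)
    # Zero once the high group's mean exceeds the low group's mean by at least delta1.
    return max(0.0, delta1 - (mean_high - mean_low))


if __name__ == "__main__":
    print(rank_regularization_loss([0.5] * 10))
    print(rank_regularization_loss([0.9] * 7 + [0.1] * 3))
```

Because the term vanishes once the groups are separated by delta1, it only acts when the learned weights fail to distinguish confident from uncertain samples.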
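The relabeling rule and its schedule can be sketched as below. The names `relabel_sample` and `relabel_batch` are hypothetical, `probs` are assumed to be per-sample softmax outputs, and reading "after epoch 10" as `epoch > 10` is an assumption. Only samples in the low-importance group are eligible, and a label changes only when the predicted class's probability exceeds the given label's by more than delta2.

```python
from typing import List, Sequence


def relabel_sample(probs: Sequence[float], label: int, delta2: float = 0.2) -> int:
    """Switch to the predicted class only if it beats the given label's probability by more than delta2."""
    pred = max(range(len(probs)), key=lambda k: probs[k])
    return pred if probs[pred] - probs[label] > delta2 else label


def relabel_batch(
    weights: Sequence[float],
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    epoch: int,
    beta: float = 0.7,
    delta2: float = 0.2,
    start_epoch: int = 10,
) -> List[int]:
    """Relabel only the low-importance group, and only once training has passed start_epoch."""
    new_labels = list(labels)
    if epoch <= start_epoch:  # assumption: "after epoch 10" read as epoch > 10
        return new_labels
    n = len(weights)
    # Sample indices ordered from highest to lowest importance weight.
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)
    m = min(max(round(beta * n), 1), n - 1)  # high-group size (rounding is an assumption)
    for i in order[m:]:  # indices in the low-importance group
        new_labels[i] = relabel_sample(probs[i], labels[i], delta2)
    return new_labels


if __name__ == "__main__":
    print(relabel_batch([0.9, 0.1], [[0.1, 0.9], [0.8, 0.2]], [0, 1], epoch=11))
```

In this usage example the first sample keeps its label even though the model disagrees with it, because it sits in the high-importance group; only the low-weight second sample is reassigned.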
Experimental Results
Research Questions
- RQ1: How can the impact of uncertain annotations on training be mitigated in large-scale FER?
- RQ2: Can a lightweight, end-to-end module (SCN) improve robustness to noisy labels and low-quality images without adding inference cost?
- RQ3: What is the relative contribution of self-attention weighting, rank regularization, and relabeling to FER performance under uncertainty?
- RQ4: Does pretraining with SCN on noisy real-world data (WebEmotion) improve performance on clean FER benchmarks?
- RQ5: How do SCN components perform under synthetic label noise and under real-world uncertain annotations?
Key Findings
- SCN consistently improves baselines under synthetic label noise on RAF-DB, FERPlus, and AffectNet, with larger gains at higher noise levels.
- Self-attention weighting (WCE-Loss) provides the strongest performance boost among SCN components.
- Rank Regularization (RR-Loss) and Relabeling provide additional gains on top of WCE-Loss in ablations.
- Pretraining on WebEmotion with SCN, followed by fine-tuning on each target dataset, further improves accuracy on RAF-DB, AffectNet, and FERPlus, and yields higher downstream performance than pretraining on WebEmotion without SCN.
- SCN achieves state-of-the-art results: 88.14% on RAF-DB, 60.23% on AffectNet, and 89.35% on FERPlus (with an IR50 backbone).
This summary was generated by AI and reviewed by human editors.