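[Paper Explained] Masked Contrastive Graph Representation Learning for Age Estimation
MCGRL introduces a masked graph convolutional framework that fuses structured graph information with semantic CNN features for age estimation, and uses contrastive learning to improve discriminability and generalization. It outperforms state-of-the-art methods on the MORPH, FG-NET, and CACD datasets.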
Age estimation of face images is a crucial task with various practical applications in areas such as video surveillance and Internet access control. While deep learning-based age estimation frameworks, e.g., convolutional neural networks (CNNs), multi-layer perceptrons (MLPs), and transformers, have shown remarkable performance, they have limitations when modelling complex or irregular objects in an image that contains a large amount of redundant information. To address this issue, this paper utilizes the robustness of graph representation learning in dealing with redundant image information and proposes a novel Masked Contrastive Graph Representation Learning (MCGRL) method for age estimation. Specifically, our approach first leverages a CNN to extract semantic features of the image, which are then partitioned into patches that serve as nodes in the graph. Then, we use a masked graph convolutional network (GCN) to derive image-based node representations that capture rich structural information. Finally, we incorporate multiple losses to explore the complementary relationship between structural information and semantic features, which improves the feature representation capability of the GCN. Experimental results on real-world face image datasets demonstrate the superiority of our proposed method over other state-of-the-art age estimation approaches.
Research Motivation and Goals
- Motivate robust age estimation from face images with substantial redundant information.
- Model irregular image regions via a graph-based representation to capture structural relationships.
- Fuse semantic (CNN-based) features with graph-structured representations through contrastive learning.
- Reduce intra-class variation and enlarge inter-class differences to improve generalization.
Proposed Method
- Segment face images into patches as graph nodes and construct a K-NN graph.
- Use a masked graph convolutional network (GCN) to obtain structural embeddings with mask-based augmentation.
- Generate anchor embeddings from CNN+MLP features for efficiency, and create positive/negative samples via masking and row shuffling.
- Apply multiple loss functions (triplet-based with L_N, L_M, and upper-bound L_V) to align positives and separate negatives, enforcing controlled distances.
- Train with a three-term loss to fuse structural and semantic information and improve discrimination.
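The data flow in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all helper names (`knn_edges`, `mask_nodes`, `shuffle_rows`, `max_relative_aggregate`, `triplet_loss`) are hypothetical, masking is assumed to zero out whole node feature vectors, and the loss is a generic margin-based triplet loss rather than the exact L_N, L_M, or L_V terms, whose definitions are not given in this summary.

```python
import math
import random

def knn_edges(nodes, k):
    """For each node, return the indices of its k nearest other nodes (Euclidean)."""
    neighbors = []
    for i, xi in enumerate(nodes):
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(nodes) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def mask_nodes(nodes, p, rng):
    """Assumed masking scheme: zero a node's whole feature vector with probability p."""
    return [[0.0] * len(x) if rng.random() < p else list(x) for x in nodes]

def shuffle_rows(nodes, rng):
    """Permute feature rows so features no longer match their graph positions."""
    shuffled = [list(x) for x in nodes]
    rng.shuffle(shuffled)
    return shuffled

def max_relative_aggregate(nodes, neighbors):
    """Concatenate x_i with the elementwise max over neighbours of (x_j - x_i).

    The learned linear layer that normally follows is omitted; assumes k >= 1.
    """
    out = []
    for xi, nbrs in zip(nodes, neighbors):
        rel = [max(nodes[j][d] - xi[d] for j in nbrs) for d in range(len(xi))]
        out.append(list(xi) + rel)
    return out

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic margin triplet loss: pull the positive closer than the negative."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)

# Toy stand-in for CNN patch features (4 patches, 2 dimensions each).
rng = random.Random(0)
patches = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
graph = knn_edges(patches, k=2)
positive_view = mask_nodes(patches, p=0.6, rng=rng)   # masked view -> positives
negative_view = shuffle_rows(patches, rng=rng)        # shuffled view -> negatives
structural = max_relative_aggregate(positive_view, graph)
```

In a real pipeline the anchor would come from the CNN+MLP branch and the positive and negative embeddings from a trained GCN applied to the masked and shuffled inputs; here the aggregation step only shows where graph structure enters.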
Experimental Results
Research Questions
- RQ1: Can a masked graph-based representation capture robust structural information for age estimation beyond regular CNN/ViT approaches?
- RQ2: Does contrastive learning with masked GCN positives and shuffled negatives improve age estimation accuracy and generalization across datasets?
- RQ3: What is the impact of graph convolution variants and masking rate on age-estimation performance?
- RQ4: How well does MCGRL generalize in cross-dataset evaluations compared with state-of-the-art methods?
Key Findings
| Dataset | MAE (MCGRL) | CS (%) (MCGRL) | MAE (competing methods, range) | CS (%) (competing methods, range) | MAE (best overall) | CS (%) (best overall) |
|---|---|---|---|---|---|---|
| MORPH | 2.39 | 89.9 | 2.42–4.03 | 70.1–87.4 | 2.39 | 89.9 |
| FG-NET | 2.86 | 88.0 | 3.74–5.79 | 66.5–74.5 | 2.86 | 88.0 |
| CACD | 4.03 | 80.1 | 4.03–6.52 | 60.0–72.8 | 4.03 | 80.1 |
- MCGRL achieves the best mean absolute error (MAE) and cumulative score (CS) on the MORPH, FG-NET, and CACD datasets, e.g., an MAE of 2.39 and a CS of 89.9% on MORPH.
- Cross-dataset evaluations show MCGRL outperforms competing methods on FG-NET, MORPH, FACES, and SC-FACE variants, with notable gains in CS.
- Ablation studies confirm that combining the L_N, L_M, and L_V losses yields the best performance across datasets.
- Max-Relative GraphConv with the proposed losses delivers the best MAE across MORPH, FG-NET, and CACD.
- Mask rate analysis indicates optimal masking at p = 0.6 for best MAE on the evaluated datasets.
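For readers unfamiliar with the two metrics reported above, the following sketch shows how MAE and CS are typically computed for age estimation. The function names are hypothetical, and the default tolerance of 5 years for CS is an assumption based on common practice; this summary does not state which threshold the paper uses.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)

def cumulative_score(true_ages, predicted_ages, threshold=5):
    """Percentage of samples whose absolute error is at most `threshold` years.

    threshold=5 is an assumed default, not a value taken from the paper.
    """
    hits = sum(1 for t, p in zip(true_ages, predicted_ages) if abs(t - p) <= threshold)
    return 100.0 * hits / len(true_ages)

# Toy example: absolute errors are 2, 0, 7, and 1 years.
true_ages = [20, 30, 40, 50]
predicted_ages = [22, 30, 47, 49]
mae = mean_absolute_error(true_ages, predicted_ages)   # 2.5
cs = cumulative_score(true_ages, predicted_ages)       # 75.0
```

Lower MAE is better, while higher CS is better, which is why the two columns in the table move in opposite directions as methods improve.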
This explainer was generated by AI and reviewed by a human editor.