QUICK REVIEW

[Paper Review] Semi-Supervised Masked Autoencoders: Unlocking Vision Transformer Potential with Limited Data

Atik Faysal, Mohammad Rostami|arXiv (Cornell University)|Jan 27, 2026

Advanced Neural Network Applications | Cited by 0

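One-Sentence Summary

SSMAE combines masked autoencoding with dynamic, validation-driven pseudo-labeling to train Vision Transformers efficiently on limited labeled data, and it outperforms both supervised ViT and fine-tuned MAE in low-label settings on CIFAR-10/100.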

ABSTRACT

We address the challenge of training Vision Transformers (ViTs) when labeled data is scarce but unlabeled data is abundant. We propose Semi-Supervised Masked Autoencoder (SSMAE), a framework that jointly optimizes masked image reconstruction and classification using both unlabeled and labeled samples with dynamically selected pseudo-labels. SSMAE introduces a validation-driven gating mechanism that activates pseudo-labeling only after the model achieves reliable, high-confidence predictions that are consistent across both weakly and strongly augmented views of the same image, reducing confirmation bias. On CIFAR-10 and CIFAR-100, SSMAE consistently outperforms supervised ViT and fine-tuned MAE, with the largest gains in low-label regimes (+9.24% over ViT on CIFAR-10 with 10% labels). Our results demonstrate that when pseudo-labels are introduced is as important as how they are generated for data-efficient transformer training. Codes are available at https://github.com/atik666/ssmae.

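Motivation and Objectives

Motivate and address the challenge of training ViTs when labeled data is scarce but unlabeled data is abundant.
Propose a semi-supervised framework that integrates masked image reconstruction with supervised learning.
Introduce a validation-based gating mechanism that controls pseudo-label generation and mitigates confirmation bias.
Demonstrate that SSMAE trains ViTs in a data-efficient manner and achieves robust performance on CIFAR-10 and CIFAR-100.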

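Proposed Method

Use MAE-style masking with a ViT encoder-decoder to learn representations from all data.
Train with a dual objective: a masked reconstruction loss on all data and a supervised classification loss on labeled data.
Introduce a confidence-based pseudo-labeling scheme that uses a pseudo-label only when the prediction is high-confidence and consistent between the weakly and strongly augmented views.
Apply a dynamic gating mechanism that starts pseudo-labeling only after model reliability on the validation set reaches a preset threshold.
Optimize a total loss that combines the reconstruction and classification losses, with a controllable weight on the pseudo-label term.
Apply 75% masking during pretraining, enable pseudo-labeling after a warm-up period, and continuously monitor validation confidence.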
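
The pseudo-label selection, gating, and loss combination described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the thresholds (`conf_threshold`, `val_threshold`), the warm-up length, and the choice to check confidence on the weak view only are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def select_pseudo_labels(weak_logits, strong_logits, conf_threshold=0.95):
    """Return (sample_index, label) pairs that pass both checks.

    Assumption: confidence is measured on the weakly augmented view, and
    consistency means both views predict the same class.
    """
    selected = []
    for i, (w, s) in enumerate(zip(weak_logits, strong_logits)):
        p_weak = softmax(w)
        label = argmax(p_weak)
        confident = p_weak[label] >= conf_threshold
        consistent = label == argmax(softmax(s))
        if confident and consistent:
            selected.append((i, label))
    return selected


def gate_is_open(epoch, val_reliability, warmup_epochs=10, val_threshold=0.5):
    """Hypothetical gate: open after warm-up once validation reliability is high enough."""
    return epoch >= warmup_epochs and val_reliability >= val_threshold


def total_loss(recon_loss, sup_loss, pseudo_loss, gate_open, pseudo_weight=1.0):
    """Reconstruction + supervised loss, plus a weighted pseudo-label term when gated in."""
    loss = recon_loss + sup_loss
    if gate_open:
        loss += pseudo_weight * pseudo_loss
    return loss


# Three unlabeled samples: confident and consistent, unconfident, inconsistent.
weak = [[10.0, 0.0, 0.0], [1.0, 0.9, 0.8], [0.0, 10.0, 0.0]]
strong = [[8.0, 0.0, 0.0], [1.0, 0.9, 0.8], [10.0, 0.0, 0.0]]
picked = select_pseudo_labels(weak, strong)  # only sample 0 survives
```

Only the first sample passes: the second is not confident enough and the third disagrees across views. Keeping the gate separate from the selection rule reflects the summary's point that when pseudo-labels are introduced is a distinct decision from how they are generated.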

Figure 1 : Overview of the SSMAE framework. A shared encoder is trained on two tasks: masked image reconstruction for all data, and classification for labeled data. For unlabeled data, our dynamic gate generates high-confidence pseudo-labels, which are then included in supervised classification.
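
To make the masked-reconstruction branch in the figure concrete, the following sketch shows MAE-style random patch masking at a given ratio. It is an illustration only: the helper name `random_patch_mask` and the patch grid (a 32x32 CIFAR image split into 4x4 patches, giving 64 patches) are assumptions, since the summary does not state the patch size.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) by uniform random sampling."""
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# Assumed grid: 32x32 image, hypothetical 4x4 patches -> 8 * 8 = 64 patches.
num_patches = (32 // 4) * (32 // 4)
visible, masked = random_patch_mask(num_patches, mask_ratio=0.75, seed=0)
# With 64 patches and a 75% ratio, 48 are masked and 16 stay visible.
```

In an MAE-style setup the encoder processes only the visible patches, and the decoder reconstructs the masked ones, which is why a high ratio keeps the encoder pass cheap.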

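Experimental Results

Research Questions

RQ1: Does SSMAE improve ViT performance when labeled data is scarce and unlabeled data is abundant?
RQ2: How should pseudo-labels be generated and gated in semi-supervised ViT training to avoid confirmation bias?
RQ3: Does combining masked reconstruction with limited supervision yield robust representations that transfer to downstream classification tasks?
RQ4: How do the mask ratio and the gating threshold affect pseudo-label quality and overall accuracy?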

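Key Findings

SSMAE consistently outperforms supervised ViT and fine-tuned MAE across label regimes on CIFAR-10 and CIFAR-100.
On CIFAR-100 with only 10% labeled data, SSMAE reaches 22.65% accuracy, compared with 21.72% for MAE and 20.86% for supervised ViT.
On CIFAR-10 with only 10% labeled data, SSMAE reaches 56.80% accuracy, 9.24 percentage points above ViT and 1.96 percentage points above MAE.
Ablation studies show that reconstruction, consistency regularization, and dynamic gating each contribute substantially to performance.
The mask-ratio analysis shows that 75% masking gives the best performance, with a slight drop at 90%.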
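
As a quick check on the reported numbers, the CIFAR-100 gaps can be computed from the three accuracies listed above and set beside the CIFAR-10 gaps. The helper name `gaps_over_baselines` is hypothetical, and the comparison assumes the CIFAR-10 figures are absolute percentage-point differences, as this summary states them.

```python
def gaps_over_baselines(results, method="SSMAE"):
    """Accuracy gap (percentage points) of `method` over every other entry."""
    return {
        name: round(results[method] - acc, 2)
        for name, acc in results.items()
        if name != method
    }


# Accuracies (%) reported for CIFAR-100 with 10% labels.
cifar100 = {"SSMAE": 22.65, "MAE fine-tuning": 21.72, "Supervised ViT": 20.86}
cifar100_gaps = gaps_over_baselines(cifar100)
# -> {'MAE fine-tuning': 0.93, 'Supervised ViT': 1.79}

# Gaps reported directly for CIFAR-10 with 10% labels (percentage points).
cifar10_gaps = {"MAE fine-tuning": 1.96, "Supervised ViT": 9.24}

# On both datasets the margin over supervised ViT exceeds the margin over MAE.
same_ordering = all(
    gaps["Supervised ViT"] > gaps["MAE fine-tuning"]
    for gaps in (cifar100_gaps, cifar10_gaps)
)
```

Under that reading, the CIFAR-100 margins at 10% labels (0.93 and 1.79 points) are noticeably smaller than the CIFAR-10 margins (1.96 and 9.24 points).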


This review was generated by AI and reviewed by human editors.