Semi-Supervised Masked Autoencoders: Unlocking Vision Transformer Potential with Limited Data

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original ArXiv source.


We address the challenge of training Vision Transformers (ViTs) when labeled data is scarce but unlabeled data is abundant. We propose the Semi-Supervised Masked Autoencoder (SSMAE), a framework that jointly optimizes masked image reconstruction and classification using both unlabeled and labeled samples with dynamically selected pseudo-labels. SSMAE introduces a validation-driven gating mechanism that activates pseudo-labeling only after the model achieves reliable, high-confidence predictions that are consistent across both weakly and strongly augmented views of the same image, reducing confirmation bias. On CIFAR-10 and CIFAR-100, SSMAE consistently outperforms supervised ViT and fine-tuned MAE, with the largest gains in low-label regimes (+9.24% over ViT on CIFAR-10 with 10% labels). Our results demonstrate that when pseudo-labels are introduced is as important as how they are generated for data-efficient transformer training. Code is available at https://github.com/atik666/ssmae.
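The validation-driven gating described above can be sketched as a simple selection rule. The snippet below is a minimal illustration, not the authors' implementation: the function names, the accuracy gate `acc_gate`, and the confidence threshold `tau` are assumptions for demonstration; the paper only specifies that pseudo-labeling activates once predictions are reliable, high-confidence, and consistent across weak and strong augmentations.

```python
import numpy as np

def select_pseudo_labels(p_weak, p_strong, val_acc, acc_gate=0.9, tau=0.95):
    """Hypothetical sketch of SSMAE-style pseudo-label gating.

    p_weak, p_strong: (N, C) softmax outputs for the weakly and strongly
    augmented views of N unlabeled images.
    val_acc: current validation accuracy; pseudo-labeling stays switched off
    until it exceeds acc_gate (the validation-driven gate).
    A sample is kept only if both views agree on the predicted class and the
    weak-view confidence exceeds tau, reducing confirmation bias.
    Returns a boolean keep-mask and the pseudo-labels (or None if gated off).
    """
    if val_acc < acc_gate:  # gate closed: no pseudo-labels yet
        return np.zeros(len(p_weak), dtype=bool), None
    y_weak = p_weak.argmax(axis=1)
    y_strong = p_strong.argmax(axis=1)
    conf = p_weak.max(axis=1)
    keep = (y_weak == y_strong) & (conf >= tau)
    return keep, y_weak
```

In this sketch, a sample that is confidently classified the same way under both augmentations contributes a pseudo-label to the classification loss; everything else is ignored until the model improves.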


💡 Research Summary

The paper introduces Semi-Supervised Masked Autoencoders (SSMAE), a framework that unifies masked image reconstruction and classification to train Vision Transformers (ViTs) efficiently when labeled data is scarce but unlabeled data is abundant. SSMAE builds on the Masked Autoencoder (MAE) paradigm: an input image is split into non-overlapping patches, a high masking ratio (75%) is applied, and only the visible tokens are fed into a ViT encoder. The encoder output serves two purposes. First, a lightweight decoder reconstructs the masked patches, and a mean-squared error loss (L_recon) is computed only on the masked region. Second, the encoder output drives a classification objective trained on labeled samples and, once the gating mechanism activates, on dynamically selected pseudo-labeled samples as well.
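The masking and reconstruction-loss steps above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the paper's code: `random_mask` and `recon_loss` are hypothetical helper names, and patches are represented as flattened pixel vectors; the encoder, decoder, and classification head are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(num_patches, mask_ratio=0.75):
    """Randomly choose which patch indices to hide (MAE's 75% masking).

    Returns (masked_idx, visible_idx); only the visible patches would be
    fed to the ViT encoder.
    """
    n_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)
    return perm[:n_masked], perm[n_masked:]

def recon_loss(pred, target, masked_idx):
    """Mean-squared error computed only on the masked patches (L_recon).

    pred, target: (num_patches, patch_dim) arrays of reconstructed and
    ground-truth flattened patches.
    """
    diff = pred[masked_idx] - target[masked_idx]
    return float((diff ** 2).mean())
```

For a 224x224 image with 16x16 patches (196 patches), this masks 147 patches and leaves 49 visible, and the reconstruction loss ignores the visible region entirely, matching the MAE recipe the summary describes.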
