[Paper Review] Self-Supervision Closes the Gap Between Weak and Strong Supervision in Histology
The paper trains an in-domain, self-supervised feature extractor (MoCo v2) on histology tiles to replace ImageNet features, greatly boosting weakly-supervised histology performance and closing the gap to strong supervision on Camelyon16.
One of the biggest challenges for applying machine learning to histopathology is weak supervision: whole-slide images have billions of pixels yet often only one global label. The state of the art therefore relies on strongly-supervised model training using additional local annotations from domain experts. However, in the absence of detailed annotations, most weakly-supervised approaches depend on a frozen feature extractor pre-trained on ImageNet. We identify this as a key weakness and propose to train an in-domain feature extractor on histology images using MoCo v2, a recent self-supervised learning algorithm. Experimental results on Camelyon16 and TCGA show that the proposed extractor greatly outperforms its ImageNet counterpart. In particular, our results improve the weakly-supervised state of the art on Camelyon16 from 91.4% to 98.7% AUC, thereby closing the gap with strongly-supervised models that reach 99.3% AUC. Through these experiments, we demonstrate that feature extractors trained via self-supervised learning can act as drop-in replacements to significantly improve existing machine learning techniques in histology. Lastly, we show that the learned embedding space exhibits biologically meaningful separation of tissue structures.
Motivation & Objective
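- Whole-slide images usually carry only a slide-level label, so histology models often have to be trained under weak supervision.
- Most weakly-supervised pipelines rely on a frozen feature extractor pre-trained on ImageNet, which the authors identify as a key weakness.
- The paper proposes in-domain self-supervised pre-training with MoCo v2 as a replacement.
- The goals are to demonstrate performance gains on Camelyon16 and TCGA-COAD, and to show that the learned embeddings are biologically meaningful and transferable.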
Proposed method
- Tile whole-slide images into fixed-size patches at a chosen zoom level.
- Extract tile features with a frozen encoder.
- Apply multiple instance learning (MIL) to aggregate tile-level information into slide-level predictions.
- Pre-train the tile encoder in-domain using MoCo v2 with a contrastive loss on unlabeled histology tiles.
- Extend the MoCo v2 augmentations with rotations and flips, which suit histology data.
- Evaluate across three MIL architectures (Weldon, Chowder, DeepMIL) on two datasets.
- Compare against ImageNet pre-training and report AUC improvements.
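The two learned components of the method can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `dihedral_augment` stands in for the rotation and flip augmentations, `info_nce_loss` is the contrastive objective used by MoCo-style training, `momentum_update` is the key-encoder update, and `attention_mil_pool` is an attention-based pooling in the spirit of DeepMIL. All function names, the temperature of 0.2, the momentum of 0.999, and the projection matrices `V` and `w` are illustrative choices that this review does not confirm.

```python
import numpy as np

def dihedral_augment(tile, rng):
    """Random 90-degree rotation plus optional horizontal flip of an (H, W, C) tile.
    Histology tiles have no canonical orientation, so every such view is plausible."""
    tile = np.rot90(tile, k=int(rng.integers(4)), axes=(0, 1))
    if rng.random() < 0.5:
        tile = tile[:, ::-1]
    return tile

def info_nce_loss(q, k_pos, queue, temperature=0.2):
    """InfoNCE loss for L2-normalized embeddings.
    q, k_pos: (N, D) query/key embeddings of two views of the same tiles.
    queue:    (K, D) embeddings of other tiles, used as negatives.
    temperature=0.2 is an assumed default, not a value taken from the review."""
    l_pos = np.sum(q * k_pos, axis=1, keepdims=True)        # (N, 1) positive logits
    l_neg = q @ queue.T                                      # (N, K) negative logits
    logits = np.concatenate([l_pos, l_neg], axis=1) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[:, 0].mean())                     # positive sits at index 0

def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average that keeps the key encoder slowly evolving.
    m=0.999 is an assumed default."""
    return m * key_params + (1.0 - m) * query_params

def attention_mil_pool(tile_features, V, w):
    """Attention pooling over the tiles of one slide.
    tile_features: (T, D) frozen-encoder features; V: (D, H); w: (H,).
    Returns the (D,) slide embedding and the (T,) attention weights."""
    scores = np.tanh(tile_features @ V) @ w
    scores = scores - scores.max()
    attn = np.exp(scores) / np.exp(scores).sum()
    return attn @ tile_features, attn

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(100, 16))                       # 100 tiles, 16-d features
    slide_vec, attn = attention_mil_pool(
        feats, rng.normal(size=(16, 8)), rng.normal(size=8))
    print(slide_vec.shape, attn.sum())
```

In a real pipeline the slide embedding would feed a small classifier trained on slide-level labels, while the encoder that produced `tile_features` stays frozen after the contrastive pre-training.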
Experimental results
Research questions
- RQ1: Can in-domain self-supervised pre-training with MoCo v2 improve weakly-supervised histology models over ImageNet-pretrained features?
- RQ2: Do improvements generalize across different MIL architectures and datasets (Camelyon16 and TCGA-COAD)?
- RQ3: How close can weakly-supervised performance get to strongly-supervised baselines with self-supervised in-domain features?
- RQ4: Do learned embeddings exhibit biologically meaningful clustering and support transfer learning across organs/tumor types?
Key findings
- MoCo v2 in-domain features substantially boost weakly-supervised histology results across MIL models.
- On Camelyon16, weakly-supervised performance reaches 98.7% AUC, up from the previous weakly-supervised state of the art of 91.4% and approaching the 99.3% AUC of strongly-supervised models.
- The standard deviation of the reported scores drops markedly with MoCo v2 features compared with ImageNet features, indicating more robust performance.
- On TCGA-COAD CMS classification, MoCo v2 features yield large AUC gains over ImageNet and comparable results to state-of-the-art methods that use annotations and ensembling.
- MoCo v2 features trained on TCGA-COAD and applied to Camelyon16 (and vice versa) still perform strongly, highlighting the transferability of the learned representation.
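Since the findings above are stated as AUC values and their standard deviation, a short sketch of how such numbers are typically computed may help. It uses the rank interpretation of AUC: the probability that a randomly chosen positive slide scores higher than a randomly chosen negative one. The function names are hypothetical, and the review does not say what the standard deviation is taken over; repeated training runs or cross-validation folds are assumed here.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative slide")
    diff = pos[:, None] - neg[None, :]            # all pairwise score differences
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def summarize_runs(aucs):
    """Mean and sample standard deviation of AUC across runs (assumed protocol)."""
    aucs = np.asarray(aucs, dtype=float)
    return float(aucs.mean()), float(aucs.std(ddof=1))

if __name__ == "__main__":
    # Toy slide-level labels and predicted scores, not data from the paper.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A smaller spread from `summarize_runs` for one feature extractor than for another is the kind of evidence behind a robustness claim like the one above.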
This review was created by AI and reviewed by human editors.