[Paper Review] Towards Large-Scale Training of Pathology Foundation Models
The paper presents a scalable pipeline for training large pathology foundation models using online patching, evaluates multiple hyperparameters and magnifications on TCGA data, and releases an evaluation framework (eva) for standardized downstream tasks.
Driven by recent advances in deep learning methods and, in particular, by the development of modern self-supervised learning algorithms, increased interest and effort have been devoted to building foundation models (FMs) for medical images. In this work, we present our scalable training pipeline for large pathology imaging data, and a comprehensive analysis of various hyperparameter choices and training techniques for building pathology FMs. We release and make publicly available the first batch of our pathology FMs (https://github.com/kaiko-ai/towards_large_pathology_fms) trained on open-access TCGA whole slide images, a commonly used collection of pathology images. The experimental evaluation shows that our models reach state-of-the-art performance on various patch-level downstream tasks, ranging from breast cancer subtyping to colorectal nuclear segmentation. Finally, to unify the evaluation approaches used in the field and to simplify future comparisons of different FMs, we present an open-source framework (https://github.com/kaiko-ai/eva) designed for the consistent evaluation of pathology FMs across various downstream tasks.
Motivation & Objective
- Demonstrate a scalable training pipeline for pathology foundation models on large-scale WSIs.
- Analyze the impact of hyperparameters such as initialization, magnification mix, and data size on downstream performance.
- Show that online patching enables high-throughput patch loading without offline pre-creation.
- Provide an open framework (eva) for consistent evaluation across downstream tasks.
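The idea behind online patching can be illustrated with a minimal sketch: instead of cutting and storing patches ahead of time, the loader draws patch coordinates at training time and reads only those regions from the slide. The code below is not the paper's implementation. The names `SlideInfo`, `PatchRequest`, and `sample_patch_requests` are hypothetical, the microns-per-pixel values follow the common convention that 40x corresponds to roughly 0.25 µm/px, and the actual image read and any tissue/background filtering are left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SlideInfo:
    """Hypothetical slide metadata; a real pipeline would read it from the WSI header."""
    slide_id: str
    width: int        # level-0 width in pixels
    height: int       # level-0 height in pixels
    base_mpp: float   # microns per pixel at level 0


@dataclass(frozen=True)
class PatchRequest:
    """Describes one region to read from a slide and the size to resize it to."""
    slide_id: str
    x: int            # left edge of the region, level-0 pixels
    y: int            # top edge of the region, level-0 pixels
    read_size: int    # side length of the level-0 region to read
    out_size: int     # side length of the patch after resizing
    target_mpp: float


def sample_patch_requests(slides, target_mpps, patch_size, n, seed=0):
    """Draw n random patch requests across slides and magnifications."""
    rng = random.Random(seed)

    # Pre-compute every (slide, magnification) pair whose region fits in the slide.
    candidates = []
    for slide in slides:
        for mpp in target_mpps:
            read_size = round(patch_size * mpp / slide.base_mpp)
            if 0 < read_size <= min(slide.width, slide.height):
                candidates.append((slide, mpp, read_size))
    if not candidates:
        raise ValueError("no slide is large enough for the requested patches")

    requests = []
    for _ in range(n):
        slide, mpp, read_size = rng.choice(candidates)
        # Pick a top-left corner so the region stays inside the slide.
        x = rng.randint(0, slide.width - read_size)
        y = rng.randint(0, slide.height - read_size)
        requests.append(
            PatchRequest(slide.slide_id, x, y, read_size, patch_size, mpp)
        )
    return requests
```

In a training loop, each `PatchRequest` would be handed to a slide reader that fetches the region from storage and resizes it to `out_size`. Because the coordinates are drawn fresh on each call, the set of patches and the magnification mix can change between runs without re-creating a patch dataset on disk.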
Proposed method
- Develop Online Patching for high-throughput, patch-level loading from WSIs stored in blob storage.
- Pre-train ViT-based foundation models with DINO and DINOv2 using patches from TCGA across multiple magnifications.
- Initialize from ImageNet SSL weights and study convergence benefits.
- Evaluate models on multiple patch-level downstream tasks (BACH, CRC, MHIST, PCam, TP53, CoNSeP) using linear probing.
- Compare model sizes and magnification strategies to assess robustness and generalization.
Experimental results
Research questions
- RQ1: Does online patching enable scalable, diverse patch sampling without compromising performance?
- RQ2: How do initialization and pre-training on ImageNet influence convergence and downstream accuracy for pathology FMs?
- RQ3: What is the effect of training with multiple magnifications on robustness and task performance?
- RQ4: How does training data size (slides and patches) affect in-distribution and out-of-distribution performance?
Key findings
- Models trained with online patching match or exceed state-of-the-art models on patch-level tasks, while the data pipeline remains scalable.
- Initializing from ImageNet pre-trained weights accelerates convergence and improves downstream performance.
- Training with multiple magnifications improves robustness and outperforms single-magnification models.
- Increasing the number of training slides generally improves performance, with diminishing returns, and diverse data are needed for better generalization to OOD data.
- Increasing the number of distinct training patches improves ID performance, with limited gains for OOD tasks unless patch diversity is substantially increased.
This review was created by AI and reviewed by human editors.