QUICK REVIEW

[Paper Digest] Spatial Transcriptomics as Images for Large-Scale Pretraining

Yishun Zhu, Jiaxin Qi | arXiv (Cornell University) | Mar 13, 2026

Single-cell and spatial transcriptomics | Cited by 0

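ONE-SENTENCE SUMMARY

The paper treats spatial transcriptomics data as croppable multi-channel images, enabling scalable, image-like pretraining that preserves local spatial context and supports large-scale ST representation learning.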

ABSTRACT

Spatial Transcriptomics (ST) profiles thousands of gene expression values at discrete spots with precise coordinates on tissue sections, preserving spatial context essential for clinical and pathological studies. With rising sequencing throughput and advancing platforms, the expanding data volumes motivate large-scale ST pretraining. However, the fundamental unit for pretraining, i.e., what constitutes a single training sample, remains ill-posed. Existing choices fall into two camps: (1) treating each spot as an independent sample, which discards spatial dependencies and collapses ST into single-cell transcriptomics; and (2) treating an entire slide as a single sample, which produces prohibitively large inputs and drastically fewer training examples, undermining effective pretraining. To address this gap, we propose treating spatial transcriptomics as croppable images. Specifically, we define a multi-channel image representation with fixed spatial size by cropping patches from raw slides, thereby preserving spatial context while substantially increasing the number of training samples. Along the channel dimension, we define gene subset selection rules to control input dimensionality and improve pretraining stability. Extensive experiments show that the proposed image-like dataset construction for ST pretraining consistently improves downstream performance, outperforming conventional pretraining schemes. Ablation studies verify that both spatial patching and channel design are necessary, establishing a unified, practical paradigm for organizing ST data and enabling large-scale pretraining.

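RESEARCH MOTIVATION AND OBJECTIVES

Identify the limitations of existing spot-based and slide-based ST pretraining schemes, which trade spatial context against sample count.
Introduce a patch-based multi-channel image representation, built from fixed-size crops, to scale up pretraining.
Develop an "importance-aware" gene subset selection method, based on variance-weighted channel selection, that controls input dimensionality.
Demonstrate that patch-based ST pretraining improves downstream spatial domain detection and reconstruction tasks across datasets.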

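PROPOSED METHOD

Represent each ST slide as a croppable 2D lattice of spots, where the gene expression at each spot forms a multi-channel vector.
Crop fixed-size h x w patches from the ST grid to serve as image-like training units.
Normalize spatial coordinates to a compact lattice and randomly sample patches to increase the number of training samples.
Select a fixed number m of genes per patch through variance-weighted channel selection to control the channel dimension.
Train a masked autoencoder with a ViT backbone on these patches, with the objective of reconstructing the masked gene channels.
Evaluate with downstream tasks (spatial domain detection, k-NN, MLP classifiers) and a masked-region reconstruction task.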
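
The data-construction steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes spots lie on a regular square grid, that empty lattice cells are zero-filled, and that "variance-weighted" selection means sampling genes without replacement with probability proportional to their within-patch variance. The function names (`to_lattice`, `rasterize`, `random_crop`, `select_genes`) and the synthetic slide are hypothetical.

```python
import numpy as np


def to_lattice(coords):
    """Map raw spot coordinates to compact integer (row, col) lattice indices.

    Assumption: spots sit on a regular square grid, so ranking the unique
    values along each axis recovers the lattice position.
    """
    coords = np.asarray(coords)
    _, rows = np.unique(coords[:, 0], return_inverse=True)
    _, cols = np.unique(coords[:, 1], return_inverse=True)
    return np.stack([rows.reshape(-1), cols.reshape(-1)], axis=1)


def rasterize(coords, expr):
    """Place per-spot expression vectors on an (H, W, G) grid; empty cells stay zero."""
    idx = to_lattice(coords)
    height, width = idx.max(axis=0) + 1
    grid = np.zeros((height, width, expr.shape[1]), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1]] = expr
    return grid


def random_crop(grid, h, w, rng):
    """Cut one h x w patch at a uniformly random position inside the grid."""
    height, width, _ = grid.shape
    if h > height or w > width:
        raise ValueError("patch is larger than the slide grid")
    top = rng.integers(0, height - h + 1)
    left = rng.integers(0, width - w + 1)
    return grid[top:top + h, left:left + w]


def select_genes(patch, m, rng):
    """Sample m distinct gene channels with probability proportional to variance.

    Assumption: this is one plausible reading of "variance-weighted channel
    selection"; the paper's exact rule may differ.
    """
    flat = patch.reshape(-1, patch.shape[-1]).astype(np.float64)
    weights = flat.var(axis=0) + 1e-8  # epsilon keeps every gene selectable
    prob = weights / weights.sum()
    genes = np.sort(rng.choice(patch.shape[-1], size=m, replace=False, p=prob))
    return patch[..., genes], genes


# Synthetic slide: a 6 x 8 grid of spots with 20 genes (illustrative values only).
rng = np.random.default_rng(0)
ys, xs = np.meshgrid(np.arange(6) * 100.0 + 37.0,
                     np.arange(8) * 100.0 + 12.0, indexing="ij")
coords = np.stack([ys.ravel(), xs.ravel()], axis=1)
expr = rng.poisson(2.0, size=(48, 20)).astype(np.float32)

grid = rasterize(coords, expr)            # slide as a multi-channel image
patch = random_crop(grid, 4, 4, rng)      # fixed-size training unit
sample, genes = select_genes(patch, 5, rng)
print(grid.shape, sample.shape)
```

Calling `random_crop` repeatedly on the same slide yields many training samples per slide, which is the scaling argument the digest attributes to the patch-based view.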

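EXPERIMENTAL RESULTS

Research Questions

RQ1: How should ST data be represented for pretraining so as to balance preserving spatial context against scalable sample generation?
RQ2: Does patch-based, multi-channel, image-like pretraining improve downstream ST tasks compared with spot-based or slide-based schemes?
RQ3: How does channel (gene) selection affect pretraining stability and downstream performance?
RQ4: How do patch size and channel count affect downstream spatial domain detection and reconstruction tasks?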

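Key Findings

Patch-based ST pretraining consistently outperforms spot-based and slide-based schemes on spatial domain detection across multiple datasets.
Compared with spot-based pretraining (scGPT), the proposed method achieves substantial average gains in accuracy and ARI (e.g., an average Acc gain of about 0.287 and an ARI gain of about 0.347 on the reported tasks).
Compared with the spatially augmented baseline scGPT-spatial, the method delivers further average gains in Acc and ARI (e.g., Acc up by about 0.059 and ARI up by 0.086).
For masked-region reconstruction, the proposed method achieves better MSE/MAE than scGPT-spatial across mask sizes, indicating stronger spatial and transcriptomic representation learning.
Ablations and ablation-like analyses indicate that both spatial patching and channel design are necessary to reach the best performance.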
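
To make the masked-region reconstruction metric concrete, the sketch below computes MSE and MAE over the masked spots of a single patch. It is illustrative only: it assumes the masked region is one random square of spots, and the "model" is a hypothetical placeholder that predicts the per-channel mean of the visible spots. The names `square_mask` and `masked_errors` are not from the paper.

```python
import numpy as np


def square_mask(h, w, size, rng):
    """Boolean (h, w) mask that is True inside one random size x size square."""
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + size, left:left + size] = True
    return mask


def masked_errors(pred, target, mask):
    """Return (MSE, MAE) computed only over masked spots, across all channels."""
    diff = pred[mask] - target[mask]  # shape: (n_masked_spots, channels)
    return float(np.mean(diff ** 2)), float(np.mean(np.abs(diff)))


rng = np.random.default_rng(0)
target = rng.normal(size=(8, 8, 5))   # synthetic 8 x 8 patch with 5 gene channels
mask = square_mask(8, 8, 3, rng)      # hide a 3 x 3 region of spots

# Placeholder predictor: per-channel mean of the visible (unmasked) spots.
pred = np.broadcast_to(target[~mask].mean(axis=0), target.shape)

mse, mae = masked_errors(pred, target, mask)
print(f"masked MSE={mse:.3f}, masked MAE={mae:.3f}")
```

Sweeping `size` over several values gives one error pair per mask size, which is the shape of comparison the findings above describe.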


This digest was generated by AI and reviewed by a human editor.