QUICK REVIEW

[Paper Review] Spatial Transcriptomics as Images for Large-Scale Pretraining

Yishun Zhu, Jiaxin Qi | arXiv (Cornell University) | Mar 13, 2026

Single-cell and spatial transcriptomics | Citations: 0

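One-Line Summary

The paper proposes treating spatial transcriptomics (ST) data as croppable multi-channel images. This enables scalable, image-like pretraining that preserves local spatial context while supporting large-scale ST representation learning.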

ABSTRACT

Spatial Transcriptomics (ST) profiles thousands of gene expression values at discrete spots with precise coordinates on tissue sections, preserving spatial context essential for clinical and pathological studies. With rising sequencing throughput and advancing platforms, the expanding data volumes motivate large-scale ST pretraining. However, the fundamental unit for pretraining, i.e., what constitutes a single training sample, remains ill-posed. Existing choices fall into two camps: (1) treating each spot as an independent sample, which discards spatial dependencies and collapses ST into single-cell transcriptomics; and (2) treating an entire slide as a single sample, which produces prohibitively large inputs and drastically fewer training examples, undermining effective pretraining. To address this gap, we propose treating spatial transcriptomics as croppable images. Specifically, we define a multi-channel image representation with fixed spatial size by cropping patches from raw slides, thereby preserving spatial context while substantially increasing the number of training samples. Along the channel dimension, we define gene subset selection rules to control input dimensionality and improve pretraining stability. Extensive experiments show that the proposed image-like dataset construction for ST pretraining consistently improves downstream performance, outperforming conventional pretraining schemes. Ablation studies verify that both spatial patching and channel design are necessary, establishing a unified, practical paradigm for organizing ST data and enabling large-scale pretraining.
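To make the sample-count trade-off concrete, the snippet below counts samples per slide and spots per sample for each choice of training unit. It is a hypothetical illustration, not taken from the paper: `unit_stats` is an invented helper, and the grid size, patch size and stride are made-up example values. It also assumes a fully occupied grid and regular sliding-window crops, whereas the paper samples patches at random positions.

```python
# Hypothetical illustration (not from the paper): samples per slide and
# spots per sample under the three choices of training unit, for one slide
# rasterized onto a fully occupied grid_h x grid_w spot grid.

def unit_stats(grid_h: int, grid_w: int, patch_h: int, patch_w: int,
               stride: int) -> dict[str, tuple[int, int]]:
    """Return {unit: (samples_per_slide, spots_per_sample)}."""
    if patch_h > grid_h or patch_w > grid_w:
        raise ValueError("patch must fit inside the grid")
    n_spots = grid_h * grid_w
    # Regular sliding-window crops; random cropping can draw many more
    # (overlapping) patches per epoch from the same slide.
    n_patches = ((grid_h - patch_h) // stride + 1) * ((grid_w - patch_w) // stride + 1)
    return {
        "spot": (n_spots, 1),                     # many samples, no spatial context
        "slide": (1, n_spots),                    # full context, one huge sample
        "patch": (n_patches, patch_h * patch_w),  # local context, many samples
    }

stats = unit_stats(grid_h=64, grid_w=64, patch_h=16, patch_w=16, stride=8)
for unit, (n_samples, spots) in stats.items():
    print(f"{unit:>5}: {n_samples:5d} samples x {spots:5d} spots each")
```

With these example values, spot-based units give 4,096 context-free samples, the slide-based unit gives a single 4,096-spot sample, and 16 x 16 patches give 49 samples of 256 spots each. Each patch keeps its local neighborhood, which is the balance the abstract argues for.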

Motivation and Objectives

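Clarify the limitations of existing spot-based and slide-based ST pretraining schemes in balancing spatial context against the number of training samples.
Introduce a patch-based, multi-channel image representation of ST data built from fixed-size crops to scale up pretraining.
Develop importance-aware gene subset (channel) selection to control input dimensionality and stabilize training.
Show that patch-based ST pretraining improves downstream spatial domain detection and reconstruction tasks across multiple datasets.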

Proposed Method

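Represent each ST slice as a croppable 2D grid in which each spot's gene expression forms a multi-channel vector.
Crop fixed-size h x w patches from the ST grid to create image-like training units.
Normalize spatial coordinates onto a compact lattice, and sample patches at random positions to increase the number of training samples.
Select a fixed number m of genes per patch using variance-weighted channel selection to control channel dimensionality.
Train a masked autoencoder (MAE) with a ViT backbone on these patches using a masked-reconstruction objective, so that it learns to reconstruct the masked gene channels.
Evaluate on downstream tasks (spatial domain detection using k-NN and MLP classifiers) and on a masked-region reconstruction task.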
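The dataset-construction steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `to_grid`, `random_crop` and `select_genes` are hypothetical; coordinate normalization is done by dense-ranking the distinct row and column values, which assumes spots lie on a regular lattice; empty grid cells are zero-filled; and "variance-weighted" selection is interpreted as sampling m genes without replacement with probability proportional to their within-patch variance. The ViT-based masked autoencoder itself is not shown.

```python
import numpy as np

# Assumptions (not from the paper): regular spot lattice, zero-filled empty
# cells, and variance-proportional sampling of m genes per patch.

def to_grid(coords: np.ndarray, expr: np.ndarray) -> np.ndarray:
    """Rasterize spots into a (G, H, W) multi-channel image.

    coords: (N, 2) spot (row, col) coordinates; expr: (N, G) expression.
    Dense-ranking the distinct row/col values maps raw coordinates onto a
    compact, gap-free integer lattice.
    """
    _, rows = np.unique(coords[:, 0], return_inverse=True)
    _, cols = np.unique(coords[:, 1], return_inverse=True)
    grid = np.zeros((expr.shape[1], rows.max() + 1, cols.max() + 1), dtype=expr.dtype)
    grid[:, rows, cols] = expr.T
    return grid

def random_crop(grid: np.ndarray, h: int, w: int, rng: np.random.Generator) -> np.ndarray:
    """Cut one fixed-size h x w patch at a random position (all channels)."""
    _, H, W = grid.shape
    top = rng.integers(0, H - h + 1)
    left = rng.integers(0, W - w + 1)
    return grid[:, top:top + h, left:left + w]

def select_genes(patch: np.ndarray, m: int, rng: np.random.Generator):
    """Keep m gene channels, sampled with probability proportional to variance."""
    # Small epsilon keeps zero-variance genes selectable, so sampling m without
    # replacement always has enough nonzero-probability candidates.
    var = patch.reshape(patch.shape[0], -1).var(axis=1, dtype=np.float64) + 1e-8
    idx = np.sort(rng.choice(patch.shape[0], size=m, replace=False, p=var / var.sum()))
    return patch[idx], idx

rng = np.random.default_rng(0)
# Toy slide: 20 x 20 lattice with spacing 2 and 50 genes of Poisson counts.
coords = np.array([(r, c) for r in range(0, 40, 2) for c in range(0, 40, 2)], dtype=float)
expr = rng.poisson(1.0, size=(len(coords), 50)).astype(np.float32)

grid = to_grid(coords, expr)                   # (50, 20, 20)
patch = random_crop(grid, h=8, w=8, rng=rng)   # (50, 8, 8)
x, genes = select_genes(patch, m=16, rng=rng)  # (16, 8, 8): one training sample
print(grid.shape, patch.shape, x.shape)
```

Drawing a fresh random crop and gene subset on every call is what turns one slide into many training samples. If the paper's rule is deterministic instead, keeping the top-m genes by variance would be the natural substitute.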

Experimental Results

Research Questions

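RQ1: How should ST data be represented to balance spatial context against the number of samples generated for pretraining?
RQ2: Does patch-based, multi-channel image-like pretraining improve downstream ST tasks compared with spot-based and slide-based schemes?
RQ3: How does channel (gene) selection affect pretraining stability and downstream performance?
RQ4: How strongly do patch size and channel count affect downstream spatial domain detection and reconstruction tasks?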

Key Findings

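Patch-based ST pretraining consistently outperforms spot-based and slide-based schemes on spatial domain detection across multiple datasets.
Compared with spot-based pretraining (scGPT), the proposed method yields notable improvements in average accuracy and ARI (e.g., average gains of 0.287 in Acc and 0.347 in ARI on the reported tasks).
Compared with the spatially extended baseline scGPT-spatial, it yields further average gains (roughly 0.059 in Acc and 0.086 in ARI).
For masked-region reconstruction, the proposed method achieves lower MSE/MAE than scGPT-spatial at every mask size, suggesting better joint spatial and transcriptomic representation learning.
Ablation and ablation-style analyses show that both spatial patching and channel design are required for the best performance.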
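The reconstruction and clustering metrics in the findings can be sketched as follows. This is an illustrative implementation, not the paper's evaluation code: the helper names are hypothetical, the masked region is assumed to be a k x k square of grid cells with errors averaged over the masked cells only, and ARI is the standard adjusted Rand index comparing predicted spatial-domain labels with reference labels. Accuracy is omitted because it depends on how predicted domains are matched to reference labels.

```python
import numpy as np
from collections import Counter
from math import comb

# Assumptions: square spatial mask, errors over masked cells only,
# standard adjusted Rand index. Function names are hypothetical.

def square_mask(h: int, w: int, k: int, top: int, left: int) -> np.ndarray:
    """Boolean (h, w) mask that is True on a k x k square."""
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + k, left:left + k] = True
    return mask

def masked_mse_mae(target: np.ndarray, recon: np.ndarray, mask: np.ndarray):
    """MSE and MAE over masked cells only; target/recon are (C, H, W)."""
    diff = (recon - target)[:, mask]  # (C, n_masked_cells)
    return float(np.mean(diff ** 2)), float(np.mean(np.abs(diff)))

def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Agreement between predicted spatial domains and reference labels."""
    n = len(labels_true)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both use a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Toy check: the reconstruction is off by 2.0 everywhere inside the mask and
# wildly wrong at one cell outside it; only the masked cells should count.
target = np.zeros((2, 4, 4))
mask = square_mask(4, 4, k=2, top=1, left=1)
recon = target.copy()
recon[:, mask] = 2.0
recon[:, 0, 0] = 100.0  # outside the mask, ignored
print(masked_mse_mae(target, recon, mask))                # → (4.0, 2.0)
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # → 1.0 (same partition, relabeled)
```

Restricting the error to masked cells matters: averaging over the whole patch would dilute the error as the mask shrinks and make results at different mask sizes hard to compare.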

This review was generated by AI and checked by a human editor.