QUICK REVIEW

[Paper Review] Context-aware Skin Cancer Epithelial Cell Classification with Scalable Graph Transformers

Lucas Sancéré, Noémie Moreau|arXiv (Cornell University)|Feb 17, 2026

Cutaneous Melanoma Detection and Management | Cited by 0

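One-sentence summary

This paper demonstrates scalable Graph Transformers on full-WSI cell graphs for distinguishing healthy from tumor epithelial cells in cutaneous squamous cell carcinoma, outperforming patch-based image methods while also training faster.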

ABSTRACT

Whole-slide images (WSIs) from cancer patients contain rich information that can be used for medical diagnosis or to follow treatment progress. To automate their analysis, numerous deep learning methods based on convolutional neural networks and Vision Transformers have been developed and have achieved strong performance in segmentation and classification tasks. However, due to the large size and complex cellular organization of WSIs, these models rely on patch-based representations, losing vital tissue-level context. We propose using scalable Graph Transformers on a full-WSI cell graph for classification. We evaluate this methodology on a challenging task: the classification of healthy versus tumor epithelial cells in cutaneous squamous cell carcinoma (cSCC), where both cell types exhibit very similar morphologies and are therefore difficult to differentiate for image-based approaches. We first compared image-based and graph-based methods on a single WSI. Graph Transformer models SGFormer and DIFFormer achieved balanced accuracies of $85.2 \pm 1.5$ ($\pm$ standard error) and $85.1 \pm 2.5$ in 3-fold cross-validation, respectively, whereas the best image-based method reached $81.2 \pm 3.0$. By evaluating several node feature configurations, we found that the most informative representation combined morphological and texture features as well as the cell classes of non-epithelial cells, highlighting the importance of the surrounding cellular context. We then extended our work to train on several WSIs from several patients. To address the computational constraints of image-based models, we extracted four $2560 \times 2560$ pixel patches from each image and converted them into graphs. In this setting, DIFFormer achieved a balanced accuracy of $83.6 \pm 1.9$ (3-fold cross-validation), while the state-of-the-art image-based model CellViT256 reached $78.1 \pm 0.5$.

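Research motivation and objectives

Address the limitation of patch-based WSI analysis, which loses tissue-level context.
Propose a full-WSI cell graph representation and scalable Graph Transformers to classify epithelial cells.
Systematically compare graph-based and image-based methods on the WSI-Graph and TILE-Graphs datasets.
Study how node feature selection and graph simplification affect classification performance.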

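Proposed method

Build a WSI-level cell graph whose nodes are cell nuclei described by morphological, texture, and class features, and whose edges connect neighboring nuclei within a threshold distance.
Use expert tumor annotations to refine the epithelial cell labels, yielding tumor and healthy epithelial node classes.
Simplify the graph by keeping only the nodes within k hops of epithelial anchor nodes, balancing context against computational cost.
Evaluate linear-complexity Graph Transformers (SGFormer, NodeFormer, DIFFormer) on binary node classification (tumor vs. healthy), with the target class features masked so that contextual information is preserved.
Compare graph-based models against the image-based baseline (CellViT256) on WSI-Graph and TILE-Graphs, using 3-fold cross-validation without early stopping.
Train with Adam on large GPUs, transferring hyperparameters from earlier benchmarks for each model; assess generalization under two protocols, subgraph evaluation and random-node evaluation.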
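The graph construction, k-hop simplification, and class-feature masking steps listed above can be sketched roughly as follows. This is a toy illustration, not the authors' pipeline: the function names (`build_radius_edges`, `k_hop_nodes`, `mask_class_features`), the brute-force distance computation, and the choice to zero out the class one-hot of epithelial nodes are all assumptions made for the example. A full WSI would need a spatial index rather than an O(n²) distance matrix.

```python
from collections import deque

import numpy as np


def build_radius_edges(centroids, max_dist):
    """Connect every pair of nuclei whose centroids are within max_dist.

    Brute-force pairwise distances: fine for a toy example only.
    """
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep the strict upper triangle so each undirected edge appears once.
    src, dst = np.nonzero(np.triu(dist <= max_dist, k=1))
    return list(zip(src.tolist(), dst.tolist()))


def k_hop_nodes(num_nodes, edges, anchors, k):
    """Return the sorted ids of nodes within k hops of any anchor node."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    hops = {a: 0 for a in anchors}  # multi-source breadth-first search
    queue = deque(anchors)
    while queue:
        u = queue.popleft()
        if hops[u] == k:
            continue  # do not expand beyond k hops
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return sorted(hops)


def mask_class_features(class_onehot, is_epithelial):
    """Zero the class one-hot of epithelial (target) nodes.

    Assumption: masking means hiding the class feature of the nodes to be
    classified while neighbors keep theirs as context.
    """
    masked = class_onehot.copy()
    masked[is_epithelial] = 0.0
    return masked


# Toy slide: five nuclei on a line one unit apart, plus one far away.
centroids = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [10, 0]], dtype=float)
edges = build_radius_edges(centroids, max_dist=1.5)
kept = k_hop_nodes(len(centroids), edges, anchors=[0], k=2)
print(edges)  # [(0, 1), (1, 2), (2, 3), (3, 4)]
print(kept)   # [0, 1, 2]
```

With node 0 as the only epithelial anchor and k = 2, nodes 3 to 5 are dropped: the simplified graph keeps local context around the cells of interest and discards the rest.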

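Experimental results

Research questions

RQ1: Can a full-WSI cell graph representation distinguish healthy from tumor epithelial cells better than patch-based image methods?
RQ2: Which node features (morphology, texture, cell class) and normalization strategies most improve epithelial cell classification?
RQ3: How do scalable, linear-complexity Graph Transformers perform against conventional GNNs and image-based models on the WSI-Graph and TILE-Graphs datasets?
RQ4: How does graph simplification (maximum hop count) affect classification accuracy and robustness under the different evaluation protocols?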

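Key findings

On a single WSI, SGFormer reached a balanced accuracy of 85.2 ± 1.5 and DIFFormer 85.1 ± 2.5, ahead of the best image-based method at 81.2 ± 3.0.
On TILE-Graphs spanning multiple patients, DIFFormer reached a balanced accuracy of 83.6 ± 1.9, versus 78.1 ± 0.5 for CellViT256.
Graph-based training is markedly faster: DIFFormer takes about 32 minutes per fold, whereas CellViT256 needs about 5 days.
Node feature ablation shows that combining morphological, texture, and cell-class features with z-score normalization generalizes best (for example, morphology + texture + cell class + normalization scores 84.0 ± 2.8 under subgraph evaluation and 94.5 ± 0.4 under random-node evaluation; figures cited from the paper's tables).
Graph simplification with a maximum of 10 hops balances connectivity and performance (subgraph 86.6 ± 2.2; random nodes 95.0 ± 0.2).
Overall, graph-based methods (DIFFormer, SGFormer) outperform image-based methods both within a single patient and across multiple patients, and they are also far more computationally efficient.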
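The figures above are balanced accuracies reported as mean ± standard error over three folds, with z-score normalization applied to node features. A minimal sketch of how such numbers are typically computed is given below. The helper names, the use of the sample standard deviation (`ddof=1`) for the standard error, and fitting the normalization statistics on training nodes only are assumptions, since the summary does not spell them out. All inputs are made-up toy values, not results from the paper.

```python
import numpy as np


def zscore(train_x, test_x, eps=1e-8):
    """Standardize features using statistics from the training nodes only."""
    mu = train_x.mean(axis=0)
    sigma = train_x.std(axis=0)
    return (train_x - mu) / (sigma + eps), (test_x - mu) / (sigma + eps)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a rare class weighs as much as a common one."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def mean_and_standard_error(fold_scores):
    """Mean and standard error (sample std / sqrt(n)) across folds."""
    scores = np.asarray(fold_scores, dtype=float)
    return float(scores.mean()), float(scores.std(ddof=1) / np.sqrt(len(scores)))


# Toy labels: 1 = tumor, 0 = healthy (illustrative only).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
ba = balanced_accuracy(y_true, y_pred)  # recall 0.5 (tumor) and 0.75 (healthy)

# Hypothetical per-fold balanced accuracies, in percent.
mean, se = mean_and_standard_error([80.0, 85.0, 90.0])
print(f"{ba:.3f}")             # 0.625
print(f"{mean:.1f} ± {se:.1f}")  # 85.0 ± 2.9
```

Balanced accuracy matters here because plain accuracy on the toy labels would be 4/6, which hides the fact that only half of the tumor cells were found. On the training-time claim, taking the reported figures at face value, 5 days is 7,200 minutes, or roughly 225 times the 32 minutes quoted per fold for DIFFormer.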

This review was generated by AI and checked by human editors.