QUICK REVIEW

[Paper Review] PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology

George Shaikovski, Adam Casson|arXiv (Cornell University)|May 16, 2024

AI in cancer detection | Cited by 12

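One-Sentence Summary

PRISM is a slide-level, multi-modal foundation model for H&E-stained histopathology, pre-trained with clinical report text. It enables zero-shot cancer detection and subtyping, label-efficient biomarker prediction, and interpretable report generation from whole slide images.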

ABSTRACT

Foundation models in computational pathology promise to unlock the development of new clinical decision support systems and models for precision medicine. However, there is a mismatch between most clinical analysis, which is defined at the level of one or more whole slide images, and foundation models to date, which process the thousands of image tiles contained in a whole slide image separately. The requirement to train a network to aggregate information across a large number of tiles in multiple whole slide images limits these models' impact. In this work, we present a slide-level foundation model for H&E-stained histopathology, PRISM, that builds on Virchow tile embeddings and leverages clinical report text for pre-training. Using the tile embeddings, PRISM produces slide-level embeddings with the ability to generate clinical reports, resulting in several modes of use. Using text prompts, PRISM achieves zero-shot cancer detection and sub-typing performance approaching and surpassing that of a supervised aggregator model. Using the slide embeddings with linear classifiers, PRISM surpasses supervised aggregator models. Furthermore, we demonstrate that fine-tuning of the PRISM slide encoder yields label-efficient training for biomarker prediction, a task that typically suffers from low availability of training data; an aggregator initialized with PRISM and trained on as little as 10% of the training data can outperform a supervised baseline that uses all of the data.
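The aggregation step the abstract describes, turning many tile embeddings into one slide embedding, can be illustrated with a single-query attention pool. This is a minimal sketch only: PRISM's actual slide encoder is a Perceiver network, and the function names, the `query` vector, and the toy two-dimensional tiles below are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(tile_embeddings, query):
    """Pool tile embeddings into one slide embedding using one query vector.

    tile_embeddings: list of equal-length float vectors, one per tile.
    query: hypothetical learned latent vector of the same length.
    Returns (slide_embedding, attention_weights).
    """
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)
    # Scaled dot-product score between the query and every tile.
    scores = [scale * sum(q * t for q, t in zip(query, tile))
              for tile in tile_embeddings]
    weights = softmax(scores)
    # The slide embedding is the attention-weighted average of the tiles.
    slide = [sum(w * tile[i] for w, tile in zip(weights, tile_embeddings))
             for i in range(dim)]
    return slide, weights

# Toy input: three 2-dimensional "tile embeddings" (illustrative values).
tiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
slide_embedding, weights = attention_pool(tiles, query=[2.0, 0.0])
```

The per-tile weights returned here are the kind of quantity one could inspect to see which tiles drive a slide-level representation.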

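Motivation and Objectives

Advance slide-level foundation models in computational pathology to overcome the challenge of aggregating the large number of tiles in a WSI.
Develop PRISM, which combines Virchow tile embeddings with clinical text to form slide-level representations and generate reports.
Demonstrate zero-shot, linear-probe, and fine-tuning capabilities for cancer detection, cancer subtyping, and biomarker prediction.
Show that text-guided pre-training outperforms fully supervised aggregators in both performance and data efficiency.
Explore the interpretability of generated pathology reports and tile-level attention.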

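Proposed Method

Pre-train a slide-level encoder (a Perceiver) that aggregates Virchow tile embeddings into a single slide embedding.
Use the BioGPT language model as the decoder, and align slide embeddings with the rewritten clinical reports through a contrastive loss.
Train with a CoCa-like objective that combines contrastive alignment with autoregressive report generation (teacher forcing).
Pre-process clinical reports by extracting the diagnosis and rewriting it with GPT-4 to create concise summaries.
Adapt to downstream tasks either by training linear classifiers on the slide embeddings or by fine-tuning the slide encoder end to end.
Evaluate zero-shot, linear-probe, and fine-tuning performance on cancer detection, cancer subtyping, and biomarker prediction.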
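A CoCa-like objective of the kind listed above can be sketched as the sum of a symmetric contrastive (InfoNCE) term and a next-token cross-entropy term. The sketch below is illustrative and is not the authors' implementation: the function names, the `temperature` of 0.07, and the `caption_weight` of 1.0 are assumptions, and real training would run on batched tensors rather than Python lists.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def contrastive_loss(slide_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE; slide i and report i form the positive pair."""
    s = [normalize(v) for v in slide_embs]
    t = [normalize(v) for v in text_embs]
    n = len(s)
    logits = [[dot(s[i], t[j]) / temperature for j in range(n)]
              for i in range(n)]
    # Slide-to-text: each row should put its mass on the diagonal entry.
    slide_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-slide: the same check applied to each column.
    text_to_slide = -sum(
        log_softmax([logits[i][j] for i in range(n)])[j] for j in range(n)
    ) / n
    return 0.5 * (slide_to_text + text_to_slide)

def captioning_loss(token_logits, target_ids):
    """Mean next-token cross-entropy under teacher forcing."""
    losses = [-log_softmax(logits)[target]
              for logits, target in zip(token_logits, target_ids)]
    return sum(losses) / len(losses)

def coca_style_loss(slide_embs, text_embs, token_logits, target_ids,
                    caption_weight=1.0):
    """Contrastive term plus weighted captioning term (weight is assumed)."""
    return (contrastive_loss(slide_embs, text_embs)
            + caption_weight * captioning_loss(token_logits, target_ids))

# Toy batch of two slide/report pairs with 2-dimensional embeddings.
slides = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = contrastive_loss(slides, [[1.0, 0.0], [0.0, 1.0]])
misaligned_loss = contrastive_loss(slides, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly paired reports give a much lower contrastive loss than the swapped ones, which is the behavior the alignment term is meant to reward.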

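Experimental Results

Research Questions

RQ1: Can a slide-level foundation model trained with clinical reports achieve zero-shot cancer detection and subtyping without task-specific supervision?
RQ2: Does aggregating tile embeddings into a slide-level representation perform better on downstream tasks than tile-level approaches or aggregators trained from scratch?
RQ3: How does pre-training with clinical report supervision affect the data efficiency of biomarker (MSK-IMPACT) prediction?
RQ4: Are the generated clinical reports interpretable and aligned with the histopathological features in the slide?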

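Key Findings

On several tasks, PRISM achieves zero-shot cancer detection and subtyping performance that approaches or exceeds that of a supervised aggregator.
Linear probes on the pre-trained slide embeddings outperform fully supervised baselines on cancer subtyping and detection.
Fine-tuning the pre-trained slide encoder performs well for biomarker prediction in low-data regimes, outperforming the baseline when training data is limited (for example, with only 10-50% of the data).
Pre-training with clinical reports improves data efficiency and reduces run-to-run performance variance for several biomarkers.
Generated reports show qualitative agreement between the tiles that receive attention and the histopathological features mentioned in the pathologist's text.
PRISM supports interpretable slide-level embeddings and text-based report generation without task-specific supervision.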
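Zero-shot prediction from text prompts, as reported in the findings above, is commonly implemented by comparing a slide embedding with the embeddings of class-describing prompts. The following is a minimal sketch under that assumption; the prompt averaging, the `temperature` value, the class names, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_probs(slide_embedding, prompt_embeddings, temperature=0.07):
    """Score each class by its mean prompt similarity, then softmax.

    prompt_embeddings: dict mapping class name -> list of prompt vectors.
    Returns a dict mapping class name -> probability.
    """
    sims = {
        name: sum(cosine(slide_embedding, p) for p in prompts) / len(prompts)
        for name, prompts in prompt_embeddings.items()
    }
    # Temperature-scaled softmax over the per-class similarities.
    top = max(sims.values())
    exps = {name: math.exp((s - top) / temperature)
            for name, s in sims.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

# Toy embeddings standing in for encoded slides and text prompts.
prompts = {
    "cancer": [[1.0, 0.0], [0.8, 0.2]],
    "benign": [[0.0, 1.0]],
}
probs = zero_shot_probs([0.9, 0.1], prompts)
predicted = max(probs, key=probs.get)
```

No classifier is trained in this sketch; the only inputs are the slide embedding and the prompt embeddings, which is what makes the setting zero-shot.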

This review was generated by AI and reviewed by human editors.