QUICK REVIEW

[Paper Review] AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning

Ahmed M. Alaa, Mihaela van der Schaar | arXiv (Cornell University) | Feb 20, 2018

Machine Learning and Data Classification | References: 15 | Cited by: 53

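One-Sentence Summary

AutoPrognosis automates the design of predictive machine-learning pipelines for clinical data, using batched Bayesian optimization with a structured kernel decomposition to deliver ensemble-based, interpretable predictions across diverse cardiovascular cohorts.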

ABSTRACT

Clinical prognostic models derived from large-scale healthcare data can inform critical diagnostic and therapeutic decisions. To enable off-the-shelf usage of machine learning (ML) in prognostic research, we developed AUTOPROGNOSIS: a system for automating the design of predictive modeling pipelines tailored for clinical prognosis. AUTOPROGNOSIS optimizes ensembles of pipeline configurations efficiently using a novel batched Bayesian optimization (BO) algorithm that learns a low-dimensional decomposition of the pipelines' high-dimensional hyperparameter space in concurrence with the BO procedure. This is achieved by modeling the pipelines' performances as a black-box function with a Gaussian process prior, and modeling the similarities between the pipelines' baseline algorithms via a sparse additive kernel with a Dirichlet prior. Meta-learning is used to warm-start BO with external data from similar patient cohorts by calibrating the priors using an algorithm that mimics the empirical Bayes method. The system automatically explains its predictions by presenting the clinicians with logical association rules that link patients' features to predicted risk strata. We demonstrate the utility of AUTOPROGNOSIS using 10 major patient cohorts representing various aspects of cardiovascular patient care.

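Motivation and Objectives

Automate the design of predictive machine-learning pipelines for clinical data, covering imputation, feature processing, prediction, and calibration.
Learn a low-dimensional, structured kernel decomposition so that Gaussian-process Bayesian optimization remains efficient in the high-dimensional pipeline space.
Incorporate meta-learning to warm-start Bayesian optimization from external cohorts, and give clinicians interpretable, rule-based explanations.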
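
The four stages named above define a combinatorial design space: one algorithm is picked per stage, so the number of candidate pipelines is the product of the per-stage menu sizes. The sketch below only illustrates that counting. The stage menus and algorithm names are hypothetical placeholders, not the algorithm list used by AutoPrognosis.

```python
from itertools import product

# Hypothetical stage menus (placeholders, NOT the paper's actual algorithm list).
STAGES = {
    "imputation": ["mean", "median", "none"],
    "feature_processing": ["pca", "none"],
    "prediction": ["logistic_regression", "random_forest", "gradient_boosting"],
    "calibration": ["sigmoid", "isotonic", "none"],
}


def enumerate_pipelines(stages):
    """Return every pipeline as a dict mapping stage name to the chosen algorithm."""
    names = list(stages)
    menus = (stages[name] for name in names)
    return [dict(zip(names, choice)) for choice in product(*menus)]


pipelines = enumerate_pipelines(STAGES)
print(len(pipelines))  # 3 * 2 * 3 * 3 = 54
print(pipelines[0])
```

Each pipeline also carries the hyperparameters of its chosen algorithms, which is why exhaustive search quickly becomes impractical and a model-based search such as Bayesian optimization is used instead.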

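Proposed Method

Pipeline configurations are modeled as a four-stage pipeline (imputation, feature processing, prediction, calibration), giving 4,800 possible pipelines in total.
Batched Bayesian optimization with a Gaussian process (GP) treats pipeline performance as a black-box function to be optimized.
A sparse additive kernel decomposition is learned to capture correlations between algorithms; a Dirichlet prior is placed on the subspace assignments, which are updated by Gibbs sampling.
Empirical-Bayes meta-learning uses external cohorts and their meta-features to calibrate the GP priors, thereby warm-starting BO.
Post-hoc Bayesian model averaging builds an ensemble from the pipelines already evaluated.
An interpreter module outputs Bayesian association rules that link features to risk strata.
Evaluation is carried out on 10 cardiovascular cohorts, with clinical scores and AutoML baselines as comparators.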
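
To make the additive-kernel idea concrete, here is a minimal sketch of a kernel that is a sum of lower-dimensional kernels, each acting on one group of hyperparameter dimensions. It assumes a fixed, hand-written grouping and an RBF kernel per group. In the method summarized above the grouping is learned (Dirichlet prior, Gibbs sampling), which this sketch does not attempt; the function names and the example vectors are hypothetical.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)


def additive_kernel(x, y, groups):
    """Sum of RBF kernels, each restricted to one group of dimensions."""
    total = 0.0
    for dims in groups:
        total += rbf([x[d] for d in dims], [y[d] for d in dims])
    return total


# Hypothetical 6-dimensional hyperparameter vectors split into 3 subspaces.
groups = [[0, 1], [2, 3], [4, 5]]
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
y = [0.1, 0.2, 0.9, 0.4, 0.5, 0.6]

print(additive_kernel(x, x, groups))  # 3.0: each of the 3 groups contributes 1
print(additive_kernel(x, y, groups))  # below 3.0: only the second group differs
```

Because each component kernel sees only a few dimensions, the GP surrogate works with several low-dimensional functions instead of one high-dimensional one, which is what keeps Bayesian optimization tractable here.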

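Experimental Results

Research Questions

RQ1: Is there a structured, additive-kernel GP prior that enables scalable Bayesian optimization over the high-dimensional AutoML pipeline space of clinical prognostic modeling?
RQ2: Does AutoPrognosis outperform standard clinical risk scores and existing AutoML frameworks in predictive performance across different patient cohorts?
RQ3: Can empirical-Bayes meta-learning use prior data from similar patients to warm-start BO effectively on a new cohort?
RQ4: Do the ensemble and interpreter components provide robust predictive performance and clinically meaningful explanations?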

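Key Findings

AutoPrognosis outperforms clinical risk scores and AutoML baselines in AUC-ROC across the 10 cardiovascular cohorts.
The learned kernel decomposition groups algorithms with similar performance into the same subspace; for example, tree-based imputation and prediction methods are assigned to the same subspace.
Post-hoc Bayesian model averaging draws on all explored pipelines and improves robustness in small-sample settings.
Meta-learning with empirical-Bayes calibration improves BO convergence and makes the priors more informative for new cohorts.
The interpreter module generates Bayesian association rules linking patient features to risk strata, providing clinically actionable explanations.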
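
The ensemble finding can be illustrated with a generic form of Bayesian model averaging: every evaluated pipeline gets a weight, and the ensemble prediction is the weighted average of the individual predictions. The sketch below assumes weights proportional to the exponentiated validation log-likelihood, which is a common textbook approximation and not necessarily the weighting AutoPrognosis uses. All numbers are made up for illustration.

```python
import math


def bma_weights(log_scores):
    """Normalize exp(log_score) into weights, subtracting the max for stability."""
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def ensemble_predict(weights, probabilities):
    """Weighted average of per-pipeline predicted risks for one patient."""
    return sum(w * p for w, p in zip(weights, probabilities))


# Hypothetical validation log-likelihoods for three evaluated pipelines.
log_scores = [-120.0, -121.0, -125.0]
weights = bma_weights(log_scores)

# Hypothetical risk predictions from the same three pipelines for one patient.
risk = ensemble_predict(weights, [0.30, 0.40, 0.10])

print([round(w, 3) for w in weights])
print(round(risk, 3))
```

Averaging over all explored pipelines, rather than keeping only the single best one, hedges against picking a pipeline that looked best by chance, which matters most when the cohort is small.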

This review was generated by AI and checked by human editors.