QUICK REVIEW

[Paper Digest] Decision-Making with Auto-Encoding Variational Bayes

Romain Lopez, Pierre Boyeau|arXiv (Cornell University)|Feb 17, 2020

Topic: Advanced Multi-Objective Optimization Algorithms | References: 55 | Cited by: 23

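One-Sentence Summary

This paper proposes a three-step decision-making framework for variational autoencoders (VAEs) that decouples model fitting from posterior approximation: first, train the generative model with an objective such as the IWELBO or the $χ^2$-VAE objective; second, learn several different approximate posteriors using different inference objectives; third, combine them with multiple importance sampling (MIS) to make robust decisions. Applied to single-cell RNA sequencing, the approach markedly improves downstream performance and outperforms the current state of the art in differential expression detection.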

ABSTRACT

To make decisions based on a model fit with auto-encoding variational Bayes (AEVB), practitioners often let the variational distribution serve as a surrogate for the posterior distribution. This approach yields biased estimates of the expected risk, and therefore leads to poor decisions for two reasons. First, the model fit with AEVB may not equal the underlying data distribution. Second, the variational distribution may not equal the posterior distribution under the fitted model. We explore how fitting the variational distribution based on several objective functions other than the ELBO, while continuing to fit the generative model based on the ELBO, affects the quality of downstream decisions. For the probabilistic principal component analysis model, we investigate how importance sampling error, as well as the bias of the model parameter estimates, varies across several approximate posteriors when used as proposal distributions. Our theoretical results suggest that a posterior approximation distinct from the variational distribution should be used for making decisions. Motivated by these theoretical results, we propose learning several approximate proposals for the best model and combining them using multiple importance sampling for decision-making. In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing. In this challenging instance of multiple hypothesis testing, our proposed approach surpasses the current state of the art.
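
The bias described in the abstract can be made concrete with a short worked formula. This is a sketch in notation assumed here rather than taken from the paper: $a$ is a candidate decision, $L(a, z)$ a loss, $p_\theta(z \mid x)$ the posterior under the fitted model, $q(z \mid x)$ an approximate posterior, and $z_1, \dots, z_N$ are samples drawn from $q$.

```latex
% Expected risk of decision a under the fitted model's posterior
R(a) = \mathbb{E}_{p_\theta(z \mid x)}\left[ L(a, z) \right]

% Plug-in estimate that treats q as the posterior; biased whenever q \neq p_\theta(z \mid x)
R_q(a) = \mathbb{E}_{q(z \mid x)}\left[ L(a, z) \right]

% Self-normalized importance sampling estimate, using q only as a proposal
\hat{R}(a) = \frac{\sum_{i=1}^{N} w_i \, L(a, z_i)}{\sum_{i=1}^{N} w_i},
\qquad
w_i = \frac{p_\theta(x, z_i)}{q(z_i \mid x)},
\qquad
z_i \sim q(z \mid x)
```

The importance-sampling estimate corrects for the mismatch between $q$ and the posterior, but its error depends on how well $q$ covers the posterior, which is why the choice of proposal matters for decisions.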

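Motivation and Objectives

Address the poor decision quality that results when AEVB-based models use the variational posterior as a stand-in for the true posterior.
Investigate whether inference objectives other than the ELBO yield better proposal distributions for importance sampling in decision-making.
Develop a method that decouples model fitting from posterior approximation in order to reduce bias and variance in downstream decisions.
Evaluate the method in a real-world single-cell RNA sequencing case study that involves multiple hypothesis testing.
Show that combining several approximate posteriors via MIS gives more reliable and more accurate decisions than standard VAE inference.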

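Proposed Method

Train the generative model with a non-ELBO objective (such as the IWELBO, wake-wake (WW, forward KL), or $χ^2$-VAE) to improve the model's fit to the data-generating process.
With the model held fixed, train several different variational posteriors using different inference objectives (such as the ELBO, forward KL, or $χ^2$ divergence) to obtain a diverse set of proposal distributions.
Combine the approximate posteriors into a single proposal distribution via multiple importance sampling (MIS), and use it to estimate posterior expectations with reduced bias and variance.
Following Bayesian decision theory, use the resulting MIS posterior expectations to compute the decision that minimizes the expected loss under the true posterior.
Apply the framework to full-scale single-cell RNA-seq datasets for differential expression detection, with the posterior expected FDR as the decision criterion.
Evaluate performance with the PSIS diagnostic and PRAUC, comparing the stability and accuracy of gene rankings across inference configurations.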
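
The MIS step above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: the model is a one-dimensional Gaussian (prior $z \sim N(0, 1)$, likelihood $x \mid z \sim N(z, 1)$, observed $x = 2$), the two Gaussian proposals stand in for approximate posteriors learned with different objectives, and the samples are weighted against the equal-weight mixture of proposals (the balance heuristic). The names `mis_posterior_expectation`, `log_joint`, and `log_normal_pdf` are hypothetical.

```python
import numpy as np

def log_normal_pdf(z, mean, std):
    """Log density of N(mean, std^2) evaluated at z."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(std) - 0.5 * ((z - mean) / std) ** 2

def log_joint(z, x_obs=2.0):
    """Toy model (an assumption): z ~ N(0, 1), x | z ~ N(z, 1)."""
    return log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x_obs, z, 1.0)

def mis_posterior_expectation(f, log_joint_fn, proposals, n_per_proposal, rng):
    """Self-normalized MIS estimate of E[f(z) | x].

    proposals is a list of (mean, std) pairs for Gaussian proposals.
    Each proposal contributes the same number of samples, and every
    sample is weighted against the equal-weight mixture of proposals.
    """
    samples = np.concatenate(
        [rng.normal(m, s, size=n_per_proposal) for m, s in proposals]
    )
    # Log density of every sample under every proposal, shape (K, N).
    log_q = np.stack([log_normal_pdf(samples, m, s) for m, s in proposals])
    # Log density of the equal-weight mixture of the K proposals.
    log_mix = np.logaddexp.reduce(log_q, axis=0) - np.log(len(proposals))
    # Unnormalized log weights, stabilized before exponentiation.
    log_w = log_joint_fn(samples) - log_mix
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return float(np.sum(w * f(samples)))

rng = np.random.default_rng(0)
# One deliberately narrow and one deliberately wide proposal.
proposals = [(1.0, 0.4), (1.0, 1.5)]
post_mean = mis_posterior_expectation(lambda z: z, log_joint, proposals, 20_000, rng)
prob_large = mis_posterior_expectation(lambda z: (z > 1.5).astype(float), log_joint, proposals, 20_000, rng)
print(post_mean, prob_large)
```

For this toy model the exact posterior is $N(1, 1/2)$, so the estimated posterior mean can be checked against 1. Weighting against the mixture keeps the weights bounded as long as at least one proposal has tails heavier than the posterior, which is the practical reason for combining proposals rather than picking one.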

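Experimental Results

Research Questions

RQ1: Can variational posteriors trained with alternative inference objectives, rather than the standard ELBO, improve decision quality in VAEs?
RQ2: Does combining several approximate posteriors via multiple importance sampling yield more accurate and more robust decisions than using a single posterior?
RQ3: How do different inference objectives (such as forward KL and $χ^2$ divergence) affect the quality of the posterior approximation used for decision-making?
RQ4: To what extent does model misspecification caused by ELBO optimization degrade downstream decision performance in real applications such as single-cell genomics?
RQ5: Can the proposed framework outperform state-of-the-art methods on multiple hypothesis testing tasks such as differential expression detection in scRNA-seq?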

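Key Findings

The posterior of a classical VAE trained with the ELBO performs poorly as a proposal distribution for importance sampling: it underestimates the variance, which leads to high PSIS diagnostic values and unreliable FDR estimates.
Models trained with the IWAE or $χ^2$-VAE objective show better PSIS values and more reliable posterior expected FDR estimates, indicating higher-quality proposals.
Combining several posterior approximations via MIS markedly improves FDR control and AUC (PRAUC = 0.94), outperforming the standard VAE and IWAE baselines in differential expression detection.
The framework performs well on real-world single-cell RNA-seq datasets, is robust to numerical instability, and improves the accuracy of gene rankings.
The choice of posterior inference objective affects decision quality far more than the choice of model objective, which underscores the value of separating model optimization from inference optimization.
Even when the model fits well (high IWELBO), continuing to use the standard ELBO posterior still produces misleading gene rankings, which underscores the need for posterior approximations designed specifically for decision-making.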
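
The first finding, that a variance-underestimating proposal makes importance sampling unreliable, can be illustrated with a small experiment. This is a hedged sketch, not the paper's evaluation: it uses the normalized effective sample size as a simple stand-in for the PSIS diagnostic, a standard normal as a hypothetical target posterior, and two Gaussian proposals whose widths are chosen here for illustration. The names `normalized_ess` and `ess_for_proposal` are hypothetical.

```python
import numpy as np

def log_normal_pdf(z, mean, std):
    """Log density of N(mean, std^2) evaluated at z."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(std) - 0.5 * ((z - mean) / std) ** 2

def normalized_ess(log_w):
    """Effective sample size divided by the number of samples, in (0, 1]."""
    w = np.exp(log_w - np.max(log_w))
    return float(w.sum() ** 2 / (w ** 2).sum() / w.size)

def ess_for_proposal(std_q, n, rng):
    """Normalized ESS when a N(0, 1) target is sampled through N(0, std_q^2)."""
    z = rng.normal(0.0, std_q, size=n)
    log_w = log_normal_pdf(z, 0.0, 1.0) - log_normal_pdf(z, 0.0, std_q)
    return normalized_ess(log_w)

rng = np.random.default_rng(0)
# Narrow proposal: underestimates the target variance (weights have infinite variance).
ess_narrow = ess_for_proposal(0.5, 10_000, rng)
# Wide proposal: overestimates the target variance (weights stay bounded).
ess_wide = ess_for_proposal(1.5, 10_000, rng)
print(f"narrow proposal ESS fraction: {ess_narrow:.3f}")
print(f"wide proposal ESS fraction:   {ess_wide:.3f}")
```

With the narrow proposal a handful of samples in the tails carry most of the weight, so the effective sample size collapses. The wide proposal wastes some samples but keeps every weight bounded, which matches the finding that overdispersed proposals are safer for decision-making.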


This digest was generated by AI and reviewed by human editors.