QUICK REVIEW

[Paper Review] EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE

Chao Ma, Sebastian Tschiatschek|arXiv (Cornell University)|Sep 28, 2018

Machine Learning and Data Classification | Cited by 49

One-Sentence Summary

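EDDI is a scalable framework with two parts. A partial variational autoencoder (Partial VAE) handles partially observed data. An information-theoretic acquisition function then sequentially queries the most valuable missing variables under cost constraints.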

ABSTRACT

Many real-life decision-making situations allow further relevant information to be acquired at a specific cost, for example, in assessing the health status of a patient we may decide to take additional measurements such as diagnostic tests or imaging scans before making a final assessment. Acquiring more relevant information enables better decision making, but may be costly. How can we trade off the desire to make good decisions by acquiring further information with the cost of performing that acquisition? To this end, we propose a principled framework, named EDDI (Efficient Dynamic Discovery of high-value Information), based on the theory of Bayesian experimental design. In EDDI, we propose a novel partial variational autoencoder (Partial VAE) to predict missing data entries probabilistically given any subset of the observed ones, and combine it with an acquisition function that maximizes expected information gain on a set of target variables. We show cost reduction at the same decision quality and improved decision quality at the same cost in multiple machine learning benchmarks and two real-world health-care applications.
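
The objective in the abstract is maximizing expected information gain on the target variables x_φ. It can be written as a conditional mutual information. The block below is a short derivation sketch, not a transcription of the paper's equations. It assumes that x_i and x_φ are conditionally independent given the latent z and x_O, which holds when the decoder factorizes over variables. Under that assumption the reward can be expressed through posteriors over z alone. Replacing p(z | ·) with the Partial VAE encoder q(z | ·) then gives a z-space reward of the kind the review calls Equation 9.

```latex
\begin{aligned}
R(i, x_O) &= \mathbb{E}_{x_i \sim p(x_i \mid x_O)}\,
  D_{\mathrm{KL}}\!\left[\,p(x_\phi \mid x_i, x_O)\,\big\|\,p(x_\phi \mid x_O)\right]
  = I(x_\phi; x_i \mid x_O), \\[4pt]
I(z, x_\phi; x_i \mid x_O) &= I(x_\phi; x_i \mid x_O) + I(z; x_i \mid x_\phi, x_O)
  = I(z; x_i \mid x_O) + \underbrace{I(x_\phi; x_i \mid z, x_O)}_{=\,0}, \\[4pt]
\Rightarrow\; R(i, x_O) &= \mathbb{E}_{x_i \sim p(x_i \mid x_O)}\,
  D_{\mathrm{KL}}\!\left[\,p(z \mid x_i, x_O)\,\big\|\,p(z \mid x_O)\right] \\
&\quad - \mathbb{E}_{x_\phi, x_i \sim p(x_\phi, x_i \mid x_O)}\,
  D_{\mathrm{KL}}\!\left[\,p(z \mid x_\phi, x_i, x_O)\,\big\|\,p(z \mid x_\phi, x_O)\right].
\end{aligned}
```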

Research Motivation and Goals

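Enable automated, personalized, dynamic information acquisition in cost-sensitive settings.
Develop a scalable probabilistic model for partially observed data that supports fast inference.
Design an acquisition function that selects the most informative missing variable to query next.
Show that EDDI reduces information-acquisition cost across domains without sacrificing decision quality.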

Proposed Method

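Introduce the Partial VAE, which performs amortized inference given any observed subset of variables.
Represent x_O with permutation-invariant set encoders (PN/PNP), so that q(z | x_O) approximates the posterior p(z | x_O).
Derive a tractable information reward for variable selection, based on mutual information in the latent z-space (Equation 9).
Approximate the KL terms efficiently using q(z | x_O), q(z | x_i, x_O), and shared samples.
Formulate active variable selection as maximizing the expected information gain about the target variables x_φ (Algorithm 1).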
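
The steps above can be illustrated with a small, self-contained sketch. It is not the authors' implementation. As an assumption, a linear-Gaussian latent model stands in for the Partial VAE, so its exact Gaussian posterior plays the role of the encoder q(z | x_O). All function names (`posterior`, `kl_gauss`, `reward`, `select_next`) are hypothetical. The posterior is built from a sum of per-feature contributions, so, like a PointNet-style set encoder, it does not depend on the order of the observed variables. The reward is a Monte Carlo estimate of the difference of the two z-space KL terms, using one joint sample of (x_i, x_φ) per draw. The greedy step then picks the candidate with the largest estimate.

```python
import numpy as np

# ASSUMPTION: a linear-Gaussian latent model stands in for the Partial VAE.
#   z ~ N(0, I_d),   x_j = W[j] @ z + eps_j,   eps_j ~ N(0, sigma2)
# Its exact posterior p(z | x_S) is Gaussian for any observed subset S,
# so it plays the role of the amortized encoder q(z | x_S) here.


def posterior(W, sigma2, obs):
    """Return mean and covariance of p(z | x_S) for obs = {feature index: value}.

    Each observed (j, x_j) adds a term to the precision and the natural mean,
    so, like a PointNet-style set encoder, the result ignores feature order.
    """
    d = W.shape[1]
    prec = np.eye(d)
    eta = np.zeros(d)
    for j, xj in obs.items():
        prec += np.outer(W[j], W[j]) / sigma2
        eta += W[j] * xj / sigma2
    cov = np.linalg.inv(prec)
    return cov @ eta, cov


def kl_gauss(m1, S1, m0, S0):
    """KL( N(m1, S1) || N(m0, S0) ) for full-covariance Gaussians."""
    S0_inv = np.linalg.inv(S0)
    diff = m0 - m1
    logdet_ratio = np.linalg.slogdet(S0)[1] - np.linalg.slogdet(S1)[1]
    return 0.5 * (np.trace(S0_inv @ S1) + diff @ S0_inv @ diff
                  - m1.shape[0] + logdet_ratio)


def reward(W, sigma2, obs, i, targets, rng, n=2000):
    """Monte Carlo estimate of the z-space reward for acquiring feature i:
    E KL[q(z|x_i,x_O) || q(z|x_O)] - E KL[q(z|x_phi,x_i,x_O) || q(z|x_phi,x_O)].
    """
    m_o, S_o = posterior(W, sigma2, obs)
    total = 0.0
    for _ in range(n):
        # One joint draw of (x_i, x_phi) from p(. | x_O): sample z, then decode.
        z = rng.multivariate_normal(m_o, S_o)
        x = {j: W[j] @ z + rng.normal(0.0, np.sqrt(sigma2)) for j in [i, *targets]}
        with_i = {**obs, i: x[i]}
        with_phi = {**obs, **{t: x[t] for t in targets}}
        total += kl_gauss(*posterior(W, sigma2, with_i), m_o, S_o)
        total -= kl_gauss(*posterior(W, sigma2, {**with_phi, i: x[i]}),
                          *posterior(W, sigma2, with_phi))
    return total / n


def select_next(W, sigma2, obs, candidates, targets, rng):
    """One greedy acquisition step: score every candidate, return the best one."""
    scores = {i: reward(W, sigma2, obs, i, targets, rng) for i in candidates}
    return max(scores, key=scores.get), scores
```

As an example, take `W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])`, `sigma2 = 0.5`, no observations, and target feature 2. Feature 0 shares z_0 with the target, and its estimated reward comes close to the exact conditional mutual information -0.5·log(1 - 1/(1 + sigma2)²) ≈ 0.29 nats. Feature 1 scores about zero. In a real Partial VAE, `posterior` would be replaced by the learned encoder. The linear-Gaussian model is used here only because its posterior has a closed form, which makes the estimate easy to check.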

Experimental Results

Research Questions

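RQ1: How can probabilistic inference be performed when only a subset of variables is observed for each instance?
RQ2: Can we design a scalable, per-variable acquisition strategy that maximizes information gain given acquisition costs?
RQ3: Does the Partial VAE deliver effective missing-data imputation and uncertainty estimation across tasks?
RQ4: Is EDDI computationally efficient enough for real-world health-care applications and large-scale datasets?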

Key Findings

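The Partial VAE provides scalable amortized inference for partially observed data and supports effective imputation.
In the MNIST experiments, PN/PNP encodings outperform zero-imputation (ZI) based methods in inpainting and uncertainty modeling.
On six UCI datasets, EDDI outperforms the RAND (random) and SING baselines in information efficiency and in RMSE AUIC (area under the information curve) ranking.
PNP-based EDDI achieves a large speedup over non-amortized methods. On the Boston Housing dataset it is about 1000× more efficient than DRAL.
On the MIMIC-III risk-assessment and NHANES public-health tasks, EDDI with PNP consistently achieves better AUIC rankings than the baselines.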
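
The AUIC comparisons above summarize a whole acquisition trajectory with one number. The sketch below makes three assumptions. The information curve is the test error (e.g. RMSE) recorded after acquiring 0, 1, ..., K variables. Steps are unit-spaced. The area is computed with the trapezoidal rule. The paper's exact normalization is not stated in this review. The helper names `auic` and `rank_methods` are hypothetical, and the example curves are made-up numbers, not results from the paper.

```python
import numpy as np


def auic(errors):
    """Area under an information curve (hypothetical helper).

    errors[k] is the test error (e.g. RMSE) after acquiring k variables.
    ASSUMPTION: unit spacing between acquisition steps and the trapezoidal rule.
    For an error curve, a smaller area means the error dropped faster.
    """
    e = np.asarray(errors, dtype=float)
    return float(np.sum((e[1:] + e[:-1]) / 2.0))


def rank_methods(curves):
    """Rank methods on one test instance by AUIC (1 = smallest area = best)."""
    names = list(curves)
    areas = np.array([auic(curves[name]) for name in names])
    order = np.argsort(areas, kind="stable")
    return {names[k]: rank + 1 for rank, k in enumerate(order)}


# Made-up curves for illustration only (not results from the paper).
print(auic([1.0, 0.5, 0.25]))  # → 1.125
print(rank_methods({"method_a": [1.0, 0.5, 0.25], "method_b": [1.0, 0.8, 0.6]}))
# → {'method_a': 1, 'method_b': 2}
```

Averaging these per-instance ranks over a test set is one way to obtain an AUIC ranking like the ones reported above. That aggregation step is also an assumption here.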

This review was generated by AI and reviewed by human editors.