QUICK REVIEW

[Paper Review] Bayesian Dirichlet Bayesian Network Scores and the Maximum Entropy Principle

Marco Scutari|arXiv (Cornell University)|Aug 2, 2017

Bayesian Modeling and Causal Inference | Cited by 1

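One-Sentence Summary

This paper critiques the Bayesian Dirichlet equivalent uniform (BDeu) score used in Bayesian network structure learning, showing that it violates the maximum relative entropy principle and that its sensitivity to its hyperparameter makes the resulting Bayes factors unreliable. It recommends the Bayesian Dirichlet sparse (BDs) score instead, whose more reasonable prior assumptions agree better with the entropy principle and avoid BDeu's shortcomings, particularly on sparse data.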

ABSTRACT

A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet equivalent uniform (BDeu) score from Heckerman et al. (1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network, which makes structure learning computationally efficient; it does not require the elicitation of prior knowledge from experts; and it satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information-theoretic perspective. To this end, we will work in the context of the foundational work of Giffin and Caticha (2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We will use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle; and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. Using a large simulation study, we found in our previous work (Scutari, 2016) that the Bayesian Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend using it for sparse data instead of BDeu. Finally, we will show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior.

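Motivation and Objectives

Examine the theoretical foundations of Bayesian Dirichlet (BD) scores, and of BDeu in particular, from an information-theoretic perspective.
Explain why BDeu performs poorly on sparse data by linking it to the maximum relative entropy principle.
Identify the core problem in BDeu's prior distribution that makes structure learning unreliable and Bayes factors sensitive.
Show that the Bayesian Dirichlet sparse (BDs) score avoids these problems and performs better in sparse-data settings.
Trace the problems observed with BDeu to a single root cause: flawed distributional assumptions in the prior.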

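Proposed Method

Analyse BD scores within the framework of Giffin and Caticha (2007), which links Bayesian inference to the maximum relative entropy principle.
Derive the connection between BD scores and entropy estimates, showing that BDeu's uniform prior leads to inconsistent entropy estimates.
Analyse how sensitive BDeu's Bayes factors are to its single hyperparameter, exposing its instability in model selection.
Compare BDeu with BDs using theoretical analysis and the simulation evidence from earlier work (Scutari, 2016).
Recommend BDs as the more principled alternative by showing that it satisfies the entropy constraints and avoids BDeu's shortcomings.
Establish that the core problem lies in the distributional assumptions of the prior, not in the score computation itself.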
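
As a rough illustration of how the two scores in this comparison differ, the sketch below computes the log BD score of a single node from its table of counts. The function name `bd_local_log_score`, its arguments, and the toy counts are hypothetical. The hyperparameter choices (alpha / (r * q) for BDeu; alpha / (r * q_tilde) for BDs, with q_tilde the number of parent configurations actually observed) follow the usual textbook definitions of these scores rather than anything spelled out in this review, so treat them as assumptions.

```python
from math import lgamma


def bd_local_log_score(counts, alpha=1.0, sparse=False):
    """Log BD marginal likelihood of one node (illustrative sketch).

    counts[j][k] is the number of observations with parent configuration j
    and node value k. With sparse=False the hyperparameters are the
    BDeu-style alpha / (r * q); with sparse=True they are the BDs-style
    alpha / (r * q_tilde), where q_tilde counts only the parent
    configurations that appear in the data (assumed definition).
    """
    r = len(counts[0])                                   # number of node values
    q = len(counts)                                      # number of parent configurations
    observed = [row for row in counts if sum(row) > 0]   # configurations seen in the data
    if not observed:
        return 0.0                                       # no data: marginal likelihood is 1
    q_eff = len(observed) if sparse else q
    a_ijk = alpha / (r * q_eff)                          # per-cell hyperparameter
    a_ij = r * a_ijk                                     # per-configuration total
    score = 0.0
    # Unobserved configurations contribute a factor of 1, so they are skipped.
    for row in observed:
        score += lgamma(a_ij) - lgamma(a_ij + sum(row))
        for n_ijk in row:
            score += lgamma(a_ijk + n_ijk) - lgamma(a_ijk)
    return score


# Hypothetical toy counts: 4 parent configurations, only 2 of them observed.
sparse_counts = [[2, 1], [0, 0], [0, 0], [1, 3]]
print("BDeu:", bd_local_log_score(sparse_counts, alpha=1.0, sparse=False))
print("BDs: ", bd_local_log_score(sparse_counts, alpha=1.0, sparse=True))
```

When every parent configuration is observed, the two variants coincide in this sketch; they diverge only when some configurations have no data, which is the sparse-data setting the paper is concerned with.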

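Experimental Results

Research Questions

RQ1: Does the BDeu score satisfy the maximum relative entropy principle on sparse data?
RQ2: How does the sensitivity of BDeu's Bayes factors to its hyperparameter affect the reliability of structure learning?
RQ3: From an information-theoretic standpoint, why does BDeu fail in sparse-data settings?
RQ4: Can BDs be shown to outperform BDeu both theoretically and empirically?
RQ5: Do BDeu's failures stem from a single root cause tied to the distributional assumptions of its prior?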

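Key Findings

Because it places a uniform prior on each local distribution, the BDeu score violates the maximum relative entropy principle on sparse data.
The Bayes factors produced by BDeu are highly sensitive to its hyperparameter, which undermines the reliability of model selection.
The BDs score does not have these problems: it agrees better with the entropy principle and uses prior assumptions better suited to sparse data.
BDeu's problems originate in the distributional assumptions of its prior, which lead to inconsistent entropy estimates.
Simulation evidence from Scutari (2016) indicates that BDs is more accurate than BDeu for structure learning when data are sparse.
All of BDeu's shortcomings can be traced to one cause: its prior is incompatible with the maximum relative entropy principle.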
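
The hyperparameter sensitivity noted in the findings above can be seen on a toy example. The sketch below, which is not taken from the paper, computes the BDeu log Bayes factor for adding one parent to a binary node at several values of alpha. The function names and the count tables are invented for illustration, and the BDeu hyperparameters alpha / (r * q) are assumed from the standard definition of the score.

```python
from math import lgamma


def bdeu_log_score(counts, alpha):
    """Log BDeu score of one node; counts[j][k] = n_ijk (illustrative sketch)."""
    r, q = len(counts[0]), len(counts)
    a_ijk = alpha / (r * q)      # assumed BDeu hyperparameter for every cell
    a_ij = alpha / q             # sum of a_ijk over the node's values
    score = 0.0
    for row in counts:
        score += lgamma(a_ij) - lgamma(a_ij + sum(row))
        score += sum(lgamma(a_ijk + n) - lgamma(a_ijk) for n in row)
    return score


# Hypothetical sparse data: the candidate parent has 4 configurations,
# but only 2 of them are ever observed (2 samples each).
WITH_PARENT = [[2, 0], [0, 2], [0, 0], [0, 0]]
WITHOUT_PARENT = [[2, 2]]        # the same 4 samples with the parent ignored


def log_bayes_factor(alpha):
    """Log Bayes factor in favour of including the parent."""
    return bdeu_log_score(WITH_PARENT, alpha) - bdeu_log_score(WITHOUT_PARENT, alpha)


for alpha in (0.1, 1.0, 10.0, 100.0):
    print(f"alpha = {alpha:>5}: log BF = {log_bayes_factor(alpha):.3f}")
```

On these invented counts the log Bayes factor in favour of the parent falls from about 4.1 at alpha = 0.1 to about 0.1 at alpha = 100, so the strength of evidence for the arc depends heavily on a value the analyst has to pick by hand.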

This review was generated by AI and checked by human editors.