QUICK REVIEW

[Paper Review] Bambi: A simple interface for fitting Bayesian linear models in Python

Tomás Capretto, Camen Piho | arXiv (Cornell University) | Dec 19, 2020

Statistical Methods and Bayesian Inference | Cited by 38

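One-Sentence Summary

Bambi provides a formula-driven Python interface for fitting Bayesian generalized linear mixed models (GLMMs) on top of PyMC. It makes models easy to specify, fit, diagnose, and use for prediction, covering linear regression, logistic regression, and hierarchical models.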

ABSTRACT

The popularity of Bayesian statistical methods has increased dramatically in recent years across many research areas and industrial applications. This is the result of a variety of methodological advances with faster and cheaper hardware as well as the development of new software tools. Here we introduce an open source Python package named Bambi (BAyesian Model Building Interface) that is built on top of the PyMC probabilistic programming framework and the ArviZ package for exploratory analysis of Bayesian models. Bambi makes it easy to specify complex generalized linear hierarchical models using a formula notation similar to those found in R. We demonstrate Bambi's versatility and ease of use with a few examples spanning a range of common statistical models including multiple regression, logistic regression, and mixed-effects modeling with crossed group specific effects. Additionally we discuss how automatic priors are constructed. Finally, we conclude with a discussion of our plans for the future development of Bambi.

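Research Motivation and Goals

Promote wider adoption of Bayesian methods by lowering the barrier to fitting GLMMs.
Provide an intuitive, formula-based interface, similar to R's lme4, for specifying complex models.
Integrate with PyMC and ArviZ for efficient sampling, diagnostics, and visualization of Bayesian models.
Demonstrate versatility through examples: multiple regression, logistic regression, and hierarchical models with random effects.
Discuss default priors, the inference workflow, and the future development of the Bambi package.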

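Proposed Method

Introduces Bambi, a Python package for Bayesian GLMMs built on top of PyMC and ArviZ.
Uses an R-like formula interface, exposed through the Model class, to specify fixed and random effects.
Uses adaptive dynamic Hamiltonian Monte Carlo (PyMC's NUTS sampler) for posterior sampling, running multiple chains to support convergence diagnostics.
Supplies default priors when none are specified, and lets users inspect them with plot_priors().
Supports a range of distribution families (Gaussian, Bernoulli, and others) and link functions through the Family class.
Enables posterior predictive checks and predictions through the .predict() method and ArviZ visualizations.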
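
The workflow listed above can be sketched roughly as follows. This is a minimal illustration, not code from the paper: it assumes a recent Bambi release in which `bmb.Model(formula, data, ...)` takes the formula as its first argument, and the column names, toy data, and helper names (`simulate_columns`, `fit_and_summarize`) are hypothetical.

```python
import random

# Formula strings in the lme4-style syntax described above.
# All column names (y, x1, x2, success, subject, item) are hypothetical.
EXAMPLE_FORMULAS = {
    "multiple_regression": "y ~ x1 + x2",
    "logistic_regression": "success ~ x1",  # pair with family="bernoulli"
    "crossed_effects": "y ~ x1 + (1|subject) + (1|item)",
}


def simulate_columns(n=200, seed=0):
    """Simulate toy data for y = 1 + 2*x1 - x2 + noise as plain lists."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + 2.0 * a - b + rng.gauss(0, 0.5) for a, b in zip(x1, x2)]
    return {"x1": x1, "x2": x2, "y": y}


def fit_and_summarize(columns, formula, family="gaussian", priors=None):
    """Fit a Bambi model and return (model, idata, summary table)."""
    # Third-party imports are local so the helpers above work without them.
    import arviz as az
    import bambi as bmb
    import pandas as pd

    data = pd.DataFrame(columns)
    # priors=None lets Bambi choose its default priors.
    model = bmb.Model(formula, data, family=family, priors=priors)
    # Sampling is delegated to PyMC; four chains allow convergence checks.
    idata = model.fit(draws=1000, chains=4)
    return model, idata, az.summary(idata)
```

A call such as `fit_and_summarize(simulate_columns(), EXAMPLE_FORMULAS["multiple_regression"])` would then return the fitted model, an ArviZ `InferenceData` object, and a summary table. After fitting, `model.plot_priors()` shows the priors that were actually used, and `model.predict(idata, data=new_df)` adds predictions for a new DataFrame. A custom prior could be passed as, for example, `priors={"x1": bmb.Prior("Normal", mu=0, sigma=1)}`. Exact argument names may differ between Bambi versions.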

Figure 1: Density estimates based on 5000 samples from the prior distribution for all the regression coefficients. If the user does not explicitly state the priors to be used for the model parameters, Bambi will choose default prior distributions sensible in a wide range of use cases.

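Experimental Results

Research Questions

RQ1: How can Bayesian generalized linear mixed models be specified conveniently in Python through a formula-style interface?
RQ2: What are Bambi's default priors, and how can users customize them?
RQ3: How does Bambi simplify model fitting, diagnostics, and posterior predictive checks across distribution families (for example, Gaussian and Bernoulli)?
RQ4: Can users fit complex hierarchical models with crossed random effects and easily inspect the model output?
RQ5: What is the workflow for in-sample and out-of-sample prediction that carries posterior uncertainty?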

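Key Findings

Bambi enables fast specification and fitting of GLMMs through sensible defaults and a familiar formula syntax.
The package relies on PyMC's adaptive dynamic Hamiltonian Monte Carlo to sample from the joint posterior.
Users can inspect priors, diagnostics, and posterior summaries through ArviZ, including trace plots and summary statistics.
The .predict() method generates both posterior mean predictions and posterior predictive samples for new data.
Hierarchical models with multiple random effects (intercepts and slopes) can be specified and fit with custom priors.
The examples cover regression, logistic regression, and crossed random effects, illustrating ease of use and modeling flexibility.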
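
The difference between posterior mean predictions and posterior predictive samples mentioned above can be illustrated with a small hand-rolled sketch. It does not use Bambi and is not how the package is implemented: the "posterior draws" below are made-up stand-ins for sampler output from a Gaussian model with one predictor, and `predict_from_draws` is a hypothetical helper.

```python
import random
import statistics


def predict_from_draws(draws, x_new, rng):
    """Return (mean predictions, posterior predictive samples) at x_new.

    draws is a list of (intercept, slope, sigma) tuples, one per posterior draw.
    """
    # Posterior of the mean: uncertainty in the coefficients only.
    mean_pred = [a + b * x_new for a, b, _ in draws]
    # Posterior predictive: additionally includes observation noise sigma.
    pps = [rng.gauss(a + b * x_new, s) for a, b, s in draws]
    return mean_pred, pps


rng = random.Random(42)
# Stand-in posterior draws (assumed values, not real sampler output).
draws = [
    (rng.gauss(1.0, 0.1), rng.gauss(2.0, 0.1), abs(rng.gauss(1.0, 0.05)))
    for _ in range(2000)
]
mean_pred, pps = predict_from_draws(draws, x_new=1.5, rng=rng)

print("sd of mean predictions:      ", statistics.pstdev(mean_pred))
print("sd of predictive samples:    ", statistics.pstdev(pps))
```

The predictive samples are more spread out than the mean predictions because they add the residual noise on top of the coefficient uncertainty, which is why the two kinds of prediction answer different questions.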

Figure 2: HTML representation of an InferenceData object. We can see information is stored into four groups: posterior, log_likelihood, sample_stats, and observed_data. Other groups not shown here are also possible. The posterior group is unfolded showing information like the Dimensions (4 chains


This review was generated by AI and reviewed by a human editor.