QUICK REVIEW

[Paper Review] Bayesian Sparse Global-Local Shrinkage Regression for Grouped Variables

Zemei Xu, Daniel F. Schmidt | arXiv (Cornell University) | Sep 13, 2017

Statistical Methods and Inference | Cited by 3

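One-Sentence Summary

The paper proposes a Bayesian sparse global-local shrinkage regression model for grouped variables with overlapping and multilevel structures, using continuous shrinkage priors and extending the decoupled shrinkage and selection (DSS) framework to support sparse group selection. It also introduces an alternative degrees of freedom estimator for sparse models and reports better identification of active groups and better prediction accuracy than a Bayesian grouped slab-and-spike baseline.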

ABSTRACT

Most estimates for penalised linear regression can be viewed as posterior modes for an appropriate choice of prior distribution. Bayesian shrinkage methods, particularly the horseshoe estimator, have recently attracted a great deal of attention in the problem of estimating sparse, high-dimensional linear models. This paper extends these ideas, and presents a Bayesian grouped model with continuous global-local shrinkage priors to handle complex group hierarchies that include overlapping and multilevel group structures. As the posterior mean is never a sparse estimate of the linear model coefficients, we extend the recently proposed decoupled shrinkage and selection (DSS) technique to the problem of selecting groups of variables from posterior samples. To choose a final, sparse model, we also adapt generalised information criteria approaches to the DSS framework. To ensure that sparse groups, in which only a few predictors are active, can be effectively identified, we provide an alternative degrees of freedom estimator for sparse Bayesian linear models that takes into account the effects of shrinkage on the model coefficients. Simulations and real data analysis using our proposed method show promising performance in terms of correct identification of active and inactive groups, and prediction, in comparison with a Bayesian grouped slab-and-spike approach.

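Research Motivation and Objectives

Address the challenge of selecting groups of variables in high-dimensional linear models with complex group hierarchies, including overlapping and multilevel structures.
Develop a Bayesian framework that uses continuous shrinkage priors to obtain sparse estimates of grouped coefficients, avoiding discrete slab-and-spike priors.
Extend the decoupled shrinkage and selection (DSS) technique to group selection from posterior samples, so that the final model is sparse.
Propose an alternative degrees of freedom estimator that accounts for the effect of shrinkage in sparse Bayesian linear models, improving the accuracy of model selection.
Evaluate the method on simulated and real data in terms of identifying the truly active groups and achieving high prediction accuracy.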

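Proposed Method

Place continuous global-local shrinkage priors on the grouped coefficients, allowing flexible shrinkage across groups with complex hierarchical structure.
Adapt the decoupled shrinkage and selection (DSS) technique to posterior samples, separating shrinkage from selection so that sparse groups can be identified.
Propose an alternative degrees of freedom estimator that accounts for the effect of shrinkage on the model coefficients, giving a better assessment of model complexity.
Use generalised information criteria (GIC) within the DSS framework to choose the final sparse model from posterior samples.
Apply the method to simulated data and real-world datasets to evaluate group selection accuracy and predictive performance.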
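Use the posterior mean for shrinkage and a separate DSS step for selection, because the posterior mean is never a sparse estimate of the coefficients.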
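The DSS idea in the list above can be illustrated with a minimal sketch. It assumes non-overlapping groups, a group-lasso penalty weighted by the square root of the group size, and a plain proximal-gradient solver; none of these choices is confirmed by the paper, and the names `group_soft_threshold` and `dss_group_summary` are hypothetical. The sketch takes a posterior mean `beta_bar` as given and looks for a group-sparse vector whose fitted values stay close to those of the posterior mean.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Shrink the vector v toward zero by t in Euclidean norm (zero if ||v|| <= t)."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def dss_group_summary(X, beta_bar, groups, lam, n_iter=500):
    """Hypothetical DSS-style summary for non-overlapping groups.

    Approximately minimises, by proximal gradient descent,
        (1 / 2n) * ||X beta_bar - X gamma||^2 + lam * sum_g sqrt(p_g) * ||gamma_g||_2
    where beta_bar is a posterior mean and p_g is the size of group g.
    """
    n, p = X.shape
    target = X @ beta_bar                       # fitted values of the posterior mean
    step = n / (np.linalg.norm(X, 2) ** 2)      # 1 / Lipschitz constant of the gradient
    gamma = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ gamma - target) / n
        z = gamma - step * grad                 # gradient step on the smooth part
        for idx in groups:                      # group-wise proximal step
            thresh = step * lam * np.sqrt(len(idx))
            gamma[idx] = group_soft_threshold(z[idx], thresh)
    return gamma

# Toy usage: the second group has a posterior mean close to zero.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
beta_bar = np.array([2.0, -1.5, 1.0, 0.01, -0.02, 0.01])  # stand-in for a posterior mean
groups = [[0, 1, 2], [3, 4, 5]]
gamma = dss_group_summary(X, beta_bar, groups, lam=0.2)
selected = [g for g, idx in enumerate(groups) if np.any(gamma[idx] != 0)]
print(selected)
```

Sweeping `lam` over a grid yields a path of candidate group-sparse models; in the paper's framework an information criterion would then pick one of them, which this sketch does not attempt.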

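Experimental Results

Research Questions

RQ1: Can continuous global-local shrinkage priors effectively handle overlapping and multilevel group structures in high-dimensional regression?
RQ2: How does the DSS framework improve group selection accuracy when applied to posterior samples from a Bayesian grouped model?
RQ3: Does the proposed degrees of freedom estimator measure model complexity in sparse Bayesian linear models more accurately?
RQ4: How does the method compare with the Bayesian slab-and-spike approach in identifying truly active groups and in prediction?
RQ5: How does shrinkage affect the model coefficients, and how should it be accounted for in degrees of freedom estimation?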
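To make the link between shrinkage and degrees of freedom in RQ3 and RQ5 concrete, the sketch below uses the classical ridge regression result, where the effective degrees of freedom equal the trace of the hat matrix. This is an illustrative stand-in and not the estimator proposed in the paper; the function name `ridge_effective_df` is hypothetical.

```python
import numpy as np

def ridge_effective_df(X, lam):
    """Effective degrees of freedom of ridge regression.

    Equals trace(X (X'X + lam I)^{-1} X') = sum_i d_i^2 / (d_i^2 + lam),
    where d_i are the singular values of X.
    """
    d = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(d**2 / (d**2 + lam)))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
for lam in (0.0, 1.0, 10.0, 100.0):
    # Stronger shrinkage gives fewer effective degrees of freedom,
    # even though all 5 coefficients remain nonzero.
    print(lam, ridge_effective_df(X, lam))
```

The point of the example is that counting nonzero coefficients overstates complexity once coefficients are shrunk, which is the kind of effect a shrinkage-aware estimator has to capture.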

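Key Findings

The proposed method compares favorably with the Bayesian grouped slab-and-spike approach in correctly identifying active and inactive groups.
The method shows strong prediction accuracy on both simulated and real data, again compared with the slab-and-spike baseline.
The alternative degrees of freedom estimator captures the effect of shrinkage on model complexity, making model selection more reliable.
The DSS-based selection framework identifies sparse group structures from posterior samples, even when groups overlap or are nested in a hierarchy.
Simulation results indicate that the method maintains high true positive and true negative rates across a range of sparsity and correlation conditions.
The real data analysis shows consistent improvements in both model selection and prediction, particularly in high-dimensional settings with complex group structures.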
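The true positive and true negative rates mentioned in the findings can be computed at the group level as follows. This is a minimal sketch; the function name `group_selection_rates` and the toy group labels are hypothetical and do not come from the paper's experiments.

```python
def group_selection_rates(true_active, selected, all_groups):
    """Return (true positive rate, true negative rate) for a set of selected groups."""
    true_active, selected = set(true_active), set(selected)
    inactive = set(all_groups) - true_active
    tp = len(true_active & selected)        # active groups that were selected
    tn = len(inactive - selected)           # inactive groups that were left out
    tpr = tp / len(true_active) if true_active else float("nan")
    tnr = tn / len(inactive) if inactive else float("nan")
    return tpr, tnr

# Toy example: 5 groups, groups 0 and 1 are truly active, the method picked 0 and 2.
tpr, tnr = group_selection_rates(true_active=[0, 1], selected=[0, 2], all_groups=range(5))
print(tpr, tnr)
```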

This review was generated by AI and reviewed by a human editor.