QUICK REVIEW

[Paper Review] Statistical Methods for cis-Mendelian Randomization with Two-sample Summary-level Data

Apostolos Gkatzionis, Stephen Burgess|arXiv (Cornell University)|Jan 11, 2021

Genetic Associations and Epidemiology | Cited by 9

One-Sentence Summary

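This paper reviews and evaluates statistical methods for cis-Mendelian randomization with two-sample summary-level data, focusing on how to select and analyze correlated genetic variants from a single gene region. The results show that, under weak-instrument conditions, factor analysis and Bayesian variable selection outperform simple pruning and can support more reliable causal inference in drug-target validation studies.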

ABSTRACT

Mendelian randomization is the use of genetic variants to assess the existence of a causal relationship between a risk factor and an outcome of interest. Here, we focus on two-sample summary-data Mendelian randomization analyses with many correlated variants from a single gene region, and particularly on cis-Mendelian randomization studies which use protein expression as a risk factor. Such studies must rely on a small, curated set of variants from the studied region; using all variants in the region requires inverting an ill-conditioned genetic correlation matrix and results in numerically unstable causal effect estimates. We review methods for variable selection and estimation in cis-Mendelian randomization with summary-level data, ranging from stepwise pruning and conditional analysis to principal components analysis, factor analysis and Bayesian variable selection. In a simulation study, we show that the various methods have a comparable performance in analyses with large sample sizes and strong genetic instruments. However, when weak instrument bias is suspected, factor analysis and Bayesian variable selection produce more reliable inferences than simple pruning approaches, which are often used in practice. We conclude by examining two case studies, assessing the effects of LDL-cholesterol and serum testosterone on coronary heart disease risk using variants in the HMGCR and SHBG gene regions respectively.
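The abstract's point about an ill-conditioned genetic correlation matrix can be illustrated with a small numerical sketch. It assumes a toy AR(1) correlation structure (correlation r^|j-k| between variants j and k), which is an illustrative stand-in and not the LD pattern of any real gene region; the function name `ar1_corr` is hypothetical.

```python
import numpy as np

def ar1_corr(p, r):
    """Toy p x p correlation matrix with entries r**|j - k| (an assumed LD pattern)."""
    idx = np.arange(p)
    return r ** np.abs(idx[:, None] - idx[None, :])

# As neighbouring variants become more strongly correlated, the condition
# number of the matrix grows and inverting it becomes numerically fragile.
for r in (0.5, 0.9, 0.99, 0.999):
    cond = np.linalg.cond(ar1_corr(50, r))
    print(f"r = {r}: condition number = {cond:.3g}")
```

A large condition number means that small errors in the summary statistics or in the estimated correlations are amplified when the matrix is inverted, which is why the methods reviewed here either drop variants or reduce dimension first.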

Research Motivation and Objectives

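Address the challenges that cis-Mendelian randomization faces when using many correlated genetic variants from a single gene region.
Compare the performance of variable selection and estimation methods under different instrument strengths and correlation structures.
Assess how reliable causal effect estimates are in two-sample summary-data Mendelian randomization when the genetic instruments are highly correlated.
Give applied researchers practical guidance on choosing methods for cis-MR, particularly in drug-target discovery settings.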

Proposed Methods

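Select variants in cis-MR using linkage disequilibrium pruning (LD-pruning), conditional analysis, principal components analysis (PCA), factor analysis, and Bayesian stochastic-search variable selection (JAM).
Estimate causal effects with the inverse-variance weighted (IVW), weighted median, mode-based, and limited information maximum likelihood (LIML) estimators.
Run a simulation study based on two gene regions (HMGCR and SHBG) to compare how the methods perform under different instrument strengths and correlation levels.
Use several methods as sensitivity analyses to assess robustness, especially in the presence of weak instrument bias.
Apply the methods to real data from UK Biobank and CARDIoGRAMplusC4D to study the effects of LDL-cholesterol and testosterone on coronary heart disease.
Propose carrying out variable selection and instrumental variable estimation in independent samples to mitigate winner's curse and selection bias.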
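Three of the ingredients listed above (IVW estimation with correlated variants, LD-pruning, and PCA-based dimension reduction) can be sketched with summary statistics alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the greedy pruning rule, the r^2 threshold of 0.1, the 99% variance cut-off, and the noise-free toy data are all choices made here for the example.

```python
import numpy as np

def ivw_correlated(bx, by, se_y, rho):
    """IVW estimate allowing for correlated variants (generalized least squares).

    Uses Omega[j, k] = se_y[j] * se_y[k] * rho[j, k] as the covariance of by.
    """
    omega = np.outer(se_y, se_y) * rho
    w = np.linalg.solve(omega, bx)            # Omega^{-1} bx
    denom = bx @ w
    return (w @ by) / denom, np.sqrt(1.0 / denom)

def ld_prune(bx, se_x, rho, r2_max=0.1):
    """Greedy pruning: visit variants from strongest to weakest association,
    keeping one only if its r^2 with every kept variant is <= r2_max."""
    order = np.argsort(-np.abs(bx / se_x))
    kept = []
    for j in order:
        if all(rho[j, k] ** 2 <= r2_max for k in kept):
            kept.append(j)
    return np.array(sorted(kept))

def ivw_pca(bx, by, se_y, rho, var_explained=0.99):
    """IVW on the leading principal components of a weighted correlation matrix."""
    psi = np.outer(bx / se_y, bx / se_y) * rho
    eigval, eigvec = np.linalg.eigh(psi)
    idx = np.argsort(eigval)[::-1]            # sort components by variance explained
    eigval, eigvec = eigval[idx], eigvec[:, idx]
    frac = np.cumsum(eigval) / eigval.sum()
    k = int(np.searchsorted(frac, var_explained) + 1)
    w = eigvec[:, :k]
    omega_t = w.T @ (np.outer(se_y, se_y) * rho) @ w
    bx_t, by_t = w.T @ bx, w.T @ by           # transformed summary statistics
    a = np.linalg.solve(omega_t, bx_t)
    denom = bx_t @ a
    return (a @ by_t) / denom, np.sqrt(1.0 / denom)

# Toy data (all values are assumptions): 20 variants, AR(1)-style correlation,
# and noise-free outcome associations so that every estimator recovers theta.
rng = np.random.default_rng(0)
p, theta = 20, 0.3
pos = np.arange(p)
rho = 0.9 ** np.abs(pos[:, None] - pos[None, :])
bx = rng.normal(0.1, 0.02, size=p)
se_x = np.full(p, 0.01)
se_y = np.full(p, 0.02)
by = theta * bx

kept = ld_prune(bx, se_x, rho)
print("all variants:", ivw_correlated(bx, by, se_y, rho))
print("pruned set  :", ivw_correlated(bx[kept], by[kept], se_y[kept], rho[np.ix_(kept, kept)]))
print("PCA-based   :", ivw_pca(bx, by, se_y, rho))
```

With real data the outcome associations are noisy, so the three estimates will differ. Pruning discards information from dropped variants, whereas the PCA approach keeps all variants but works with a lower-dimensional, better-conditioned matrix.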

Experimental Results

Research Questions

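RQ1: How do different variable selection methods (e.g. pruning vs. PCA vs. factor analysis) perform in cis-MR under strong versus weak instruments?
RQ2: When instruments are correlated and weak, which estimation methods (e.g. IVW, LIML, JAM) yield the most reliable causal effect estimates and confidence intervals?
RQ3: To what extent does the choice of variable selection method affect the validity of causal inference in cis-MR with summary-level data?
RQ4: In the presence of weak instruments, do factor analysis and Bayesian variable selection reduce bias relative to standard pruning?
RQ5: How does the choice of method affect the reliability of cis-MR results in real-world applications such as drug-target validation?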

Key Findings

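When weak instrument bias is present, factor analysis and Bayesian variable selection (JAM) produce more reliable inferences than simple pruning.
With strong instruments, all methods (including PCA-based IVW) perform similarly; with weak instruments, PCA-based IVW shows larger bias.
F-LIML gives accurate causal effect estimates, but its confidence interval coverage is poor with weak instruments.
The CLR test is the least affected by weak instrument bias and provides valid inference for the causal null hypothesis, although it does not produce a point estimate.
JAM's causal estimates are biased, but its uncertainty quantification is better than that of F-LIML, indicating a trade-off between robustness and precision.
The study recommends using several methods (including F-LIML, CLR, and JAM) as sensitivity analyses in cis-MR applications to make the results more reliable.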
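Since the findings above hinge on whether instruments are weak, a quick screening step may help readers see what that means in practice. The sketch below uses the common per-variant approximation F = (beta_X / se_X)^2 and the conventional rule-of-thumb threshold of 10. Both are assumptions of this example and not criteria taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def f_statistics(bx, se_x):
    """Approximate per-variant F-statistic from summary data: (beta / se)^2."""
    return (np.asarray(bx, dtype=float) / np.asarray(se_x, dtype=float)) ** 2

def flag_weak(bx, se_x, threshold=10.0):
    """Boolean mask of variants whose F-statistic falls below the threshold."""
    return f_statistics(bx, se_x) < threshold

# Toy variant-exposure associations and standard errors (illustrative values only).
bx = [0.10, 0.03, 0.08]
se_x = [0.02, 0.02, 0.02]
print(f_statistics(bx, se_x))
print(flag_weak(bx, se_x))
```

This per-variant check ignores correlation between variants, so it is only a rough first look; a region with many individually weak but correlated variants is exactly the setting in which the reviewed methods start to diverge.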


This review was generated by AI and checked by a human editor.