QUICK REVIEW

[Paper Review] Estimating Fold Changes from Partially Observed Outcomes with Applications in Microbial Metagenomics

David S. Clausen, Amy D. Willis|arXiv (Cornell University)|Feb 7, 2024

Metabolomics and Mass Spectrometry Studies | Cited by 9

Summary in Brief

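This paper develops a method for estimating fold changes in the mean abundance of multiple taxa when outcomes are only partially observed, addressing sample-specific and category-specific perturbations in microbial metagenomics. It establishes identifiability through parameter constraints and provides a penalized estimator, a robust hypothesis test, and an application to a colorectal cancer meta-analysis.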

ABSTRACT

We consider the problem of estimating fold-changes in the expected value of a multivariate outcome observed with unknown sample-specific and category-specific perturbations. This challenge arises in high-throughput sequencing studies of the abundance of microbial taxa because microbes are systematically over- and under-detected relative to their true abundances. Our model admits a partially identifiable estimand, and we establish full identifiability by imposing interpretable parameter constraints. To reduce bias and guarantee the existence of estimators in the presence of sparse observations, we apply an asymptotically negligible and constraint-invariant penalty to our estimating function. We develop a fast coordinate descent algorithm for estimation, and an augmented Lagrangian algorithm for estimation under null hypotheses. We construct a model-robust score test and demonstrate valid inference even for small sample sizes and violated distributional assumptions. The flexibility of the approach and comparisons to related methods are illustrated through a meta-analysis of microbial associations with colorectal cancer.

Motivation and Objectives

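Motivate and formalize the problem of estimating fold-differences in the means of a nonnegative multivariate outcome observed under sample-specific and category-specific perturbations.
Establish identifiability via parameter constraints and define fold-differences in true abundance as an interpretable estimand.
Propose a fast estimation algorithm with Firth-type bias reduction and a constraint-based approach to identifiability.
Develop model-robust estimation and inference procedures that remain valid with small samples and under distributional misspecification.
Demonstrate the method through simulations and a meta-analysis of gut microbiome associations with colorectal cancer.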

Proposed Method

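Specify a log-linear model for the unobserved true abundances, together with a perturbed, partially observed version that includes unknown sample-specific and taxon-specific effects.
Establish partial identifiability and define equivalence classes of parameters. To identify fold-differences, impose a smooth identifiability constraint (pseudo-Huber).
Use a penalized likelihood with a Firth-type penalty, which guarantees finite estimates under separation and sparsity, and solve it by coordinate descent with a data-augmentation technique.
Derive a model-robust score test for hypotheses about log fold-differences under the identifiability constraint, providing a robust score statistic and an alternative robust Wald test.
Provide an augmented Lagrangian optimization framework for constrained estimation under the null hypothesis, and guarantee that the penalized likelihood is invariant to the choice of constraint.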
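
To make the identifiability step concrete, here is a minimal sketch of a row-wise constraint. It assumes the constraint pins a pseudo-Huber smoothed center of each row of the coefficient matrix to zero. The function names, the value of `delta`, and the bisection solver are illustrative choices and are not taken from the paper or its software.

```python
import numpy as np

def pseudo_huber_center(values, delta=0.1, tol=1e-10):
    """Return c minimizing the summed pseudo-Huber loss of (values - c).

    The minimizer solves sum_j psi(values_j - c) = 0, where
    psi(r) = r / sqrt(1 + (r / delta)^2) is the derivative of the loss.
    """
    values = np.asarray(values, dtype=float)

    def score(c):
        r = values - c
        return np.sum(r / np.sqrt(1.0 + (r / delta) ** 2))

    # score(c) is decreasing in c, nonnegative at the minimum value
    # and nonpositive at the maximum value, so bisection applies.
    lo, hi = values.min(), values.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def identify_rows(B, delta=0.1):
    """Shift each row of B (covariates x taxa) so its smoothed center is zero."""
    B = np.asarray(B, dtype=float)
    centers = np.array([pseudo_huber_center(row, delta) for row in B])
    return B - centers[:, None]

# Hypothetical example: 2 covariates x 6 taxa, plus an arbitrary row-wise offset
# standing in for the part of the parameter the data cannot identify.
rng = np.random.default_rng(0)
B = rng.normal(size=(2, 6))
offset = np.array([[3.0], [-1.5]])
same = np.allclose(identify_rows(B), identify_rows(B + offset), atol=1e-6)
print(same)
```

Two matrices that differ only by a constant added across each row map to the same constrained representative, which is the sense in which the constraint selects one member of each equivalence class. A smooth center is used here instead of a plain median so that the constraint is differentiable.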

Figure 1 : Because the mean function is only partially identifiable, true effect sizes $\beta$ cannot be estimated on an “absolute” scale (left). Full identifiability is established via a constraint function, allowing us to estimate the log-fold differences in true abundances across groups relative

Experimental Results

Research Questions

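RQ1: How can fold-differences in mean true abundance be estimated when observations are distorted by unknown sample-specific and category-specific effects?
RQ2: Which identifiability constraints are needed to interpret those fold-differences meaningfully in microbiome data?
RQ3: Can a fast, bias-reducing estimator and a robust inference procedure be developed for settings with small samples and possible distributional misspecification?
RQ4: How does the proposed method perform in simulations under Poisson and zero-inflated negative binomial settings and in a real meta-analysis of the colorectal cancer gut microbiome?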

Key Findings

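The method identifies equivalence classes of the log-mean parameters and achieves full identifiability by imposing constraints on the rows of the coefficient matrix.
The Firth-penalized profile likelihood, fit by coordinate descent, gives stable estimates even when observations are sparse and separation is possible.
The robust score test controls Type I error well in large samples and can be conservative in small samples, whereas the robust Wald test can be overly conservative or anti-conservative in small samples.
Power is higher under Poisson data than under ZINB data and tends to increase with sample size and with the number of taxa.
In the colorectal cancer meta-analysis using data from Wirbel et al., the robust score test identified 30 taxa differentially associated with CRC status at an FDR threshold of 0.1.
The approach yields interpretable log fold-differences in true abundance for each covariate level, expressed relative to typical differences across taxa.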
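
The FDR threshold of 0.1 in the meta-analysis finding can be illustrated with a short sketch. The summary does not say which FDR procedure was used, so this assumes the Benjamini-Hochberg step-up rule. The function name and the p-values are hypothetical and are not results from the paper.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.1):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # step-up thresholds alpha * k / m
    passed = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size > 0:
        k = passed.max()                           # largest rank meeting its threshold
        reject[order[: k + 1]] = True              # reject everything up to that rank
    return reject

# Hypothetical per-taxon p-values, e.g. from a robust score test.
pvals = [0.6, 0.001, 0.041, 0.9, 0.008, 0.039]
print(benjamini_hochberg(pvals, alpha=0.1))
```

With one p-value per taxon, the number of `True` entries is the count of taxa declared associated at the chosen FDR level.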

Figure 2 : Q-Q plots comparing empirical quantiles (y-axis) of the robust score (dark blue) and robust Wald (light blue) p-values to theoretical quantiles (x-axis). A conservative test will produce a curve above the line $x=y$ for small p-values and an anti-conservative test will produce a curve bel

This review was generated by AI and checked by a human editor.