QUICK REVIEW

[Paper review] Tutorial: a priori estimation of sample size, effect size, and statistical power for cluster analysis, latent class analysis, and multivariate mixture models

Edwin S. Dalmaijer|arXiv (Cornell University)|Sep 2, 2023

Bayesian Methods and Mixture Models|Cited by 8

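One-sentence summary

This tutorial presents a priori methods for estimating sample size, effect size, and statistical power for techniques that identify subgroups, together with a simulation-based reference table for common methods.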

ABSTRACT

Before embarking on data collection, researchers typically compute how many individual observations they should do. This is vital for doing studies with sufficient statistical power, and often a cornerstone in study pre-registrations and grant applications. For traditional statistical tests, one would typically determine an acceptable level of statistical power, (gu)estimate effect size, and then use both values to compute the required sample size. However, for analyses that identify subgroups, statistical power is harder to establish. Once sample size reaches a sufficient threshold, effect size is primarily determined by the number of measured features and the underlying subgroup separation. As a consequence, a priori computations of statistical power are notoriously complex. In this tutorial, I will provide a roadmap to determining sample size and effect size for analyses that identify subgroups. First, I introduce a procedure that allows researchers to formalise their expectations about effect sizes in their domain of choice, and use this to compute the minimally required number of measured variables. Next, I outline how to establish the minimum sample size in subgroup analyses. Finally, I use simulations to provide a reference table for the most popular subgroup analyses: k-means, Ward agglomerative hierarchical clustering, c-means fuzzy clustering, latent class analysis, latent profile analysis, and Gaussian mixture modelling. The table shows the minimum numbers of observations per expected subgroup (sample size) and features (measured variables) to achieve acceptable statistical power, and can be readily used in study design.

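Research motivation and objectives

Formalise domain-specific effect sizes in order to compute the minimum number of measured variables required.
Outline how to establish the minimum sample size in subgroup analyses.
Provide a simulation-based reference table for common subgroup analyses to support study design.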

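Proposed method

Introduce a procedure that formalises domain-specific effect sizes and computes the minimum number of features.
Outline the steps for determining the minimum sample size in subgroup analyses.
Use simulations to generate a reference table for k-means, Ward agglomerative hierarchical clustering, c-means fuzzy clustering, latent class analysis, latent profile analysis, and Gaussian mixture modelling.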
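
The step from an expected effect size to a minimum feature count can be illustrated with a short Python sketch. It rests on an assumption that this summary does not state: features are uncorrelated and standardised to unit within-subgroup variance, so per-feature standardised differences combine into a Euclidean centroid separation as the square root of the sum of their squares. The function names and the `target_separation` parameter are hypothetical; a suitable target value would have to come from the tutorial's reference table or from one's own simulations.

```python
import math


def centroid_separation(effect_sizes):
    """Distance between two subgroup centroids.

    Assumption (not from the summary): features are uncorrelated and
    scaled to unit within-subgroup variance, so each entry of
    `effect_sizes` is a standardised mean difference on one feature.
    """
    return math.sqrt(sum(d * d for d in effect_sizes))


def min_features(expected_d, target_separation):
    """Smallest number of features, each with the same expected
    per-feature difference `expected_d`, whose combined separation
    reaches `target_separation` (a hypothetical, user-chosen target).
    """
    if expected_d <= 0 or target_separation <= 0:
        raise ValueError("expected_d and target_separation must be positive")
    # With m equal features the separation is expected_d * sqrt(m),
    # so m >= (target_separation / expected_d) ** 2.
    # The small tolerance guards against floating-point overshoot.
    return math.ceil((target_separation / expected_d) ** 2 - 1e-9)
```

For example, `min_features(0.5, 4.0)` asks how many features with an expected per-feature difference of 0.5 are needed to reach a combined separation of 4.0. Both numbers are illustrative inputs, not recommendations taken from the paper. Correlated features would add less separation than this sketch assumes, so the result is best read as a lower bound.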

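Experimental results

Research questions

RQ1: How can domain-specific effect sizes be formalised to inform the number of features required by analyses that identify subgroups?
RQ2: What is the minimum sample size needed for reliable subgroup analysis with common clustering and mixture models?
RQ3: What are the minimum numbers of observations per subgroup and of features needed to achieve acceptable power across three or more subgroup-identification methods?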

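Key findings

A formal procedure for translating expected effect sizes into a minimum number of measured variables.
A framework for computing the minimum sample size in subgroup analyses.
A simulation-based reference table for popular subgroup analyses (k-means, Ward, c-means, LCA, LPA, Gaussian mixture modelling).
The table shows the minimum numbers of observations per expected subgroup and of features required to achieve acceptable statistical power.
The table can be used directly in study design.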
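
To give a feel for how a single cell of such a reference table could be approximated by simulation, here is a minimal Python sketch using only the standard library. It is not the author's simulation code. It swaps in a hand-rolled two-cluster k-means, draws two equally sized Gaussian subgroups with unit variance, and uses the rate at which the true subgroups are recovered as a stand-in for statistical power. All function names, the `min_accuracy` threshold, and the default settings are assumptions made for illustration.

```python
import random


def simulate_two_groups(n_per_group, n_features, d, rng):
    """Two Gaussian subgroups whose means differ by `d` on every feature."""
    a = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_per_group)]
    b = [[rng.gauss(d, 1.0) for _ in range(n_features)]
         for _ in range(n_per_group)]
    return a + b, [0] * n_per_group + [1] * n_per_group


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def assign(points, centres):
    """Label each point with the index of its nearest centre."""
    return [0 if sq_dist(p, centres[0]) <= sq_dist(p, centres[1]) else 1
            for p in points]


def two_means(points, rng, n_iter=25):
    """Minimal Lloyd's algorithm for k = 2 (illustrative, single start)."""
    centres = rng.sample(points, 2)
    labels = assign(points, centres)
    for _ in range(n_iter):
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster empties
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(points, centres)
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels


def recovery_rate(n_per_group, n_features, d,
                  n_sims=200, min_accuracy=0.9, seed=1):
    """Share of simulated datasets in which the clustering matches the
    true subgroups with at least `min_accuracy` (hypothetical criterion)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        points, truth = simulate_two_groups(n_per_group, n_features, d, rng)
        labels = two_means(points, rng)
        agree = sum(l == t for l, t in zip(labels, truth)) / len(truth)
        accuracy = max(agree, 1.0 - agree)  # cluster labels are arbitrary
        hits += int(accuracy >= min_accuracy)
    return hits / n_sims
```

Calling `recovery_rate` over a grid of `n_per_group` and `n_features` values would yield a small table in the spirit of the one described above. A real design study would more plausibly rely on an established clustering library, multiple random starts, and the detection criterion defined in the tutorial itself.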

This review was generated by AI and checked by human editors.