QUICK REVIEW

[Paper Review] Tutorial: a priori estimation of sample size, effect size, and statistical power for cluster analysis, latent class analysis, and multivariate mixture models

Edwin S. Dalmaijer|arXiv (Cornell University)|2023. 09. 02.

Bayesian Methods and Mixture Models | Citations: 8

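One-Line Summary

This tutorial provides a priori methods for estimating sample size, effect size, and statistical power for techniques that identify subgroups, and includes a simulation-based reference table for common methods.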

ABSTRACT

Before embarking on data collection, researchers typically compute how many individual observations they should do. This is vital for doing studies with sufficient statistical power, and often a cornerstone in study pre-registrations and grant applications. For traditional statistical tests, one would typically determine an acceptable level of statistical power, (gu)estimate effect size, and then use both values to compute the required sample size. However, for analyses that identify subgroups, statistical power is harder to establish. Once sample size reaches a sufficient threshold, effect size is primarily determined by the number of measured features and the underlying subgroup separation. As a consequence, a priori computations of statistical power are notoriously complex. In this tutorial, I will provide a roadmap to determining sample size and effect size for analyses that identify subgroups. First, I introduce a procedure that allows researchers to formalise their expectations about effect sizes in their domain of choice, and use this to compute the minimally required number of measured variables. Next, I outline how to establish the minimum sample size in subgroup analyses. Finally, I use simulations to provide a reference table for the most popular subgroup analyses: k-means, Ward agglomerative hierarchical clustering, c-means fuzzy clustering, latent class analysis, latent profile analysis, and Gaussian mixture modelling. The table shows the minimum numbers of observations per expected subgroup (sample size) and features (measured variables) to achieve acceptable statistical power, and can be readily used in study design.

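Research Motivation and Goals

Formalise domain-specific effect sizes in order to compute the minimally required number of measured variables.
Outline how to establish the minimum sample size in subgroup analyses.
Provide a simulation-based reference table for common subgroup analyses to support study design.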

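Proposed Method

Proposes a procedure for formalising domain-specific effect sizes and computing the minimum number of features.
Outlines a procedure for determining the minimum sample size in subgroup analyses.
Uses simulations to generate a reference table for the following techniques: k-means, Ward agglomerative hierarchical clustering, c-means fuzzy clustering, latent class analysis, latent profile analysis, and Gaussian mixture modelling.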
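
The simulation approach behind such a reference table can be illustrated with a minimal sketch. It assumes two equally sized Gaussian subgroups with unit variance, the same standardized mean difference `d` on every feature, and a plain two-cluster k-means written in pure Python. The 90% label-recovery criterion and all function names are hypothetical choices for illustration, not the tutorial's own power definition or code.

```python
import random


def simulate_two_groups(n_per_group, n_features, d, rng):
    """Draw two unit-variance Gaussian subgroups whose means differ by d on every feature."""
    data, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            data.append([rng.gauss(label * d, 1.0) for _ in range(n_features)])
            labels.append(label)
    return data, labels


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(data, rng, n_iter=25):
    """Plain Lloyd's algorithm with k=2 and a random initialisation."""
    centroids = rng.sample(data, 2)
    assign = [0] * len(data)
    for _ in range(n_iter):
        assign = [
            0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
            for p in data
        ]
        for k in (0, 1):
            members = [p for p, a in zip(data, assign) if a == k]
            if members:  # keep the previous centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def recovery_rate(n_per_group, n_features, d, n_sims=100, min_accuracy=0.9, seed=0):
    """Fraction of simulated datasets in which 2-means recovers the true subgroups.

    'Recovered' is an illustrative criterion: label accuracy >= min_accuracy.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data, labels = simulate_two_groups(n_per_group, n_features, d, rng)
        assign = two_means(data, rng)
        agree = sum(a == b for a, b in zip(assign, labels)) / len(labels)
        accuracy = max(agree, 1.0 - agree)  # cluster numbering is arbitrary
        if accuracy >= min_accuracy:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for d in (0.0, 1.0, 3.0):
        print(d, recovery_rate(n_per_group=30, n_features=4, d=d, n_sims=50))
```

Sweeping `n_per_group` and `n_features` over a grid and recording where the rate first exceeds a chosen power level would give a small table in the same spirit, though the tutorial's own table should be used for actual study design.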

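Experimental Results

Research Questions

RQ1: How can domain-specific effect sizes be formalised so that they indicate the number of features needed for subgroup-identification analyses?
RQ2: What is the minimum sample size required for reliable subgroup analysis across common clustering and mixture models?
RQ3: For three or more subgroup-identification methods, what are the minimum number of observations per subgroup and the minimum number of features needed to achieve acceptable statistical power?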

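Key Results

A formal procedure for converting an expected effect size into a minimum number of measured variables.
A framework for computing the minimum sample size in subgroup analyses.
A simulation-based reference table for popular subgroup analyses (k-means, Ward, c-means, LCA, LPA, Gaussian mixture models).
The table shows the minimum number of observations per expected subgroup and the minimum number of features needed to achieve acceptable statistical power.
The table can be used directly in study design.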
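
To make the link between per-feature effect size and feature count concrete, here is a hedged sketch of one simple case. It assumes uncorrelated, unit-variance features that each separate two subgroups by the same standardized difference `d`, so the Euclidean distance between subgroup centroids is `d * sqrt(m)` for `m` features. The function names and the target separation used in the example are placeholders, not values or code taken from the tutorial.

```python
import math


def centroid_separation(per_feature_d):
    """Euclidean centroid distance for uncorrelated, unit-variance features."""
    return math.sqrt(sum(d * d for d in per_feature_d))


def min_features(d, target_separation):
    """Smallest m with d * sqrt(m) >= target_separation (equal d on every feature)."""
    if d <= 0:
        raise ValueError("per-feature effect size d must be positive")
    return math.ceil((target_separation / d) ** 2)


if __name__ == "__main__":
    # Illustrative numbers only: d = 0.5 per feature, placeholder target of 4.0.
    m = min_features(0.5, 4.0)
    print(m, centroid_separation([0.5] * m))
```

Because separation grows only with the square root of the feature count under these assumptions, halving the per-feature effect size quadruples the number of features needed to reach the same separation.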


This review was generated by AI and reviewed by a human editor.