QUICK REVIEW

[Paper Review] To which reference class do you belong? Measuring racial fairness of reference classes with normative modeling

Saige Rutherford, Thomas Wolfers|arXiv (Cornell University)|2024. 07. 26.

School Choice and Performance | Citations: 8

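One-Line Summary

This paper examines racial fairness in normative models of brain structure. It reports racial bias in both the pre-trained model and the model that omits race; including race can reduce that bias, yet self-reported race remains predictable from the deviation scores. The paper stresses the importance of a representative reference class and the need for caution when interpreting deviations.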

ABSTRACT

Reference classes in healthcare establish healthy norms, such as pediatric growth charts of height and weight, and are used to chart deviations from these norms which represent potential clinical risk. How the demographics of the reference class influence clinical interpretation of deviations is unknown. Using normative modeling, a method for building reference classes, we evaluate the fairness (racial bias) in reference models of structural brain images that are widely used in psychiatry and neurology. We test whether including race in the model creates fairer models. We predict self-reported race using the deviation scores from three different reference class normative models, to better understand bias in an integrated, multivariate sense. Across all of these tasks, we uncover racial disparities that are not easily addressed with existing data or commonly used modeling techniques. Our work suggests that deviations from the norm could be due to demographic mismatch with the reference class, and assigning clinical meaning to these deviations should be done with caution. Our approach also suggests that acquiring more representative samples is an urgent research priority.

Research Motivation and Objectives

Quantify racial bias in existing pre-trained normative models of cortical thickness.
Assess the impact of including race as a predictor in normative models.
Compare three normative-model configurations across two large datasets (HCP, UKB).
Determine whether deviation scores reveal race-specific biases and whether race can be predicted from deviations.

Proposed Method

Fit three Bayesian normative models (pre-trained, race not included, race included) using cortical thickness from FreeSurfer Destrieux atlas regions.
Use B-spline basis expansion for age effects and likelihood warping to map non-Gaussian responses to Gaussian latent space.
Compute deviation scores Z_nd and residual errors E_nd for each region and subject.
Qualitatively summarize average deviations and extreme deviations by race.
Quantitatively test group differences in deviations and residual errors with FDR-corrected t-tests.
Predict self-reported race from deviation scores using penalized logistic regression with 80/20 train/test split and 5-fold cross-validation.
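
The deviation-score step above can be illustrated with a minimal sketch. It assumes the common normative-modeling convention that a deviation score is the residual divided by the predictive standard deviation, and that the residual error is the plain difference between observed and predicted values; the review does not spell out either formula. The function names, the toy thickness values, and the |z| > 2 cutoff for "extreme" deviations are hypothetical choices made for illustration only.

```python
import math

def deviation_scores(observed, predicted_mean, predicted_var):
    """Return (z, e) per region, assuming z = (y - mu) / sqrt(var) and e = y - mu."""
    z, e = [], []
    for y, mu, var in zip(observed, predicted_mean, predicted_var):
        residual = y - mu
        e.append(residual)
        z.append(residual / math.sqrt(var))
    return z, e

def extreme_fraction(z_scores, threshold=2.0):
    """Fraction of regions with |z| above the threshold (threshold is an assumption)."""
    return sum(abs(z) > threshold for z in z_scores) / len(z_scores)

# Hypothetical cortical-thickness values (mm) for one subject across four regions.
y = [2.5, 2.7, 2.2, 3.1]       # observed thickness
mu = [2.5, 2.5, 2.5, 2.5]      # predictive mean from a normative model
var = [0.04, 0.04, 0.01, 0.04] # predictive variance from a normative model

z, e = deviation_scores(y, mu, var)
print([round(v, 2) for v in z])
print(extreme_fraction(z))
```

In this toy case two of the four regions fall outside the assumed cutoff. Averaging such scores within each racial group is one way to produce the kind of qualitative summary the method list describes.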

Figure 1: Overview of analysis workflow. A) Normative models of brain structure were used to generate deviation scores. Three normative models were fit (pre-trained, race not included, and race included) representing two different reference classes and two sets of covariates. B) Normative models wer

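Experimental Results

Research Questions

RQ1: Does racial bias exist in pre-trained normative models when race data are unknown?
RQ2: Does including self-reported race in the normative model reduce racial bias in deviation scores and residual errors?
RQ3: Can deviation scores from normative models predict self-reported race in a multivariate setting?
RQ4: How does race-based modeling affect fairness across brain regions and datasets (HCP, UKB)?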

Key Results

Dataset	Group	Metric	Pre-trained	Race not included	Race included
HCP	W vs. A	deviations	40%	49%	9%
HCP	W vs. B	deviations	55%	51%	5%
UKB	W vs. A	deviations	17%	25%	19%
UKB	W vs. B	deviations	45%	37%	7%
HCP	W vs. A	error	74%	64%	55%
HCP	W vs. B	error	56%	28%	54%
UKB	W vs. A	error	71%	56%	53%
UKB	W vs. B	error	87%	73%	51%
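
The method section mentions FDR-corrected t-tests on deviations and residual errors. The sketch below shows one way a per-region summary percentage could be derived from such tests. It assumes a Benjamini-Hochberg correction at alpha = 0.05 applied to one p-value per brain region. The review does not confirm that the table percentages were computed this way, and the p-values here are invented for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    # Reject every hypothesis whose rank is at or below max_k.
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

def percent_significant(p_values, alpha=0.05):
    """Percentage of regions that remain significant after FDR correction."""
    rejected = benjamini_hochberg(p_values, alpha)
    return 100.0 * sum(rejected) / len(rejected)

# Hypothetical per-region p-values from a two-group comparison of deviation scores.
p = [0.20, 0.001, 0.90, 0.039, 0.008, 0.60, 0.041, 0.74]
print(benjamini_hochberg(p))
print(percent_significant(p))
```

Note that 0.039 and 0.041 would pass an uncorrected 0.05 threshold but do not survive the correction here, which is why a corrected count can be much lower than a naive one.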

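Racial bias is present in the normative models: White (W) individuals often have deviations near zero, whereas the Asian (A) and Black (B) groups tend to show over- or underestimation of cortical thickness depending on the model type.
Including race in the normative model substantially reduces between-group differences in mean deviation, but region-specific underestimation persists in certain groups.
Quantitative tests show significant between-group differences in residual errors and deviation scores, with the race-included model generally showing fewer differences.
In UKB, the White group often has larger residual errors, attributed to sample-size imbalance; in HCP, residual errors in some groups show lateralization effects.
Deviations and residual errors vary by race across models, suggesting that demographic mismatch with the reference class can drive the clinical interpretation of deviations.
Race can be identified from deviation scores with considerable accuracy, showing that race information leaks into normative model outputs.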
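
The race-prediction analysis can be illustrated with a minimal sketch of L2-penalized logistic regression on deviation scores with an 80/20 train/test split. Several things are assumptions here rather than the authors' setup: the data are synthetic, the task is binary, the penalty strength and learning rate are arbitrary, the fit uses plain batch gradient descent, and the 5-fold cross-validation mentioned in the method section is omitted for brevity.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    et = math.exp(t)
    return et / (1.0 + et)

def fit_l2_logistic(X, y, l2=0.1, lr=0.1, epochs=500):
    """Batch gradient descent on mean log-loss + (l2 / 2) * ||w||^2; bias is unpenalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    """Predict class 1 when the linear score is non-negative."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0 for xi in X]

# Synthetic deviation scores: two hypothetical groups whose z-scores are
# shifted in opposite directions across five regions.
rng = random.Random(0)
n_regions = 5
X, y = [], []
for i in range(200):
    label = i % 2
    shift = 1.0 if label else -1.0
    X.append([rng.gauss(shift, 1.0) for _ in range(n_regions)])
    y.append(label)

# 80/20 train/test split on shuffled indices.
idx = list(range(len(X)))
rng.shuffle(idx)
cut = len(idx) * 4 // 5
train, test = idx[:cut], idx[cut:]

w, b = fit_l2_logistic([X[i] for i in train], [y[i] for i in train])
preds = predict([X[i] for i in test], w, b)
accuracy = sum(p == y[i] for p, i in zip(preds, test)) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

A classifier that separates groups well on held-out deviation scores indicates that group information is still encoded in those scores, which is the sense in which the review describes race "leaking" into model outputs.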

Figure 2: Summary of normative model deviation scores across all three reference classes (pre-trained, race not included, and race included) in HCP and UKB datasets. A) Average (mean) deviations for all brain regions within all racial groups (columns). B) Percentage of extreme deviations (positive a


This review was generated by AI and reviewed by a human editor.