QUICK REVIEW

[Paper Review] Assessing Dataset Bias in Computer Vision

Athiya Deviyani|arXiv (Cornell University)|2021. 01. 01.

Domain Adaptation and Few-Shot Learning | Citations: 6

One-Line Summary

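This study investigates data augmentation techniques for mitigating bias in computer vision datasets, focusing on the imbalanced distributions of gender, age, and ethnicity in the UTKFace dataset. Among undersampling, geometric transformations, VAEs, and GANs, StarGAN-based augmentation performed best (91.75% gender accuracy on the UTKFace test set), with uniform accuracy across classes and improved generalization to external datasets.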

ABSTRACT

A biased dataset is a dataset that generally has attributes with an uneven class distribution. These biases have the tendency to propagate to the models that train on them, often leading to a poor performance in the minority class. In this project, we will explore the extent to which various data augmentation methods alleviate intrinsic biases within the dataset. We will apply several augmentation techniques on a sample of the UTKFace dataset, such as undersampling, geometric transformations, variational autoencoders (VAEs), and generative adversarial networks (GANs). We then trained a classifier for each of the augmented datasets and evaluated their performance on the native test set and on external facial recognition datasets. We have also compared their performance to the state-of-the-art attribute classifier trained on the FairFace dataset. Through experimentation, we were able to find that training the model on StarGAN-generated images led to the best overall performance. We also found that training on geometrically transformed images lead to a similar performance with a much quicker training time. Additionally, the best performing models also exhibit a uniform performance across the classes within each attribute. This signifies that the model was also able to mitigate the biases present in the baseline model that was trained on the original training set. Finally, we were able to show that our model has a better overall performance and consistency on age and ethnicity classification on multiple datasets when compared with the FairFace model. Our final model has an accuracy on the UTKFace test set of 91.75%, 91.30%, and 87.20% for the gender, age, and ethnicity attribute respectively, with a standard deviation of less than 0.1 between the accuracies of the classes of each attribute.

Motivation and Objectives

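To address dataset bias in computer vision that stems from imbalanced attribute distributions in popular datasets such as UTKFace.
To assess whether data augmentation can improve performance on minority groups and reduce model bias.
To compare how effectively undersampling, geometric transformations, VAEs, and GANs mitigate bias.
To evaluate how well the models generalize to external facial recognition datasets (LFWA+ and CelebA).
To benchmark performance against the state-of-the-art FairFace attribute classifier.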

Proposed Method

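Four data augmentation techniques were applied: undersampling, geometric transformations, variational autoencoders (VAEs), and generative adversarial networks (GANs, including StarGAN).
A ResNet-18 architecture was used for every classifier so that the evaluations are consistent.
Models were trained on augmented samples of UTKFace and evaluated on the original UTKFace test set and on external datasets.
The metrics were accuracy, the standard deviation between per-class accuracies, and cross-dataset generalization performance.
Results were compared against a state-of-the-art model trained on the FairFace dataset to gauge relative performance.
The implementation used PyTorch, and model robustness was judged by the consistency of per-class accuracy.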
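
Of the four techniques listed above, undersampling is the simplest to illustrate. The following is a minimal sketch, assuming plain random undersampling down to the size of the rarest class; the review does not describe the paper's exact sampling procedure, and the function name `undersample`, the `label_of` callback, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def undersample(samples, label_of, seed=0):
    """Randomly drop samples so that every class keeps as many items
    as the rarest class. Assumes `samples` is non-empty."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample in samples:
        by_class[label_of(sample)].append(sample)

    # The rarest class sets the per-class quota.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: (image_id, gender_label) pairs, not real UTKFace records.
toy = [(i, "male") for i in range(8)] + [(i, "female") for i in range(8, 11)]
balanced = undersample(toy, label_of=lambda s: s[1])
counts = {
    label: sum(1 for _, lab in balanced if lab == label)
    for label in ("male", "female")
}
print(counts)  # {'male': 3, 'female': 3}
```

The trade-off is visible even in this toy case: balancing discards five of the eight majority-class samples, which is the usual motivation for trying generative or geometric augmentation instead.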

Experimental Results

Research Questions

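RQ1: How do different data augmentation techniques affect model performance on the original UTKFace test set?
RQ2: How well do models trained on augmented data generalize to external facial recognition datasets such as LFWA+ and CelebA?
RQ3: How does the best-performing model compare with the state-of-the-art FairFace model in accuracy and bias mitigation?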

Key Results

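The model trained on StarGAN-generated images achieved the highest overall accuracy on the UTKFace test set: 91.75% for gender, 91.30% for age, and 87.20% for ethnicity.
For each attribute, the same model kept the standard deviation between per-class accuracies below 0.1, indicating uniform performance and effective bias mitigation.
Geometric transformations reached performance similar to StarGAN with a much shorter training time, making them a practical alternative.
The best-performing model generalized across datasets better than the FairFace model on LFWA+ and CelebA; the abstract reports this advantage for age and ethnicity classification.
GAN-based augmentation in particular proved more effective at reducing bias and improving fairness than alternatives such as undersampling or VAEs.
Overall, the results show that data augmentation can effectively narrow performance gaps for minority groups in facial attribute classification.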
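
The uniformity claim above rests on the standard deviation between per-class accuracies. The sketch below shows one way to compute that figure, assuming the population standard deviation over accuracies expressed as fractions; the review does not say whether the paper uses the population or sample formula, or fractions versus percentage points. The function names and toy labels are hypothetical.

```python
from collections import Counter
from statistics import pstdev


def per_class_accuracy(y_true, y_pred):
    """Return {label: accuracy} computed separately for each true class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


def accuracy_spread(y_true, y_pred):
    """Population standard deviation of the per-class accuracies
    (an assumed choice; the sample formula would give a larger value)."""
    return pstdev(list(per_class_accuracy(y_true, y_pred).values()))


# Hypothetical toy labels for a two-class attribute, not results from the paper.
y_true = ["young"] * 4 + ["old"] * 4
y_pred = ["young", "young", "young", "old", "old", "old", "young", "young"]

acc = per_class_accuracy(y_true, y_pred)
spread = accuracy_spread(y_true, y_pred)
print(acc)     # {'young': 0.75, 'old': 0.5}
print(spread)  # 0.125
```

A spread near zero means the classifier treats the classes about equally well, whereas overall accuracy alone can hide a weak minority class.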

This review was generated by AI and reviewed by a human editor.