QUICK REVIEW

[Paper Review] Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

Lisa Dunlap, Alyssa Umino|arXiv (Cornell University)|2023. 05. 25.

Multimodal Machine Learning Applications | Citations: 14

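One-Line Summary

ALIA uses image captioning and a large language model to generate domain descriptions, then edits training images with text-guided diffusion. This improves fine-grained classification and domain generalization without fine-tuning the generator.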

ABSTRACT

Many fine-grained classification tasks, like rare animal identification, have limited training data and consequently classifiers trained on these datasets often fail to generalize to variations in the domain like changes in weather or location. As such, we explore how natural language descriptions of the domains seen in training data can be used with large vision models trained on diverse pretraining datasets to generate useful variations of the training data. We introduce ALIA (Automated Language-guided Image Augmentation), a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset's domains and augment the training data via language-guided image editing. To maintain data integrity, a model trained on the original dataset filters out minimal image edits and those which corrupt class-relevant information. The resulting dataset is visually consistent with the original training data and offers significantly enhanced diversity. We show that ALIA is able to surpass traditional data augmentation and text-to-image generated data on fine-grained classification tasks, including cases of domain generalization and contextual bias. Code is available at https://github.com/lisadunlap/ALIA.

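Motivation and Objectives

Improve generalization on fine-grained vision tasks when training data is limited.
Propose a data augmentation method grounded in dataset-specific domain descriptions.
Use image captioning and a large language model to guide diffusion-based image editing.
Filter the edits so that task-relevant information is preserved and data integrity is maintained.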

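Proposed Method

Caption every training image with a pretrained captioning model.
Summarize the captions with a large language model into a concise set of domain descriptions (fewer than 10).
Edit the training images according to the domain descriptions using text-conditioned diffusion methods (Img2Img and Instruct Pix2Pix).
Apply a semantic (CLIP-based) filter and a confidence-based filter to remove failed edits.
Fine-tune a ResNet50 on the augmented dataset and compare it against baselines across tasks.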
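
The steps above can be sketched as a small orchestration function. This is only a minimal illustration of the control flow, not the authors' implementation: `augment_dataset`, `caption_fn`, `summarize_fn`, `edit_fn`, and `keep_fn` are hypothetical names, the models behind them are left to the caller, and applying every domain description to every image is an assumption made for simplicity. The toy usage at the bottom replaces images with strings so the flow can be run without any model.

```python
from typing import Callable, List, Tuple

Sample = Tuple[object, int]  # (image, class label); the image type is left open


def augment_dataset(
    samples: List[Sample],
    caption_fn: Callable[[object], str],             # hypothetical: image -> caption
    summarize_fn: Callable[[List[str]], List[str]],  # hypothetical: captions -> domain descriptions
    edit_fn: Callable[[object, str], object],        # hypothetical: (image, domain) -> edited image
    keep_fn: Callable[[object, int], bool],          # hypothetical: filter decision for an edit
    max_domains: int = 9,                            # the review states fewer than 10 descriptions
) -> List[Sample]:
    """Return the original samples plus the edited samples that pass the filter."""
    # Step 1: caption every training image.
    captions = [caption_fn(image) for image, _ in samples]
    # Step 2: condense the captions into a short list of domain descriptions.
    domains = summarize_fn(captions)[:max_domains]
    # Steps 3 and 4: edit each image toward each domain, keeping only accepted edits.
    augmented = list(samples)
    for image, label in samples:
        for domain in domains:
            edited = edit_fn(image, domain)
            if keep_fn(edited, label):
                augmented.append((edited, label))  # the edit inherits the original label
    return augmented


# Toy usage: strings stand in for images, lambdas stand in for models.
samples = [("bird_on_branch", 0), ("bird_in_water", 1)]
result = augment_dataset(
    samples,
    caption_fn=lambda image: f"a photo of {image}",
    summarize_fn=lambda caps: ["in snow", "at night"],
    edit_fn=lambda image, domain: f"{image} [{domain}]",
    keep_fn=lambda edited, label: "at night" not in edited,  # pretend night edits failed
)
print(len(result))
```

Passing the models in as callables keeps the sketch independent of any particular captioner, language model, or diffusion editor. The returned list is what the final step would use to fine-tune the classifier.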

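Experimental Results

Research Questions

RQ1: Can ALIA generate domain descriptions, grounded in the training data, that enable useful, label-preserving image edits?
RQ2: Does language-guided editing outperform traditional augmentation and text-to-image generation for domain generalization and bias mitigation?
RQ3: How does the choice of filtering and editing technique affect augmentation quality and model performance?
RQ4: How much does the amount of augmented data affect accuracy across different domains?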

Key Results

Dataset	User prompt	ALIA prompts	ALIA prompts + filtering
iWildCam	a camera trap photo of a { } …	79.92 ± 4.22%	84.87 ± 1.92%
CUB	a photo of a { } bird…	71.02 ± 0.47%	72.70 ± 0.10%
Waterbirds	an iNaturalist photo of a { } in nature.	63.64 ± 1.43%	71.40 ± 1.85%
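
As a reading aid, the change from the "ALIA prompts" column to the "ALIA prompts + filtering" column can be computed directly. The snippet below is plain arithmetic on the three rows of the table; the variable names are illustrative, and the ± spreads are ignored, so the differences say nothing about statistical significance.

```python
# (ALIA prompts, ALIA prompts + filtering), in percent, copied from the table.
reported = {
    "iWildCam": (79.92, 84.87),
    "CUB": (71.02, 72.70),
    "Waterbirds": (63.64, 71.40),
}

# Difference in percentage points between the two columns, per dataset.
filtering_gain = {
    name: round(with_filter - without_filter, 2)
    for name, (without_filter, with_filter) in reported.items()
}

for name, gain in filtering_gain.items():
    print(f"{name}: +{gain:.2f} percentage points")
```

On these figures the difference is largest on Waterbirds and smallest on CUB.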

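ALIA outperforms traditional augmentation and text-to-image generated data, and sometimes matches or exceeds the benefit of adding real data.
On iWildCam, ALIA improves accuracy by up to 17% over the original data and can outperform adding the same amount of real data.
On CUB, ALIA with domain-based prompts outperforms every baseline except RandAugment and added real data.
On Waterbirds, ALIA with filtering nearly matches in-domain accuracy while improving out-of-domain robustness.
Semantic and confidence-based filtering reduces failed edits and improves final accuracy.
ALIA's generated prompts outperform user-provided prompts, especially in the contextual-bias setting.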
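
The two filters credited in these results can be illustrated with a short sketch. The review only states that a CLIP-based semantic filter and a confidence-based filter remove failed edits, and the abstract adds that a model trained on the original data removes minimal edits and edits that corrupt class-relevant information. The concrete rules below are assumptions made for illustration: the semantic filter compares two precomputed image-text similarity scores, and the confidence filter drops an edit whenever the classifier's top softmax score reaches a per-class threshold. All function and field names are hypothetical.

```python
from typing import Dict, List, Sequence


def passes_semantic_filter(task_score: float, off_task_score: float) -> bool:
    """Keep an edit only if it still resembles the task more than something unrelated.

    Both scores are assumed to be image-text similarities (for example from CLIP)
    computed elsewhere; producing them is outside this sketch.
    """
    return task_score > off_task_score


def passes_confidence_filter(
    probs: Sequence[float], class_thresholds: Sequence[float]
) -> bool:
    """Drop an edit when a classifier trained on the original data is confident about it.

    Assumed rule: high confidence in the true class suggests the edit changed
    little, and high confidence in another class suggests the edit corrupted
    the class. Either way the top score meets the predicted class's threshold,
    so the true label is not needed to make the decision.
    """
    predicted = max(range(len(probs)), key=lambda i: probs[i])
    return probs[predicted] < class_thresholds[predicted]


def filter_edits(edits: List[Dict], class_thresholds: Sequence[float]) -> List[Dict]:
    """Return the edits that pass both filters."""
    return [
        e for e in edits
        if passes_semantic_filter(e["task_score"], e["off_task_score"])
        and passes_confidence_filter(e["probs"], class_thresholds)
    ]


# Toy usage with made-up scores for a two-class problem.
thresholds = [0.8, 0.7]
edits = [
    {"name": "A", "task_score": 0.31, "off_task_score": 0.22, "probs": [0.55, 0.45]},
    {"name": "B", "task_score": 0.18, "off_task_score": 0.30, "probs": [0.60, 0.40]},
    {"name": "C", "task_score": 0.33, "off_task_score": 0.20, "probs": [0.95, 0.05]},
    {"name": "D", "task_score": 0.29, "off_task_score": 0.21, "probs": [0.10, 0.90]},
]
kept = [e["name"] for e in filter_edits(edits, thresholds)]
print(kept)
```

In this toy run, B fails the semantic check, while C and D are dropped by the confidence check for being, respectively, too close to the original and too close to another class.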

This review was generated by AI and checked by a human editor.