QUICK REVIEW

[Paper Review] Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

Lisa Dunlap, Alyssa Umino | arXiv (Cornell University) | May 25, 2023

Multimodal Machine Learning Applications | Cited by 14

One-Sentence Summary

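ALIA uses image captions and a large language model to generate domain descriptions, then edits the training images with text-guided diffusion, improving fine-grained classification and domain generalization without fine-tuning the generator.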

ABSTRACT

Many fine-grained classification tasks, like rare animal identification, have limited training data and consequently classifiers trained on these datasets often fail to generalize to variations in the domain like changes in weather or location. As such, we explore how natural language descriptions of the domains seen in training data can be used with large vision models trained on diverse pretraining datasets to generate useful variations of the training data. We introduce ALIA (Automated Language-guided Image Augmentation), a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset's domains and augment the training data via language-guided image editing. To maintain data integrity, a model trained on the original dataset filters out minimal image edits and those which corrupt class-relevant information. The resulting dataset is visually consistent with the original training data and offers significantly enhanced diversity. We show that ALIA surpasses traditional data augmentation and text-to-image generated data on fine-grained classification tasks, including cases of domain generalization and contextual bias. Code is available at https://github.com/lisadunlap/ALIA.
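The confidence-based filter described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the two thresholds, and their default values are hypothetical assumptions. The only input is the softmax output that a classifier trained on the original (unedited) data assigns to each edited image.

```python
from typing import List, Sequence


def keep_edit(
    probs: Sequence[float],
    label: int,
    minimal_edit_threshold: float = 0.9,  # hypothetical value, not from the paper
    corruption_threshold: float = 0.9,  # hypothetical value, not from the paper
) -> bool:
    """Decide whether to keep one edited image.

    `probs` are softmax outputs of a classifier trained on the original
    training set, evaluated on the edited image; `label` is its true class.
    """
    pred = max(range(len(probs)), key=lambda i: probs[i])
    conf = probs[pred]
    if pred == label and conf >= minimal_edit_threshold:
        # Still confidently recognized as the true class: the edit probably
        # changed too little to add useful diversity.
        return False
    if pred != label and conf >= corruption_threshold:
        # Confidently recognized as a different class: the edit probably
        # destroyed class-relevant information.
        return False
    return True


def filter_edits(
    batch_probs: List[Sequence[float]],
    labels: List[int],
    **thresholds: float,
) -> List[int]:
    """Return the indices of edited images that survive the filter."""
    return [
        i
        for i, (p, y) in enumerate(zip(batch_probs, labels))
        if keep_edit(p, y, **thresholds)
    ]


if __name__ == "__main__":
    probs = [
        [0.95, 0.03, 0.02],  # confident and correct -> dropped (minimal edit)
        [0.05, 0.92, 0.03],  # confident and wrong   -> dropped (corrupted)
        [0.55, 0.30, 0.15],  # uncertain but correct -> kept
    ]
    print(filter_edits(probs, labels=[0, 0, 0]))  # [2]
```

Using two separate thresholds keeps the "changed too little" and "changed the class" cases independent, so each can be tuned on its own.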

Motivation and Goals

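Improve generalization on fine-grained vision tasks when training data is limited.
Propose a data augmentation method driven by dataset-specific domain descriptions.
Use captioning and a large language model to guide diffusion-based image editing.
Filter the edits to preserve task-relevant information and data integrity.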

Proposed Method

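Caption every training image with a pretrained captioning model.
Use a large language model to summarize the captions into a small set of domain descriptions (fewer than 10).
Edit the training images with text-conditioned diffusion methods (Img2Img and InstructPix2Pix), guided by the domain descriptions.
Apply semantic (CLIP-based) and confidence-based filtering to remove failed edits.
Fine-tune a ResNet50 on the augmented dataset and compare it against baseline methods on each task.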
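The semantic filtering step can be approximated with a zero-shot, CLIP-style check. This sketch assumes the image and text embeddings have already been computed by some CLIP model (here they are toy vectors), and that an edit is kept only when a task-relevant prompt is its closest match among a few candidate prompts. The prompt strings, the function names, and this decision rule are illustrative assumptions, not the paper's exact configuration.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def passes_semantic_filter(
    image_emb: Sequence[float],
    text_embs: Dict[str, Sequence[float]],
    task_prompt: str,
) -> bool:
    """Keep an edited image only if `task_prompt` is its best-matching prompt."""
    best = max(text_embs, key=lambda prompt: cosine(image_emb, text_embs[prompt]))
    return best == task_prompt


if __name__ == "__main__":
    # Toy 3-d "embeddings"; real CLIP embeddings are much higher-dimensional.
    text_embs = {
        "a photo of a bird": [1.0, 0.0, 0.0],
        "a photo of a tree": [0.0, 1.0, 0.0],
        "a blurry image": [0.0, 0.0, 1.0],
    }
    good_edit = [0.9, 0.1, 0.2]  # closest to the bird prompt
    bad_edit = [0.1, 0.2, 0.9]   # closest to the "blurry" distractor
    print(passes_semantic_filter(good_edit, text_embs, "a photo of a bird"))  # True
    print(passes_semantic_filter(bad_edit, text_embs, "a photo of a bird"))   # False
```

Comparing against distractor prompts, rather than thresholding a single similarity score, avoids having to pick an absolute cutoff, since raw CLIP similarities vary across prompts and datasets.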

Experimental Results

Research Questions

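RQ1: Can ALIA generate domain descriptions, grounded in the training data, that yield usable, label-preserving image edits?
RQ2: Do language-guided edits outperform traditional data augmentation and text-to-image generation for domain generalization and bias mitigation?
RQ3: How do the choices of filtering and editing technique affect augmentation quality and model performance?
RQ4: How does increasing the amount of added data affect accuracy across domains?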

Key Findings

Dataset	User Prompt	ALIA Prompts	ALIA Prompts + Filtering
iWildCam	A camera trap photo of a { } in the wild …	79.92 ± 4.22%	84.87 ± 1.92%
CUB	A photo of a { } bird…	71.02 ± 0.45%	72.70 ± 0.10%
Waterbirds	An iNaturalist photo of a { } in nature.	63.64 ± 1.43%	71.40 ± 1.85%

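ALIA outperforms traditional data augmentation and text-to-image data, sometimes matching or even exceeding the gains from adding real data.
On iWildCam, ALIA improves accuracy by up to 17% over the original data and, at the same number of added images, even outperforms adding real data.
On CUB, ALIA outperforms the baselines, with the exception of RandAugment and real data in the setting that uses domain-based prompts.
On Waterbirds, ALIA with filtering keeps in-domain accuracy comparable while improving out-of-domain robustness.
Semantic and confidence-based filtering reduce failed edits and improve final accuracy.
ALIA-generated prompts outperform user-provided prompts, especially in the contextual-bias setting.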

This review was generated by AI and reviewed by human editors.