QUICK REVIEW

[Paper Review] Image retrieval outperforms diffusion models on data augmentation

Max F. Burg, Florian Wenzel|arXiv (Cornell University)|Apr 20, 2023

Domain Adaptation and Few-Shot Learning|Cited by 10

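ONE-SENTENCE SUMMARY

This study compares diffusion-model-based data augmentation with a simple nearest-neighbor retrieval baseline that draws on the diffusion model's (DM's) training data. In data-scarce settings on ImageNet, retrieval often yields stronger downstream classifier performance. Personalized diffusion models help but do not surpass retrieval.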

ABSTRACT

Many approaches have been proposed to use diffusion models to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large datasets, often with noisy annotations, and it remains an open question to which extent these models contribute to downstream classification performance. In particular, it remains unclear if they generalize enough to improve over directly using the additional data of their pre-training process for augmentation. We systematically evaluate a range of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. Personalizing diffusion models towards the target data outperforms simpler prompting strategies. However, using the pre-training data of the diffusion model alone, via a simple nearest-neighbor retrieval procedure, leads to even stronger downstream performance. Our study explores the potential of diffusion models in generating new training data, and surprisingly finds that these sophisticated models are not yet able to beat a simple and strong image retrieval baseline on simple downstream vision tasks.

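RESEARCH MOTIVATION AND OBJECTIVES

Evaluate how effective diffusion-model-based data augmentation is for downstream image classification under data-scarce conditions.
Systematically benchmark a range of diffusion-model-based augmentation strategies against a retrieval baseline.
Assess whether personalizing the diffusion model improves augmentation quality beyond prompt-based methods.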

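PROPOSED METHOD

Benchmark diffusion-model-based augmentation methods (unconditional generation, prompt conditioning, and personalization via fine-tuning) on a 10% subset of ImageNet.
Compare these methods with a retrieval baseline that selects nearest neighbors from the diffusion model's pre-training data (LAION-5B), using a CLIP-style embedding space to find images close to the class prompts.
Evaluate downstream accuracy with a ResNet-50 trained on the augmented data.
Extend the evaluation to full ImageNet and Caltech256 to test whether the results generalize.
Control for diversity and domain alignment by analyzing the effects of prompting, conditioning, and personalization.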
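
The retrieval baseline described above can be illustrated with a minimal sketch of cosine-similarity nearest-neighbor search. It assumes the image and class-prompt embeddings have already been computed by some CLIP-style encoder and are available as NumPy arrays. The function names (`l2_normalize`, `retrieve_nearest`) and the toy vectors are hypothetical and are not taken from the paper, which may use an approximate index rather than the brute-force search shown here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve_nearest(query_emb, index_emb, k):
    """Brute-force nearest-neighbor retrieval (illustrative, not the paper's code).

    query_emb: (Q, D) array, e.g. one embedding per class prompt.
    index_emb: (N, D) array, e.g. one embedding per candidate image.
    Returns (indices, similarities), each of shape (Q, min(k, N)),
    ordered from most to least similar.
    """
    q = l2_normalize(np.asarray(query_emb, dtype=np.float64))
    x = l2_normalize(np.asarray(index_emb, dtype=np.float64))
    sims = q @ x.T                      # cosine similarity matrix, shape (Q, N)
    k = min(k, x.shape[0])
    order = np.argsort(-sims, axis=1, kind="stable")[:, :k]
    return order, np.take_along_axis(sims, order, axis=1)


# Toy data: four 2-D "image" embeddings and one "prompt" embedding.
toy_index = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
toy_query = np.array([[1.0, 0.0]])
toy_ids, toy_sims = retrieve_nearest(toy_query, toy_index, k=2)
```

In this toy case the two retrieved rows are the vectors pointing in nearly the same direction as the query. At the scale of LAION-5B an exhaustive matrix product like this would be impractical, so a precomputed approximate index would normally replace it.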

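EXPERIMENTAL RESULTS

Research Questions

RQ1: Do diffusion-model-based augmentation methods outperform a simple nearest-neighbor retrieval baseline drawn from the DM's pre-training data?
RQ2: Can prompt-based conditioning or personalization of the diffusion model close the gap to retrieval?
RQ3: Do the findings on 10% ImageNet generalize to full ImageNet and to other datasets such as Caltech256?
RQ4: What are the trade-offs between DM-based and retrieval-based augmentation in terms of compute and data quality?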

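Key Findings

Diffusion-model-based augmentation improves on the unaugmented 10% ImageNet baseline, but is outperformed by nearest-neighbor retrieval from the DM's training data (LAION-5B).
A simple retrieval baseline achieves the best downstream top-1 accuracy among the evaluated methods (retrieval: 62.6% ±0.1 on 10% ImageNet).
Prompt-based conditioning (including CLIP templates) improves over basic prompts but does not surpass retrieval.
Personalizing the diffusion model (conditional fine-tuning, cluster conditioning, textual inversion, and DM fine-tuning) further improves DM-based augmentation but still does not beat retrieval.
The results generalize to full ImageNet and Caltech256, where retrieval retains strong performance and an efficiency advantage.
Retrieval is computationally efficient: it does not require downloading or training on a large dataset, and relies only on a retrieval index and the nearest-neighbor images.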
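
For readers unfamiliar with how a figure such as "62.6% ±0.1" is typically produced, the sketch below computes top-1 accuracy for one run and then summarizes several runs as a mean and a spread. It assumes the ± value is a sample standard deviation over repeated training runs, which this summary does not state. The helper names (`top1_accuracy`, `summarize_runs`) are hypothetical.

```python
import numpy as np


def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    preds = np.argmax(np.asarray(logits), axis=1)
    return float(np.mean(preds == np.asarray(labels)))


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) over per-run accuracies.

    Assumption: the spread is the sample standard deviation (ddof=1).
    With a single run there is no spread to estimate, so 0.0 is returned.
    """
    a = np.asarray(accuracies, dtype=np.float64)
    if a.size == 0:
        raise ValueError("at least one run is required")
    spread = float(a.std(ddof=1)) if a.size > 1 else 0.0
    return float(a.mean()), spread
```

A typical use would call `top1_accuracy` once per trained classifier on the held-out set and pass the resulting list to `summarize_runs`.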


This review was generated by AI and checked by a human editor.