QUICK REVIEW

[Paper Review] Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

Lisa Dunlap, Alyssa Umino|arXiv (Cornell University)|May 25, 2023

Multimodal Machine Learning Applications | Citations: 14

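One-Line Summary

ALIA uses image captioning and a large language model to produce domain descriptions, then edits the training images with text-guided diffusion, improving fine-grained classification and domain generalization without fine-tuning the generator.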

ABSTRACT

Many fine-grained classification tasks, like rare animal identification, have limited training data and consequently classifiers trained on these datasets often fail to generalize to variations in the domain like changes in weather or location. As such, we explore how natural language descriptions of the domains seen in training data can be used with large vision models trained on diverse pretraining datasets to generate useful variations of the training data. We introduce ALIA (Automated Language-guided Image Augmentation), a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset's domains and augment the training data via language-guided image editing. To maintain data integrity, a model trained on the original dataset filters out minimal image edits and those which corrupt class-relevant information. The resulting dataset is visually consistent with the original training data and offers significantly enhanced diversity. We show that ALIA is able to surpass traditional data augmentation and text-to-image generated data on fine-grained classification tasks, including cases of domain generalization and contextual bias. Code is available at https://github.com/lisadunlap/ALIA.

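Motivation and Objectives

Improve generalization on fine-grained vision tasks with limited training data.
Propose a data augmentation method driven by dataset-specific domain descriptions.
Use captioning and a large language model to guide diffusion-based image editing.
Filter edits so that task-relevant information and data integrity are preserved.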

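Proposed Method

Generate a caption for every training image with a pretrained captioning model.
Summarize the captions with a large language model into a short list (fewer than 10) of concise domain descriptions.
Edit the training images with text-conditioned diffusion methods (Img2Img and Instruct Pix2Pix), guided by the domain descriptions.
Apply semantic (CLIP-based) and confidence-based filters to remove failed edits.
Fine-tune a ResNet50 on the augmented dataset and compare against baselines across tasks.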
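
The confidence-based filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `class_thresholds` and `judge_edit`, the choice of the mean true-class confidence as a per-class threshold, and the exact decision rule are assumptions made here to show one plausible reading of "a model trained on the original dataset filters out minimal image edits and those which corrupt class-relevant information." The inputs are assumed to be softmax outputs from a classifier trained on the original data.

```python
from collections import defaultdict
from typing import Dict, Sequence


def class_thresholds(labels: Sequence[int],
                     probs: Sequence[Sequence[float]]) -> Dict[int, float]:
    """Per-class threshold (assumed): mean true-class confidence on original images."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for y, p in zip(labels, probs):
        sums[y] += p[y]
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}


def judge_edit(label: int, probs: Sequence[float],
               thresholds: Dict[int, float]) -> str:
    """Return 'keep', 'minimal_edit', or 'corrupted' for one edited image.

    label:  class of the source image the edit was made from
    probs:  classifier softmax output on the edited image
    """
    pred = max(range(len(probs)), key=lambda k: probs[k])
    # Classes never seen in the originals fall back to a threshold of 1.0.
    if probs[pred] < thresholds.get(pred, 1.0):
        return "keep"  # classifier is not confident: the edit changed the image
    # Confident and correct: the edit likely changed too little to add diversity.
    # Confident and wrong: the edit likely destroyed class-relevant content.
    return "minimal_edit" if pred == label else "corrupted"


# Thresholds from three original images (two of class 0, one of class 1).
thr = class_thresholds([0, 0, 1], [[0.75, 0.25], [0.25, 0.75], [0.25, 0.75]])
kept = [p for p in ([0.875, 0.125], [0.125, 0.875], [0.375, 0.625])
        if judge_edit(0, p, thr) == "keep"]
```

In this toy run only the third edit survives: the first is predicted as its own class with high confidence, and the second is confidently predicted as a different class. The semantic (CLIP-based) filter is a separate step and is not sketched here.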

Experimental Results

Research Questions

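RQ1: Can ALIA produce domain descriptions that are grounded in the training data and enable useful, label-preserving image edits?
RQ2: Does language-guided editing outperform traditional data augmentation and text-to-image generation for domain generalization and bias mitigation?
RQ3: How do filtering and the choice of editing method affect augmentation quality and model performance?
RQ4: How does increasing the amount of data across different domains affect accuracy?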

Key Findings

Dataset	User Prompt	ALIA Prompts	ALIA Prompts + Filtering
iWildCam	a camera trap photo of a { } …	79.92 ± 4.22%	84.87 ± 1.92%
CUB	a photo of a { } bird…	71.02 ± 0.47%	72.70 ± 0.10%
Waterbirds	an iNaturalist photo of a { } in nature.	63.64 ± 1.43%	71.40 ± 1.85%

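ALIA outperforms traditional augmentation and text-to-image generated data, and in some cases matches or exceeds the benefit of adding real data.
On iWildCam, ALIA improves accuracy by up to 17% over training on the original data alone, and in some cases outperforms adding an equal amount of real data.
On CUB, with domain-grounded prompts, ALIA improves over every baseline except RandAugment and added real data.
On Waterbirds, ALIA with filtering nearly matches in-domain accuracy and improves out-of-domain robustness.
Semantic and confidence-based filtering reduces failed edits and raises final accuracy.
ALIA-generated prompts outperform user-provided prompts, most notably in the contextual-bias setting.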

This review was generated by AI and checked by a human editor.