QUICK REVIEW

[Paper Review] Effective Data Augmentation With Diffusion Models

Brandon Trabucco, Kyle G. Doherty|arXiv (Cornell University)|Feb 7, 2023

Domain Adaptation and Few-Shot Learning | Citations: 81

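One-Line Summary

tldr: This paper introduces DA-Fusion, a diffusion-model-based data augmentation method that semantically edits real images to create diverse, task-relevant synthetic data for few-shot classification, while avoiding leakage from the pre-trained model.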

ABSTRACT

Data augmentation is one of the most prevalent tools in deep learning, underpinning many recent advances, including those from classification, generative models, and representation learning. The standard approach to data augmentation combines simple transformations like rotations and flips to generate new images from existing ones. However, these new images lack diversity along key semantic axes present in the data. Current augmentations cannot alter the high-level semantic attributes, such as animal species present in a scene, to enhance the diversity of data. We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models. Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples. We evaluate our approach on few-shot image classification tasks, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.

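Motivation and Objectives

Motivation: Standard augmentations lack semantic diversity and struggle to change high-level attributes.
Objective: Develop a flexible, off-the-shelf diffusion-based augmentation that applies to any image and improves few-shot classification.
Aim: Balance real and synthetic data, and generalize to unseen concepts with minimal domain-specific tuning.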

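Proposed Method

Semantically edit images with a text-to-image diffusion model by inserting new embeddings into the model's text encoder.
Use Textual Inversion to learn new embeddings from a few labelled examples, adapting the diffusion model to unseen concepts.
Use SDEdit to insert real images into the diffusion process and generate synthetic images guided by the learned embeddings.
Balance real and augmented data by stochastically mixing real and synthetic images within training batches.
Introduce a randomized augmentation strength, which increases diversity by varying the insertion timestep t0 at which the real image is spliced into the diffusion process.
Implement leakage-prevention strategies: a model-centric approach that erases the class concept from the model's weights, and a data-centric approach that omits the class name from the prompt.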
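
The batch-mixing and randomized-strength steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the candidate t0 values, and the alpha used in the usage lines are hypothetical placeholders, and the image generation itself (Textual Inversion plus SDEdit) is left out. Here alpha is assumed to be the probability of drawing a synthetic image in place of a real one, and M the number of synthetic images generated per real image.

```python
import random

# Hypothetical candidate strengths: each value is the fraction of the
# diffusion process at which the real image is spliced in (t0).
CANDIDATE_T0 = (0.25, 0.5, 0.75, 1.0)


def plan_augmentations(num_real, m, rng, strengths=CANDIDATE_T0):
    """For each real image index, draw M random t0 values.

    Each t0 stands for one synthetic image that an image-to-image
    pipeline would generate from that real image (generation omitted).
    """
    return {
        i: [rng.choice(strengths) for _ in range(m)]
        for i in range(num_real)
    }


def sample_example(i, plan, alpha, rng):
    """Pick the real image i, or one of its synthetic variants.

    With probability alpha a synthetic variant is returned as
    ("synthetic", i, j), where j indexes into plan[i]; otherwise the
    real image is returned as ("real", i, None).
    """
    variants = plan.get(i, [])
    if variants and rng.random() < alpha:
        j = rng.randrange(len(variants))
        return ("synthetic", i, j)
    return ("real", i, None)


# Usage: 3 real images, M = 4 synthetic variants each, alpha = 0.5.
rng = random.Random(0)
plan = plan_augmentations(num_real=3, m=4, rng=rng)
batch = [sample_example(i, plan, alpha=0.5, rng=rng) for i in range(3)]
```

Setting alpha to 0 recovers training on real images only, while alpha set to 1 always substitutes a synthetic variant whenever one exists.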

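Experimental Results

Research Questions

RQ1: Does augmentation that semantically edits concepts outside the diffusion model's vocabulary improve few-shot classification on the datasets studied?
RQ2: How do leakage-prevention strategies affect the effectiveness of diffusion-based data augmentation?
RQ3: Does randomizing the augmentation strength contribute to better performance, and how robust is the method to the balance of real and synthetic data?
RQ4: Is the approach effective across multiple domains (Pascal VOC, COCO, and the leafy spurge weed dataset)?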

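Key Findings

DA-Fusion improves few-shot classification accuracy on three datasets, with gains of up to about 10 points over standard augmentation baselines.
Model-centric leakage prevention remains beneficial, showing gains of up to about +5 points on Pascal and COCO.
Data-centric leakage prevention yields larger gains, sometimes reaching about +10 points, which suggests a dependence on, or interaction with, the model's prior knowledge.
Randomizing the augmentation strength (varying t0) consistently improves performance over a fixed strength.
DA-Fusion is robust to the balance of real and synthetic data (alpha and M), with only modest sensitivity.
The contributed weed dataset (leafy spurge) shows that the method can handle unseen concepts outside the diffusion model's vocabulary.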

This review was generated by AI and checked by a human editor.