QUICK REVIEW

[Paper Review] Augmenting medical image classifiers with synthetic data from latent diffusion models

Luke W. Sagers, James A. Diao|arXiv (Cornell University)|Aug 23, 2023

AI in cancer detection | Cited by 12

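One-Line Summary

This paper shows that synthetic data from latent diffusion models modestly improves skin disease classifier performance in low-data settings; the benefit shrinks as more real data becomes available, and gains saturate once the synthetic-to-real ratio exceeds 10:1.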

ABSTRACT

While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drugs Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented populations. Some have proposed that generative AI could reduce the need for real data, but its utility in model development remains unclear. Skin disease serves as a useful case study in synthetic image generation due to the diversity of disease appearance, particularly across the protected attribute of skin tone. Here we show that latent diffusion models can scalably generate images of skin disease and that augmenting model training with these data improves performance in data-limited settings. These performance gains saturate at synthetic-to-real image ratios above 10:1 and are substantially smaller than the gains obtained from adding real images. As part of our analysis, we generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies. Our results suggest that synthetic data could serve as a force-multiplier for model development, but the collection of diverse real-world data remains the most important step to improve medical AI algorithms.

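Motivation and Objectives

Motivate the evaluation of synthetic data as a way to address underrepresentation and data scarcity in dermatology AI.
Quantify how synthetic data from latent diffusion models affects skin disease classification under varying amounts of real data.
Characterize the generation strategies and evaluate their effect on model fairness for subjects with different skin tones.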

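Proposed Method

Generate 458,920 synthetic images spanning nine skin conditions using Stable Diffusion fine-tuned with DreamBooth.
Train classifiers on real data supplemented with synthetic images from different generation methods (inpainting, in-then-outpainting, text-to-image), with and without image transformations.
Evaluate performance across different amounts of real data (1, 16, 32, 64, 128, and 228 images per class) combined with different synthetic-to-real ratios.
Assess dose-response by adding 0–75 text-to-image synthetic images per real image.
Examine performance across Fitzpatrick skin types on the Stanford DDI dataset, including malignancy classification.
Compare synthetic augmentation with conventional data augmentation, and analyze statistical significance using Benjamini-Hochberg (BH) correction.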
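The mixing of real and synthetic images at a fixed per-class budget and ratio can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function name `build_training_set`, the `(image_id, label)` tuple layout, and the sampling scheme are all assumptions made for this example.

```python
import random
from collections import defaultdict


def build_training_set(real, synthetic, n_real_per_class, ratio, seed=0):
    """Sample a training set with a fixed synthetic-to-real ratio per class.

    real, synthetic: lists of (image_id, label) pairs (hypothetical layout).
    n_real_per_class: number of real images drawn per class.
    ratio: number of synthetic images added per real image (0 = real only).
    Returns a list of (image_id, label, source) tuples.
    """
    rng = random.Random(seed)

    # Group image ids by class label for both sources.
    real_by_class = defaultdict(list)
    synth_by_class = defaultdict(list)
    for image_id, label in real:
        real_by_class[label].append(image_id)
    for image_id, label in synthetic:
        synth_by_class[label].append(image_id)

    train = []
    for label in sorted(real_by_class):
        real_pool = real_by_class[label]
        synth_pool = synth_by_class[label]
        n_synth = n_real_per_class * ratio
        if len(real_pool) < n_real_per_class:
            raise ValueError(f"not enough real images for class {label!r}")
        if len(synth_pool) < n_synth:
            raise ValueError(f"not enough synthetic images for class {label!r}")

        # Sample without replacement from each source.
        train += [(i, label, "real") for i in rng.sample(real_pool, n_real_per_class)]
        train += [(i, label, "synthetic") for i in rng.sample(synth_pool, n_synth)]
    return train
```

Sweeping `n_real_per_class` over the values listed above and `ratio` from 0 upward would produce the grid of training sets on which a dose-response curve can be measured.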

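Experimental Results

Research Questions

RQ1: Can synthetic images generated by latent diffusion models improve skin disease classifier performance when real data is limited?
RQ2: How do different synthetic generation strategies (inpainting, outpainting, text-to-image) affect model performance and fairness across skin tones?
RQ3: What is the relationship between the synthetic-to-real ratio and accuracy gains, and do the gains saturate?
RQ4: Do the gains from synthetic data persist for malignant-versus-benign skin classification and across different skin types?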

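Key Findings

Synthetic augmentation improves accuracy in low-data settings, with gains of up to 13.2 points at 32 real images per class when image transformations are used.
Gains saturate once the synthetic-to-real ratio exceeds 10:1, and they shrink as the amount of real data grows.
All three generation methods produced benefits under some conditions, with text-to-image showing the largest gains for conditions such as allergic contact dermatitis.
On the Stanford DDI dataset, synthetic data improved malignancy classification across Fitzpatrick skin types, but few comparisons remained significant after multiple-testing correction.
A large synthetic dataset (458,920 images) was generated, showing that diffusion models are a scalable data source, while real data remains the main driver of performance.
The authors released the 458,920 synthetic images to support further research.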
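For readers unfamiliar with the multiple-testing correction mentioned above, the Benjamini-Hochberg step-up procedure can be sketched in a few lines. This is a generic textbook implementation, not the authors' analysis code. The function name and the default `alpha` are assumptions, and the review does not state which underlying statistical tests produced the p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return BH-adjusted p-values and rejection decisions.

    p_values: raw p-values, one per comparison.
    alpha: target false discovery rate (assumed default, not from the paper).
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min

    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected
```

A comparison that looks significant on its raw p-value can lose significance after adjustment, which is consistent with only a few comparisons surviving correction.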


This review was generated by AI and checked by a human editor.