QUICK REVIEW

[Paper Review] A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation

Hanxi Li, Zhengxun Zhang|arXiv (Cornell University)|Feb 29, 2024

Industrial Vision Systems and Defect Detection | Citations: 5

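One-Line Summary

This paper proposes AdaBLDM, a diffusion-based defect generator that uses defect trimaps, cross-modal prompts, and online decoder adaptation to synthesize diverse, high-quality industrial defects and improve anomaly detection.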

ABSTRACT

Effectively addressing the challenge of industrial Anomaly Detection (AD) necessitates an ample supply of defective samples, a constraint often hindered by their scarcity in industrial contexts. This paper introduces a novel algorithm designed to augment defective samples, thereby enhancing AD performance. The proposed method tailors the blended latent diffusion model for defect sample generation, employing a diffusion model to generate defective samples in the latent space. A feature editing process, controlled by a "trimap" mask and text prompts, refines the generated samples. The image generation inference process is structured into three stages: a free diffusion stage, an editing diffusion stage, and an online decoder adaptation stage. This sophisticated inference strategy yields high-quality synthetic defective samples with diverse pattern variations, leading to significantly improved AD accuracies based on the augmented training set. Specifically, on the widely recognized MVTec AD dataset, the proposed method elevates the state-of-the-art (SOTA) performance of AD with augmented data by 1.5%, 1.9%, and 3.1% for AD metrics AP, IAP, and IAP90, respectively. The implementation code of this work can be found at the GitHub repository https://github.com/GrandpaXun242/AdaBLDM.git

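Motivation and Objectives

Address the scarcity of defective samples in industrial settings to enable effective anomaly detection (AD).
Propose a defect generation framework based on a blended latent diffusion model (BLDM) tailored to defect synthesis.
Incorporate defect trimaps and cross-modal prompts to control the location and appearance of defects.
Introduce three-stage inference and online decoder adaptation to increase sample realism and diversity.
Demonstrate improved AD performance on multiple benchmarks using the generated data.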

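Proposed Method

Uses a latent diffusion model (LDM) as the backbone, extended with cross-modal prompts and a defect trimap to control the defect region.
Uses text prompts derived from CLIP/BLIP to describe the object and defect type that steer generation.
Confines defects to the target region via embeddings and introduces a dedicated encoder for trimap features.
Creates defects through multi-stage denoising with content editing (a free diffusion stage, latent-space editing, and pixel-space blending).
Proposes online decoder adaptation, which fine-tunes the decoder for each generated sample to balance defect realism against preservation of normal regions.
Trains the model with a two-stage scheme: the denoising model is first pretrained on domain data, then the trimap encoder and convolutional blocks are fine-tuned on real defect data.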
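
The latent-space editing step in the list above builds on the general blended latent diffusion idea: at every denoising step, the latent inside the defect mask is taken from the prompt-guided denoiser, while the latent outside the mask is overwritten with a noised copy of the defect-free source latent. The following is a minimal sketch of that blending rule on a 1-D toy latent in plain Python. The `toy_denoise` function, the linear noise schedule, and every name here are hypothetical stand-ins chosen for illustration; they are not the authors' implementation, which relies on a trained denoising network, and the sketch does not cover the free diffusion or decoder adaptation stages.

```python
import random


def add_noise(latent, t, num_steps, rng):
    """Noise a clean latent to level t (hypothetical linear schedule)."""
    alpha = 1.0 - t / num_steps  # 1.0 means clean; smaller means noisier
    return [alpha * v + (1.0 - alpha) * rng.gauss(0.0, 1.0) for v in latent]


def toy_denoise(latent, target):
    """Stand-in for one prompt-guided denoising step: move halfway to a target."""
    return [0.5 * v + 0.5 * target for v in latent]


def blended_edit(source_latent, mask, defect_target, num_steps=10, seed=0):
    """Run the toy editing loop; mask is 1 inside the defect region, 0 outside."""
    rng = random.Random(seed)
    # Start from pure noise, as in a standard reverse diffusion process.
    z = [rng.gauss(0.0, 1.0) for _ in source_latent]
    for t in range(num_steps - 1, -1, -1):
        z_fg = toy_denoise(z, defect_target)                # edited content
        z_bg = add_noise(source_latent, t, num_steps, rng)  # source at noise level t
        # Blend: edited latent inside the mask, source latent outside it.
        z = [m * f + (1 - m) * b for m, f, b in zip(mask, z_fg, z_bg)]
    return z
```

Because the background is re-imposed at every step and the final step uses noise level 0, positions outside the mask end up equal to the source latent, while positions inside the mask are driven entirely by the denoiser. That property is what lets this family of methods change only the defect region.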

Figure 1: Illustration of three defect generation styles. From top to bottom: conventional approaches, GAN-based algorithms, and the proposed method.

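Experimental Results

Research Questions

RQ1: Can diffusion-based defect generation produce diverse, high-quality synthetic defects and achieve AD performance beyond existing SOTA methods?
RQ2: How do cross-modal prompts and defect trimaps affect the realism and localization of generated defects?
RQ3: Does online decoder adaptation further improve synthetic defect quality and downstream AD metrics?
RQ4: How does AdaBLDM perform against GAN-based and other diffusion-based defect generation methods across multiple industrial datasets?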

Key Findings

Category	Pixel-AUC (Genuine)	Pixel-AUC (DFM)	Pixel-AUC (DCDGANc-StarGAN)	Pixel-AUC (DCDGANc-StyleGAN)	Pixel-AUC (AdaBLDM)	PRO (Genuine)	PRO (DFM)	PRO (DCDGANc-StarGAN)	PRO (DCDGANc-StyleGAN)	PRO (AdaBLDM)	AP (Genuine)	AP (DFM)	AP (DCDGANc-StarGAN)	AP (DCDGANc-StyleGAN)	AP (AdaBLDM)	IAP (Genuine)	IAP (DFM)	IAP (DCDGANc-StarGAN)	IAP (DCDGANc-StyleGAN)
Hazelnut	98.23	97.65	82.41	96.79	98.23	71.08	66.05	68.5	71.08	47.94	38.7	19.19	41.27	47.94	66.65	57.6	25.69	52.3	66.65
Wood	95.96	39.91	96.79	89.83	95.96	71.02	67.63	68.5	71.02	48.83	?	?	?	48.83	63.41	?	?	?	63.41
Capsule	94.17	91.72	99.51	94.05	94.17	45.26	37.37	71.09	37.13	13.12	10.47	52.11	?	13.12	32.86	?	?	?	32.86
Leather	99.23	45.15	89.83	99.51	99.23	68.5	51.97	71.09	71.09	60.52	25.08	?	52.11	61.49	?	?	?	?	61.49
Grid	95.22	73.75	93.66	95.82	95.22	35.31	46.29	46.91	46.91	7.55	?	?	?	7.55	52.86	?	?	?	52.86
Tile	92.74	73.54	94.05	82.41	92.74	71.86	38.69	37.13	54.24	47.4	7.85	10.47	?	47.4	68.46	?	?	?	68.46
Carpet	95.96	39.91	95.54	89.83	95.96	71.02	44.8	44.8	51.97	48.83	13.12	25.08	40.75	48.83	63.41	?	?	?	63.41

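Using augmented data, AdaBLDM achieves new SOTA AD performance on MVTec AD, surpassing DeSTseg by 1.5% in AP, 1.9% in IAP, and 3.1% in IAP90.
Compared with SOTA defect generators (DFM and the DCDGANc family), AdaBLDM provides higher-quality and more diverse defect samples and scores higher on AD metrics.
Three-stage inference and online decoder adaptation improve sample fidelity and the diversity of defect patterns.
Across multiple subcategories, SVM-based AD using AdaBLDM-generated data shows strong anomaly detection rates, and DeSTseg-based AD also benefits from AdaBLDM data.
The method shows robustness on texture and object categories of MVTec AD, BTAD, and KSDD2.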
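
For readers unfamiliar with the AP metric quoted above, the following is a minimal pure-Python sketch of pixel-level average precision computed from an anomaly score map and a binary ground-truth mask. It uses the common step-wise definition (precision accumulated at each newly recalled positive, the same form used by scikit-learn's `average_precision_score` when scores have no ties). Whether the paper's evaluation code uses exactly this variant is an assumption, and the function names are illustrative only.

```python
def flatten(grid):
    """Flatten a 2-D list (rows of pixels) into a 1-D list."""
    return [value for row in grid for value in row]


def average_precision(scores, labels):
    """Step-wise AP: mean of the precision values at each positive, ranked by score.

    scores: per-pixel anomaly scores (higher means more anomalous).
    labels: per-pixel ground truth, 1 for defect and 0 for normal.
    Tied scores are not grouped, so results may differ slightly from
    tie-aware implementations when many pixels share a score.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive pixel")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            ap += hits / rank  # precision at the point this positive is recalled
    return ap / total_pos


# Toy 2x2 score map and mask: defects at the top-left and bottom-left pixels.
score_map = [[0.9, 0.8], [0.7, 0.6]]
defect_mask = [[1, 0], [1, 0]]
toy_ap = average_precision(flatten(score_map), flatten(defect_mask))
```

In the toy example the two defect pixels are ranked first and third, so the precision values at those points are 1/1 and 2/3 and `toy_ap` is their mean, about 0.833.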

Figure 2: The network structure of the proposed BLDM-based method for generating defective regions on an image. Besides the noise input, the model is governed by a text prompt and a trimap that indicates the locations of the object and the defect.
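
To make the trimap in the caption concrete, here is a small sketch of how a three-level map could be assembled from a binary object mask and a binary defect mask. The label values (0, 1, 2), the function name, and the rule that a defect pixel outside the object is treated as background are all assumptions made for illustration; the encoding the paper actually uses may differ.

```python
# Hypothetical label values for the three regions.
BACKGROUND, OBJECT, DEFECT = 0, 1, 2


def make_trimap(object_mask, defect_mask):
    """Combine two binary masks (2-D lists of 0/1) into a three-level trimap."""
    trimap = []
    for obj_row, def_row in zip(object_mask, defect_mask):
        row = []
        for on_object, on_defect in zip(obj_row, def_row):
            if on_object and on_defect:
                row.append(DEFECT)      # defect lying on the object
            elif on_object:
                row.append(OBJECT)      # defect-free part of the object
            else:
                row.append(BACKGROUND)  # everything else, including stray defect pixels
        trimap.append(row)
    return trimap


object_mask = [
    [0, 1, 1],
    [0, 1, 1],
]
defect_mask = [
    [1, 0, 1],
    [0, 0, 0],
]
example_trimap = make_trimap(object_mask, defect_mask)
```

Keeping the object and defect regions in a single map gives the generator one conditioning signal that says both where the object is and where the defect should appear.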

This review was generated by AI and checked by a human editor.