QUICK REVIEW

[Paper Review] AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection

Qihang Zhou, Guansong Pang|arXiv (Cornell University)|Oct 29, 2023

Anomaly Detection Techniques and Applications | Cited by 31

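One-line summary

AnomalyCLIP adapts CLIP by learning object-agnostic text prompts that capture generic normality and abnormality, improving zero-shot anomaly detection and segmentation across a range of industrial and medical domains.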

ABSTRACT

Zero-shot anomaly detection (ZSAD) requires detection models trained using auxiliary data to detect anomalies without any training sample in a target dataset. It is a crucial task when training data is not accessible due to various concerns, e.g., data privacy, yet it is challenging since the models need to generalize to anomalies across different domains where the appearance of foreground objects, abnormal regions, and background features, such as defects/tumors on different products/organs, can vary significantly. Recently large pre-trained vision-language models (VLMs), such as CLIP, have demonstrated strong zero-shot recognition ability in various vision tasks, including anomaly detection. However, their ZSAD performance is weak since the VLMs focus more on modeling the class semantics of the foreground objects rather than the abnormality/normality in the images. In this paper we introduce a novel approach, namely AnomalyCLIP, to adapt CLIP for accurate ZSAD across different domains. The key insight of AnomalyCLIP is to learn object-agnostic text prompts that capture generic normality and abnormality in an image regardless of its foreground objects. This allows our model to focus on the abnormal image regions rather than the object semantics, enabling generalized normality and abnormality recognition on diverse types of objects. Large-scale experiments on 17 real-world anomaly detection datasets show that AnomalyCLIP achieves superior zero-shot performance of detecting and segmenting anomalies in datasets of highly diverse class semantics from various defect inspection and medical imaging domains. Code will be made available at https://github.com/zqhang/AnomalyCLIP.

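Motivation and objectives

Motivate zero-shot anomaly detection (ZSAD) for settings where target-domain training data is unavailable because of privacy constraints or the novelty of the domain.
Develop a CLIP-based method that focuses on abnormality/normality rather than object semantics.
Learn generic normality and abnormality prompts that transfer across object classes and domains.
Combine global and local context to capture both coarse and fine-grained anomaly cues.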

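Proposed method

Introduce object-agnostic text prompt templates for normality and abnormality by replacing specific object names with a generic token.
Learn the prompts from auxiliary data with glocal context optimization, which jointly minimizes a global (image-level) loss and a local (pixel-level) loss.
Apply text prompt tuning by inserting learnable tokens into the CLIP text encoder to refine the text space.
Incorporate Diagonally Prominent Attention Maps (DPAM), an alternative self-attention scheme in the visual encoder, to refine local visual features.
Fuse global and local visual-semantic alignment to produce an image-level anomaly score and a pixel-level anomaly map for zero-shot inference.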
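
The scoring step in the last item can be illustrated with a minimal sketch. It compares one global image feature and a grid of patch features against a "normal" and an "abnormal" text embedding, then turns the similarities into an image-level score and a pixel-level map. The function names, the fixed prompt wording, the temperature of 0.07, and the equal-weight fusion of the global score with the maximum of the map are all illustrative assumptions, not details taken from the paper. In AnomalyCLIP the prompts are learned tokens and the features come from the CLIP encoders; here they are toy vectors.

```python
import math


def object_agnostic_prompts(generic_token="object"):
    """Return (normal, abnormal) prompt strings that contain no class name.

    Hypothetical wording for illustration only: the point is that a generic
    token stands in for the specific object name.
    """
    return f"a photo of a {generic_token}", f"a photo of a damaged {generic_token}"


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def abnormal_probability(feature, text_normal, text_abnormal, temperature=0.07):
    """Two-way softmax over cosine similarities; returns P(abnormal).

    The temperature value is an assumption, not taken from the paper.
    """
    logit_n = cosine(feature, text_normal) / temperature
    logit_a = cosine(feature, text_abnormal) / temperature
    # Equivalent to exp(logit_a) / (exp(logit_n) + exp(logit_a)).
    return 1.0 / (1.0 + math.exp(logit_n - logit_a))


def score_image(global_feature, patch_features, text_normal, text_abnormal):
    """Return (image_score, anomaly_map) for one image.

    patch_features is a 2-D grid (list of rows) of patch feature vectors.
    """
    anomaly_map = [
        [abnormal_probability(p, text_normal, text_abnormal) for p in row]
        for row in patch_features
    ]
    global_score = abnormal_probability(global_feature, text_normal, text_abnormal)
    local_score = max(max(row) for row in anomaly_map)
    # Assumed fusion rule: average the global score and the strongest local response.
    image_score = 0.5 * (global_score + local_score)
    return image_score, anomaly_map
```

With toy 2-D embeddings such as `text_normal = [1.0, 0.0]` and `text_abnormal = [0.0, 1.0]`, a patch feature close to the abnormal embedding gets a map value near 1 while the other patches stay near 0. This is the localization behavior the pixel-level map is meant to provide.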

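Experimental results

Research questions

RQ1: Can object-agnostic prompts learn generic abnormality/normality patterns that transfer across diverse object semantics?
RQ2: Does combining global and local context during prompt learning improve zero-shot anomaly detection and segmentation?
RQ3: How do the prompt refinements (in the text space and the visual space) affect cross-domain ZSAD performance?
RQ4: Is the proposed method robust across industrial and medical imaging domains without target-domain fine-tuning?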

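Key findings

AnomalyCLIP achieves superior zero-shot anomaly detection and segmentation on 17 real-world datasets from industrial and medical domains.
Object-agnostic prompts generalize better to unseen objects and anomalies than object-aware prompts.
Combining global and local context optimization with text-space refinement and DPAM consistently improves performance.
Prompts learned on industrial data generalize to medical datasets, and using medical auxiliary data for prompt learning yields further gains.
Compared with several baselines, AnomalyCLIP shows strong cross-domain ZSAD capability, underscoring the benefit of object-agnostic prompt learning for anomaly patterns.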

This review was generated by AI and checked by a human editor.