QUICK REVIEW

[Paper Review] CutPaste: Self-Supervised Learning for Anomaly Detection and Localization

Chunliang Li, Kihyuk Sohn|arXiv (Cornell University)|Apr 8, 2021
Anomaly Detection Techniques and Applications | Citations: 86
One-sentence summary

A self-supervised CutPaste-based approach learns representations from normal data to detect and localize unknown image defects, achieving state-of-the-art results on MVTec AD without anomalous training data. It also enables patch-level localization.

ABSTRACT

We aim at constructing a high performance model for defect detection that detects unknown anomalous patterns of an image without anomalous data. To this end, we propose a two-stage framework for building anomaly detectors using normal training data only. We first learn self-supervised deep representations and then build a generative one-class classifier on learned representations. We learn representations by classifying normal data from the CutPaste, a simple data augmentation strategy that cuts an image patch and pastes at a random location of a large image. Our empirical study on MVTec anomaly detection dataset demonstrates the proposed algorithm is general to be able to detect various types of real-world defects. We bring the improvement upon previous arts by 3.1 AUCs when learning representations from scratch. By transfer learning on pretrained representations on ImageNet, we achieve a new state-of-the-art 96.6 AUC. Lastly, we extend the framework to learn and extract representations from patches to allow localizing defective areas without annotations during training.

Research motivation and objectives

  • Develop a defect-detection method that learns from normal data only.
  • Introduce a self-supervised proxy task (CutPaste) to learn representations sensitive to local irregularities.
  • Show that learned representations enable effective one-class anomaly detection.
  • Extend to patch-based representations for localization without anomaly training data.
  • Evaluate robustness across diverse defect types and compare to prior methods.

Proposed method

  • Train a CNN-based encoder with a binary/3-way classifier to distinguish normal versus CutPaste-augmented normal images.
  • Propose CutPaste augmentation by cutting a patch, possibly rotating/jittering it, and pasting it at a random location.
  • Optionally use CutPaste-Scar as a long-thin patch variant and train a 3-way classifier (Normal, CutPaste, CutPaste-Scar).
  • Build a generative one-class detector on learned representations using Gaussian density estimation (GDE) over top features.
  • Optionally extend to patch-level representations by applying CutPaste to cropped patches and producing a pixel-wise anomaly heatmap via dense patch scoring and receptive-field upsampling.
  • Evaluate image-level anomaly detection via AUC on MVTec AD, and pixel-level localization via GradCAM and patch-based scores.
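The CutPaste augmentation described in the list above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it works on a grayscale image stored as a list of rows, skips the optional rotation and colour jitter, and the function name `cutpaste` and the default `area_ratio` and `aspect_ratio` ranges are assumptions chosen for the example.

```python
import random

def cutpaste(image, rng, area_ratio=(0.02, 0.15), aspect_ratio=(0.3, 3.3)):
    """Return a copy of `image` with one rectangular patch cut and pasted elsewhere.

    `image` is a list of equal-length rows. The default ranges are illustrative
    assumptions, and rotation/jitter of the patch is omitted for brevity.
    """
    height, width = len(image), len(image[0])

    # Sample the patch size from a target area fraction and a height/width ratio.
    area = rng.uniform(*area_ratio) * height * width
    aspect = rng.uniform(*aspect_ratio)
    patch_h = max(1, min(height, round((area * aspect) ** 0.5)))
    patch_w = max(1, min(width, round((area / aspect) ** 0.5)))

    # Pick where the patch is cut from and where it is pasted (both inside the image).
    src_y = rng.randint(0, height - patch_h)
    src_x = rng.randint(0, width - patch_w)
    dst_y = rng.randint(0, height - patch_h)
    dst_x = rng.randint(0, width - patch_w)

    # Copy the image so the normal sample stays untouched, then paste row by row.
    augmented = [row[:] for row in image]
    for r in range(patch_h):
        augmented[dst_y + r][dst_x:dst_x + patch_w] = image[src_y + r][src_x:src_x + patch_w]
    return augmented

# Build one (normal, augmented) pair for the binary proxy task: label 0 vs. label 1.
rng = random.Random(0)
normal = [[row * 32 + col for col in range(32)] for row in range(32)]
training_pair = [(normal, 0), (cutpaste(normal, rng), 1)]
```

A CutPaste-Scar variant would, under the same assumptions, only change how the patch size is sampled (a long, thin rectangle) and would add a third label for the 3-way classifier.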
Figure 1 : An overview of our method for anomaly detection and localization. (a) A deep network (CNN) is trained to distinguish images from normal (blue) and augmented (green) data distributions by CutPaste (orange dotted box), which cuts a small rectangular region (yellow dotted box) from normal da
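The second stage, a Gaussian density estimator (GDE) fitted on representations of normal data, can be sketched as follows. This is a dependency-free illustration under stated assumptions: the feature vectors are synthetic 2-D stand-ins for encoder outputs, the helper names (`fit_gaussian`, `solve`, `anomaly_score`) and the `ridge` regulariser are hypothetical, and the score is the negative Gaussian log-density up to an additive constant.

```python
import random

def fit_gaussian(features, ridge=1e-3):
    """Estimate the mean and a ridge-regularised covariance from normal-only features."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in features) / (n - 1)
            + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    return mean, cov

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def anomaly_score(x, mean, cov):
    """Half the squared Mahalanobis distance; larger means more anomalous."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return 0.5 * sum(d * s for d, s in zip(diff, solve(cov, diff)))

# Synthetic stand-in for encoder features of normal training images.
rng = random.Random(0)
normal_features = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
mean, cov = fit_gaussian(normal_features)
near = anomaly_score([0.1, -0.1], mean, cov)  # close to the normal cluster
far = anomaly_score([5.0, 5.0], mean, cov)    # far from the normal cluster
```

In practice a numerical library would replace the hand-written solver; the plain-Python version is used here only to keep the sketch self-contained.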

Experimental results

Research questions

  • RQ1: Can CutPaste-based self-supervised learning learn representations that generalize to unseen real defects without anomalous training data?
  • RQ2: How do CutPaste and its variants compare to other augmentations and self-supervised tasks for defect detection?
  • RQ3: Can patch-level representations enable accurate localization of defects without anomaly annotations during training?
  • RQ4: Does transfer learning from ImageNet-pretrained features further improve defect detection performance?
  • RQ5: How robust is the approach across texture and object categories on MVTec AD?

Key findings

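| Group | Category | DOCC | U-Student | P-SVDD | Rotation | Cutout | Scar | CutPaste | CutPaste-Scar | CutPaste (3-way) | Ensemble |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Texture | carpet | 90.6 | 95.3 | 92.9 | 29.7 ± 1.4 | 35.3 ± 2.3 | 92.7 ± 0.4 | 67.9 ± 1.8 | 94.6 ± 0.6 | 93.1 ± 1.1 | 93.9 |
| Texture | grid | 52.4 | 98.7 | 94.6 | 60.5 ± 7.0 | 57.5 ± 3.0 | 74.4 ± 2.5 | 99.9 ± 0.1 | 95.5 ± 0.3 | 99.9 ± 0.1 | 100.0 |
| Texture | leather | 78.3 | 93.4 | 90.9 | 55.2 ± 1.4 | 67.7 ± 1.5 | 99.9 ± 0.1 | 99.7 ± 0.1 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0 |
| Texture | tile | 96.5 | 95.8 | 97.8 | 70.1 ± 1.9 | 71.8 ± 4.0 | 96.7 ± 0.9 | 95.9 ± 1.0 | 89.4 ± 2.8 | 93.4 ± 1.0 | 94.6 |
| Texture | wood | 91.6 | 95.5 | 96.5 | 95.8 ± 1.1 | 92.0 ± 0.8 | 98.9 ± 0.2 | 94.9 ± 0.5 | 98.7 ± 0.3 | 98.6 ± 0.5 | 99.1 |
| Texture | average | 81.9 | 95.7 | 94.5 | 62.3 ± 2.6 | 64.9 ± 2.3 | 92.5 ± 0.8 | 91.7 ± 0.7 | 95.7 ± 0.8 | 97.0 ± 0.5 | 97.5 |
| Object | bottle | 99.6 | 96.7 | 98.6 | 95.0 ± 0.7 | 88.7 ± 0.8 | 98.5 ± 0.2 | 99.2 ± 0.2 | 98.0 ± 0.5 | 98.3 ± 0.5 | |
| Object | cable | 90.9 | 82.3 | 90.3 | 85.3 ± 0.8 | 80.2 ± 1.4 | 78.3 ± 1.7 | 87.1 ± 0.8 | 78.8 ± 2.9 | 80.6 ± 0.5 | 81.2 |
| Object | capsule | 91.0 | 92.8 | 76.7 | 71.8 ± 1.4 | 69.5 ± 1.1 | 82.9 ± 0.7 | 87.9 ± 0.7 | 95.3 ± 0.8 | 96.2 ± 0.5 | 98.2 |
| Object | hazelnut | 95.0 | 91.4 | 92.0 | 83.6 ± 0.8 | 69.7 ± 1.3 | 98.9 ± 0.2 | 91.3 ± 0.6 | 96.7 ± 0.4 | 97.3 ± 0.3 | 98.3 |
| Object | metal nut | 85.2 | 94.0 | 94.0 | 72.7 ± 0.5 | 84.6 ± 0.7 | 86.9 ± 1.5 | 96.8 ± 0.5 | 97.9 ± 0.2 | 99.3 ± 0.2 | 99.9 |
| Object | pill | 80.4 | 86.7 | 86.1 | 79.2 ± 1.4 | 78.7 ± 0.7 | 82.2 ± 1.4 | 93.4 ± 0.9 | 85.8 ± 1.3 | 92.4 ± 1.3 | 94.9 |
| Object | screw | 86.9 | 87.4 | 81.3 | 35.8 ± 2.9 | 17.6 ± 4.4 | 11.3 ± 2.2 | 54.4 ± 1.7 | 83.7 ± 0.7 | 86.3 ± 1.0 | 88.7 |
| Object | toothbrush | 96.4 | 98.6 | 100.0 | 99.1 ± 0.2 | 98.1 ± 0.6 | 94.8 ± 1.0 | 99.2 ± 0.2 | 96.7 ± 0.4 | 98.3 ± 0.9 | 99.4 |
| Object | transistor | 90.8 | 83.6 | 91.5 | 88.9 ± 0.4 | 82.5 ± 1.2 | 92.0 ± 0.7 | 96.4 ± 0.7 | 91.1 ± 0.6 | 95.5 ± 0.5 | 96.1 |
| Object | zipper | 92.4 | 95.8 | 97.9 | 74.3 ± 1.6 | 75.7 ± 1.0 | 86.8 ± 0.9 | 99.4 ± 0.1 | 99.5 ± 0.1 | 99.4 ± 0.2 | 99.9 |
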
  • From scratch, CutPaste achieves 95.2 AUC for image-level detection on MVTec AD, outperforming prior work by at least 3.1 AUC.
  • With ImageNet-pretrained backbones, CutPaste yields 96.6 AUC, setting a new state-of-the-art.
  • Patch-based representations reach 96.0 pixel-level localization AUC, surpassing prior methods.
  • Ensembling 5 CutPaste (3-way) models improves image-level AUC to 96.1.
  • CutPaste variants (CutPaste and CutPaste-Scar) outperform rotation, Cutout, and scar baselines for defect detection.
  • Fine-tuning with CutPaste further improves ImageNet-pretrained EfficientNet features, reaching 96.6 AUC.
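The AUC figures above are areas under the ROC curve, reported on a 0 to 100 scale. As a reminder of what the metric measures, here is a minimal sketch that computes it from anomaly scores as the fraction of (anomalous, normal) pairs ranked correctly, with ties counted as half. The function name `roc_auc` and the toy scores are illustrative assumptions, not values from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` uses 1 for anomalous and 0 for normal samples; a higher score
    should indicate a more anomalous sample.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one normal and one anomalous sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # anomalous sample ranked above the normal one
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Toy example: one anomalous image (0.35) scores below one normal image (0.4).
toy_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(f"AUC = {100 * toy_auc:.1f}")  # prints "AUC = 75.0"
```

The pairwise form is quadratic in the number of samples, which is fine for a test set of this size; a rank-based implementation gives the same value more efficiently.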
Figure 2 : Visualization of (a, green) normal, (b, red) anomaly, and (c–h, blue) augmented normal samples from bottle, toothbrush, screw, grid, and wood classes of MVTec anomaly detection dataset [ 5 ] . Augmented normal samples are generated by baseline augmentations including (c) Cutout and (d) Sc


This review was generated by AI and checked by a human editor.