[Paper Review] CutPaste: Self-Supervised Learning for Anomaly Detection and Localization
A self-supervised CutPaste-based approach learns representations from normal data to detect and localize unknown image defects, achieving state-of-the-art results on MVTec AD without anomalous training data. It also enables patch-level localization.
We aim at constructing a high performance model for defect detection that detects unknown anomalous patterns of an image without anomalous data. To this end, we propose a two-stage framework for building anomaly detectors using normal training data only. We first learn self-supervised deep representations and then build a generative one-class classifier on learned representations. We learn representations by classifying normal data from the CutPaste, a simple data augmentation strategy that cuts an image patch and pastes at a random location of a large image. Our empirical study on MVTec anomaly detection dataset demonstrates the proposed algorithm is general to be able to detect various types of real-world defects. We bring the improvement upon previous arts by 3.1 AUCs when learning representations from scratch. By transfer learning on pretrained representations on ImageNet, we achieve a new state-of-the-art 96.6 AUC. Lastly, we extend the framework to learn and extract representations from patches to allow localizing defective areas without annotations during training.
Research Motivation and Objectives
- Develop a defect-detection method that learns from normal data only.
- Introduce a self-supervised proxy task (CutPaste) to learn representations sensitive to local irregularities.
- Show that learned representations enable effective one-class anomaly detection.
- Extend to patch-based representations for localization without anomaly training data.
- Evaluate robustness across diverse defect types and compare to prior methods.
Proposed Method
- Train a CNN-based encoder with a binary/3-way classifier to distinguish normal versus CutPaste-augmented normal images.
- Propose CutPaste augmentation by cutting a patch, possibly rotating/jittering it, and pasting it at a random location.
- Optionally use CutPaste-Scar as a long-thin patch variant and train a 3-way classifier (Normal, CutPaste, CutPaste-Scar).
- Build a generative one-class detector on learned representations using Gaussian density estimation (GDE) over top features.
- Optionally extend to patch-level representations by applying CutPaste to cropped patches and producing a pixel-wise anomaly heatmap via dense patch scoring and receptive-field upsampling.
- Evaluate image-level anomaly detection via AUC on MVTec AD, and pixel-level localization via GradCAM and patch-based scores.
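The two stages listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `cutpaste`, `fit_gaussian`, and `anomaly_score` are hypothetical, the default area and aspect-ratio ranges are illustrative assumptions rather than values stated in this review, the optional rotation and color jitter of the patch are omitted, and the feature vectors are assumed to come from an already trained encoder that is not shown.

```python
import numpy as np


def cutpaste(image, rng, area_ratio=(0.02, 0.15), aspect_ratio=(0.3, 3.3)):
    """Copy one rectangular patch of `image` (H, W, C) to a random location.

    The ratio ranges are illustrative defaults (an assumption of this sketch).
    """
    h, w = image.shape[:2]
    # Sample the patch area as a fraction of the image area.
    area = rng.uniform(*area_ratio) * h * w
    # Sample the aspect ratio log-uniformly so tall and wide patches are equally likely.
    aspect = np.exp(rng.uniform(np.log(aspect_ratio[0]), np.log(aspect_ratio[1])))
    ph = min(max(int(round(float(np.sqrt(area * aspect)))), 1), h)
    pw = min(max(int(round(float(np.sqrt(area / aspect)))), 1), w)
    # Source (cut) and destination (paste) top-left corners.
    sy, sx = rng.integers(0, h - ph + 1), rng.integers(0, w - pw + 1)
    dy, dx = rng.integers(0, h - ph + 1), rng.integers(0, w - pw + 1)
    out = image.copy()
    out[dy:dy + ph, dx:dx + pw] = image[sy:sy + ph, sx:sx + pw]
    return out


def fit_gaussian(features, eps=1e-6):
    """Fit a Gaussian to normal-only features of shape (N, D)."""
    mean = features.mean(axis=0)
    # A small ridge keeps the covariance invertible.
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)


def anomaly_score(features, mean, precision):
    """Half the squared Mahalanobis distance: the negative Gaussian
    log-density up to an additive constant. Higher means more anomalous."""
    diff = features - mean
    return 0.5 * np.einsum("ij,jk,ik->i", diff, precision, diff)
```

In this sketch, the encoder would be trained to separate original images from `cutpaste` outputs, after which `fit_gaussian` is run on the features of normal training images and `anomaly_score` is applied to test features.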

Experimental Results
Research Questions
- RQ1: Can CutPaste-based self-supervised learning learn representations that generalize to unseen real defects without anomalous training data?
- RQ2: How do CutPaste and its variants compare to other augmentations and self-supervised tasks for defect detection?
- RQ3: Can patch-level representations enable accurate localization of defects without anomaly annotations during training?
- RQ4: Does transfer learning from ImageNet-pretrained features further improve defect detection performance?
- RQ5: How robust is the approach across texture and object categories on MVTec AD?
Key Findings
- From scratch, CutPaste achieves 95.2 AUC for image-level detection on MVTec AD, outperforming prior work by at least 3.1 AUC.
- With ImageNet-pretrained backbones, CutPaste yields 96.6 AUC, setting a new state-of-the-art.
- Patch-based representations reach 96.0 pixel-level localization AUC, surpassing prior methods.
- Ensembling 5 CutPaste (3-way) models improves image-level AUC to 96.1.
- CutPaste variants (CutPaste and CutPaste-Scar) outperform rotation, Cutout, and scar baselines for defect detection.
- Fine-tuning ImageNet-pretrained EfficientNet features with CutPaste further improves them, reaching 96.6 AUC.
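The AUC figures above are areas under the ROC curve, reported on a 0 to 100 scale. As a rough sketch of how such a number is obtained from anomaly scores, the AUROC equals the probability that a randomly chosen anomalous sample scores higher than a randomly chosen normal one. The helper name `auroc` and the toy scores below are hypothetical and are not data from the paper.

```python
import numpy as np


def auroc(scores, labels):
    """AUROC via pairwise comparison (Mann-Whitney U statistic).

    `labels` uses 1 for anomalous and 0 for normal; higher `scores`
    are expected to indicate anomalies. Ties count as half a win.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Compare every anomalous score against every normal score.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)


# Toy example: three of the four (anomalous, normal) pairs are ranked correctly.
print(100 * auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # prints 75.0
```

The pairwise form takes memory proportional to the number of anomalous samples times the number of normal samples, which is fine for a dataset the size of MVTec AD; a rank-based implementation would be preferable for much larger test sets.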
![Figure 2 : Visualization of (a, green) normal, (b, red) anomaly, and (c–h, blue) augmented normal samples from bottle, toothbrush, screw, grid, and wood classes of MVTec anomaly detection dataset [ 5 ] . Augmented normal samples are generated by baseline augmentations including (c) Cutout and (d) Sc](https://ar5iv.labs.arxiv.org/html/2104.04015/assets/x2.png)
This review was generated by AI and checked by a human editor.