[Paper Review] CutPaste: Self-Supervised Learning for Anomaly Detection and Localization
A self-supervised CutPaste-based approach learns representations from normal data to detect and localize unknown image defects, achieving state-of-the-art results on MVTec AD without anomalous training data. It also enables patch-level localization.
We aim to construct a high-performance defect-detection model that detects unknown anomalous patterns in an image without using anomalous data. To this end, we propose a two-stage framework for building anomaly detectors from normal training data only. We first learn self-supervised deep representations and then build a generative one-class classifier on the learned representations. We learn representations by classifying normal data from CutPaste-augmented data, where CutPaste is a simple data augmentation strategy that cuts an image patch and pastes it at a random location of a larger image. Our empirical study on the MVTec anomaly detection dataset demonstrates that the proposed algorithm generalizes to detect various types of real-world defects. We improve upon previous art by 3.1 AUC when learning representations from scratch. By transfer learning on representations pretrained on ImageNet, we achieve a new state-of-the-art 96.6 AUC. Lastly, we extend the framework to learn and extract representations from patches, which allows defective areas to be localized without annotations during training.
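The CutPaste augmentation described above fits in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the names `cutpaste` and `make_binary_batch` are hypothetical, the patch area and aspect-ratio ranges are illustrative defaults, and the optional rotation or jitter of the patch is omitted. The batch helper pairs each normal image (label 0) with its CutPaste version (label 1), which is the binary proxy task the encoder learns from.

```python
import numpy as np


def cutpaste(image, rng, area_ratio=(0.02, 0.15), aspect_ratio=(0.3, 3.3)):
    """Return a copy of `image` with one rectangular patch cut and pasted elsewhere.

    image: H x W x C (or H x W) array. The area and aspect-ratio ranges are
    illustrative assumptions; rotation/jitter of the patch is omitted.
    """
    h, w = image.shape[:2]
    # Patch area as a fraction of the image; aspect ratio (height / width) sampled in log space.
    area = rng.uniform(*area_ratio) * h * w
    aspect = np.exp(rng.uniform(np.log(aspect_ratio[0]), np.log(aspect_ratio[1])))
    ph = int(np.clip(round(np.sqrt(area * aspect)), 1, h))
    pw = int(np.clip(round(np.sqrt(area / aspect)), 1, w))

    # Cut: pick a source location and copy the patch out.
    sy, sx = rng.integers(0, h - ph + 1), rng.integers(0, w - pw + 1)
    patch = image[sy:sy + ph, sx:sx + pw].copy()

    # Paste: pick an independent target location on a copy, leaving the input untouched.
    ty, tx = rng.integers(0, h - ph + 1), rng.integers(0, w - pw + 1)
    out = image.copy()
    out[ty:ty + ph, tx:tx + pw] = patch
    return out


def make_binary_batch(images, rng):
    """Pair each normal image (label 0) with its CutPaste version (label 1)."""
    xs, ys = [], []
    for img in images:
        xs.extend([img, cutpaste(img, rng)])
        ys.extend([0, 1])
    return np.stack(xs), np.array(ys)
```

Because the pasted pixels come from the same image, the result contains only locally plausible texture in the wrong place, which is what makes the task sensitive to small local irregularities rather than global statistics.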
Research Motivation and Objectives
- Develop a defect-detection method that learns from normal data only.
- Introduce a self-supervised proxy task (CutPaste) to learn representations sensitive to local irregularities.
- Show that learned representations enable effective one-class anomaly detection.
- Extend to patch-based representations for localization without anomaly training data.
- Evaluate robustness across diverse defect types and compare to prior methods.
Proposed Method
- Train a CNN-based encoder with a binary or 3-way classification head to distinguish normal images from CutPaste-augmented normal images.
- Propose CutPaste augmentation by cutting a patch, possibly rotating/jittering it, and pasting it at a random location.
- Optionally use CutPaste-Scar as a long-thin patch variant and train a 3-way classifier (Normal, CutPaste, CutPaste-Scar).
- Build a generative one-class detector on learned representations using Gaussian density estimation (GDE) over top features.
- Optionally extend to patch-level representations by applying CutPaste to cropped patches and producing a pixel-wise anomaly heatmap via dense patch scoring and receptive-field upsampling.
- Evaluate image-level anomaly detection via AUC on MVTec AD, and pixel-level localization via GradCAM and patch-based scores.
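The second stage, Gaussian density estimation over learned embeddings, amounts to fitting a mean and covariance on normal features and scoring test features by Mahalanobis distance, which ranks samples the same way as the Gaussian negative log-likelihood. The sketch below assumes embeddings are already extracted as an `N x D` array; the class `GaussianDensityDetector`, its `ridge` regularizer, and the `roc_auc` helper are hypothetical illustrations, not the paper's implementation. The AUC helper uses the rank-statistic definition: the probability that an anomalous sample scores higher than a normal one.

```python
import numpy as np


class GaussianDensityDetector:
    """Fit one Gaussian to normal embeddings; score samples by Mahalanobis distance."""

    def __init__(self, ridge=1e-3):
        # Hypothetical regularizer added to the covariance diagonal so it stays invertible.
        self.ridge = ridge

    def fit(self, feats):
        # feats: N x D embeddings of normal training images only.
        feats = np.asarray(feats, dtype=np.float64)
        self.mean_ = feats.mean(axis=0)
        cov = np.atleast_2d(np.cov(feats, rowvar=False))
        cov = cov + self.ridge * np.eye(cov.shape[0])
        self.precision_ = np.linalg.inv(cov)
        return self

    def score(self, feats):
        # Squared Mahalanobis distance to the fitted mean; higher means more anomalous.
        diff = np.asarray(feats, dtype=np.float64) - self.mean_
        return np.einsum("nd,de,ne->n", diff, self.precision_, diff)


def roc_auc(scores_normal, scores_anomalous):
    """AUC = P(anomalous score > normal score), counting ties as 1/2."""
    a = np.asarray(scores_anomalous, dtype=np.float64)[:, None]
    n = np.asarray(scores_normal, dtype=np.float64)[None, :]
    return float(np.mean((a > n) + 0.5 * (a == n)))
```

In use, `fit` would receive embeddings of normal training images and `score` those of test images; the scores of normal and defective test images then go to `roc_auc` to obtain an image-level AUC like the ones reported on MVTec AD.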
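For patch-level localization, each patch in a dense grid receives an anomaly score, and those scores must be mapped back to pixels to form a heatmap. The function below is a hedged sketch of that upsampling step. It assumes square patches of side `patch_size` taken with a fixed `stride`, and it spreads each score uniformly over the pixels the patch covers before averaging overlaps. The uniform averaging and the name `patch_scores_to_heatmap` are assumptions for illustration; the paper's receptive-field upsampling may apply additional smoothing.

```python
import numpy as np


def patch_scores_to_heatmap(scores, image_size, patch_size, stride):
    """Spread each patch's anomaly score over the pixels it covers, then average overlaps.

    scores: (gh, gw) grid where scores[i, j] belongs to the patch whose top-left
    corner is (i * stride, j * stride). Uniform averaging is an illustrative assumption.
    """
    h, w = image_size
    total = np.zeros((h, w), dtype=np.float64)
    count = np.zeros((h, w), dtype=np.float64)
    gh, gw = scores.shape
    for i in range(gh):
        for j in range(gw):
            y, x = i * stride, j * stride
            # Accumulate the score and a coverage count over this patch's receptive field.
            total[y:y + patch_size, x:x + patch_size] += scores[i, j]
            count[y:y + patch_size, x:x + patch_size] += 1.0
    # Pixels not covered by any patch keep a score of 0.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)
```

A smaller stride gives more overlap and a smoother map at the cost of scoring more patches.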

Experimental Results
Research Questions
- RQ1: Can CutPaste-based self-supervised learning learn representations that generalize to unseen real defects without anomalous training data?
- RQ2: How do CutPaste and its variants compare to other augmentations and self-supervised tasks for defect detection?
- RQ3: Can patch-level representations enable accurate localization of defects without anomaly annotations during training?
- RQ4: Does transfer learning from ImageNet-pretrained features further improve defect detection performance?
- RQ5: How robust is the approach across texture and object categories on MVTec AD?
Key Findings
Image-level anomaly detection AUC on MVTec AD, per category.

| Type | Category | DOCC | U-Student | P-SVDD | Rotation | Cutout | Scar | CutPaste | CutPaste-Scar | CutPaste (3-way) | Ensemble |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Texture | carpet | 90.6 | 95.3 | 92.9 | 29.7 ± 1.4 | 35.3 ± 2.3 | 92.7 ± 0.4 | 67.9 ± 1.8 | 94.6 ± 0.6 | 93.1 ± 1.1 | 93.9 |
| Texture | grid | 52.4 | 98.7 | 94.6 | 60.5 ± 7.0 | 57.5 ± 3.0 | 74.4 ± 2.5 | 99.9 ± 0.1 | 95.5 ± 0.3 | 99.9 ± 0.1 | 100.0 |
| Texture | leather | 78.3 | 93.4 | 90.9 | 55.2 ± 1.4 | 67.7 ± 1.5 | 99.9 ± 0.1 | 99.7 ± 0.1 | 100.0 ± 0.0 | 100.0 ± 0.0 | 100.0 ± 0.0 |
| Texture | tile | 96.5 | 95.8 | 97.8 | 70.1 ± 1.9 | 71.8 ± 4.0 | 96.7 ± 0.9 | 95.9 ± 1.0 | 89.4 ± 2.8 | 93.4 ± 1.0 | 94.6 |
| Texture | wood | 91.6 | 95.5 | 96.5 | 95.8 ± 1.1 | 92.0 ± 0.8 | 98.9 ± 0.2 | 94.9 ± 0.5 | 98.7 ± 0.3 | 98.6 ± 0.5 | 99.1 |
| Texture | mean | 81.9 | 95.7 | 94.5 | 62.3 ± 2.6 | 64.9 ± 2.3 | 92.5 ± 0.8 | 91.7 ± 0.7 | 95.7 ± 0.8 | 97.0 ± 0.5 | 97.5 |
| Object | bottle | 99.6 | 96.7 | 98.6 | 95.0 ± 0.7 | 88.7 ± 0.8 | 98.5 ± 0.2 | 99.2 ± 0.2 | 98.0 ± 0.5 | 98.3 ± 0.5 | |
| Object | cable | 90.9 | 82.3 | 90.3 | 85.3 ± 0.8 | 80.2 ± 1.4 | 78.3 ± 1.7 | 87.1 ± 0.8 | 78.8 ± 2.9 | 80.6 ± 0.5 | 81.2 |
| Object | capsule | 91.0 | 92.8 | 76.7 | 71.8 ± 1.4 | 69.5 ± 1.1 | 82.9 ± 0.7 | 87.9 ± 0.7 | 95.3 ± 0.8 | 96.2 ± 0.5 | 98.2 |
| Object | hazelnut | 95.0 | 91.4 | 92.0 | 83.6 ± 0.8 | 69.7 ± 1.3 | 98.9 ± 0.2 | 91.3 ± 0.6 | 96.7 ± 0.4 | 97.3 ± 0.3 | 98.3 |
| Object | metal nut | 85.2 | 94.0 | 94.0 | 72.7 ± 0.5 | 84.6 ± 0.7 | 86.9 ± 1.5 | 96.8 ± 0.5 | 97.9 ± 0.2 | 99.3 ± 0.2 | 99.9 |
| Object | pill | 80.4 | 86.7 | 86.1 | 79.2 ± 1.4 | 78.7 ± 0.7 | 82.2 ± 1.4 | 93.4 ± 0.9 | 85.8 ± 1.3 | 92.4 ± 1.3 | 94.9 |
| Object | screw | 86.9 | 87.4 | 81.3 | 35.8 ± 2.9 | 17.6 ± 4.4 | 11.3 ± 2.2 | 54.4 ± 1.7 | 83.7 ± 0.7 | 86.3 ± 1.0 | 88.7 |
| Object | toothbrush | 96.4 | 98.6 | 100.0 | 99.1 ± 0.2 | 98.1 ± 0.6 | 94.8 ± 1.0 | 99.2 ± 0.2 | 96.7 ± 0.4 | 98.3 ± 0.9 | 99.4 |
| Object | transistor | 90.8 | 83.6 | 91.5 | 88.9 ± 0.4 | 82.5 ± 1.2 | 92.0 ± 0.7 | 96.4 ± 0.7 | 91.1 ± 0.6 | 95.5 ± 0.5 | 96.1 |
| Object | zipper | 92.4 | 95.8 | 97.9 | 74.3 ± 1.6 | 75.7 ± 1.0 | 86.8 ± 0.9 | 99.4 ± 0.1 | 99.5 ± 0.1 | 99.4 ± 0.2 | 99.9 |
- From scratch, CutPaste (3-way) achieves 95.2 AUC for image-level detection on MVTec AD, outperforming the best prior method by 3.1 AUC.
- With ImageNet-pretrained backbones, CutPaste yields 96.6 AUC, setting a new state-of-the-art.
- Patch-based representations reach 96.0 pixel-level localization AUC, surpassing prior methods.
- Ensembling 5 CutPaste (3-way) models improves image-level AUC to 96.1.
- On average, the CutPaste variants (CutPaste and CutPaste-Scar) outperform the Rotation, Cutout, and Scar baselines for defect detection.
- Fine-tuning ImageNet-pretrained EfficientNet features with CutPaste further improves on the pretrained features alone, which is how the 96.6 AUC is reached.
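The overall image-level scores are unweighted means over the 15 MVTec AD categories. As a quick arithmetic check, the snippet below averages the per-category AUCs of the from-scratch CutPaste (3-way) model; the dictionaries simply transcribe those reported values, and the texture/object split follows MVTec AD. It also shows why the overall figure is not the mean of the two group means (that would give about 95.7).

```python
# Per-category image-level AUCs of the from-scratch CutPaste (3-way) model,
# transcribed from the per-category results table.
TEXTURE = {"carpet": 93.1, "grid": 99.9, "leather": 100.0, "tile": 93.4, "wood": 98.6}
OBJECT = {
    "bottle": 98.3, "cable": 80.6, "capsule": 96.2, "hazelnut": 97.3, "metal nut": 99.3,
    "pill": 92.4, "screw": 86.3, "toothbrush": 98.3, "transistor": 95.5, "zipper": 99.4,
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


texture_mean = mean(TEXTURE.values())
object_mean = mean(OBJECT.values())
# Overall score: unweighted mean over all 15 categories, not the mean of the two group means.
overall_mean = mean(list(TEXTURE.values()) + list(OBJECT.values()))

print(f"texture {texture_mean:.1f}, object {object_mean:.1f}, overall {overall_mean:.1f}")
# prints: texture 97.0, object 94.4, overall 95.2
```

Because there are twice as many object classes as texture classes, weak object categories such as cable and screw pull the overall mean down more than a simple group average would suggest.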
![Figure 2 : Visualization of (a, green) normal, (b, red) anomaly, and (c–h, blue) augmented normal samples from bottle, toothbrush, screw, grid, and wood classes of MVTec anomaly detection dataset [ 5 ] . Augmented normal samples are generated by baseline augmentations including (c) Cutout and (d) Sc](https://ar5iv.labs.arxiv.org/html/2104.04015/assets/x2.png)
This review was generated by AI and checked by a human editor.