QUICK REVIEW

[Paper Review] Hierarchical Vector Quantized Transformer for Multi-class Unsupervised Anomaly Detection

Ruiying Lu, Yujie Wu | arXiv (Cornell University) | Oct 22, 2023

Anomaly Detection Techniques and Applications | Cited by 22

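One-sentence summary

This paper proposes HVQ-Trans, a unified framework for multi-class unsupervised anomaly detection that integrates hierarchical vector quantization and prototype-oriented optimal transport into a Transformer, suppressing the "identical shortcut" and improving detection and localization across multiple object classes.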

ABSTRACT

Unsupervised image Anomaly Detection (UAD) aims to learn robust and discriminative representations of normal samples. While separate solutions per class endow expensive computation and limited generalizability, this paper focuses on building a unified framework for multiple classes. Under such a challenging setting, popular reconstruction-based networks with continuous latent representation assumption always suffer from the "identical shortcut" issue, where both normal and abnormal samples can be well recovered and difficult to distinguish. To address this pivotal issue, we propose a hierarchical vector quantized prototype-oriented Transformer under a probabilistic framework. First, instead of learning the continuous representations, we preserve the typical normal patterns as discrete iconic prototypes, and confirm the importance of Vector Quantization in preventing the model from falling into the shortcut. The vector quantized iconic prototype is integrated into the Transformer for reconstruction, such that the abnormal data point is flipped to a normal data point. Second, we investigate an exquisite hierarchical framework to relieve the codebook collapse issue and replenish frail normal patterns. Third, a prototype-oriented optimal transport method is proposed to better regulate the prototypes and hierarchically evaluate the abnormal score. By evaluating on MVTec-AD and VisA datasets, our model surpasses the state-of-the-art alternatives and possesses good interpretability. The code is available at https://github.com/RuiyingLu/HVQ-Trans.

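Research motivation and goals

Advance unified anomaly detection across multiple classes, reducing the need for per-class models and improving generalization.
Address the identical shortcut problem in reconstruction-based unsupervised anomaly detection by using discrete iconic prototypes instead of continuous latent representations.
Develop a hierarchical vector quantization framework that prevents codebook collapse and preserves normal patterns at multiple levels.
Introduce prototype-oriented optimal transport to calibrate the prototypes and improve the interpretability of anomaly scores.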

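Proposed method

Replace continuous latent features with the nearest iconic prototype from a learnable per-class codebook.
Combine cascaded vector-quantization-based Transformer encoder/decoder layers that reconstruct from discrete prototypes.
Implement a switching mechanism with multiple codebooks and experts to handle multi-class data without fine-tuning.
Use a hierarchical optimal transport loss to align normal features with prototypes and calibrate anomaly scores at each layer.
Optimize a combined objective comprising a reconstruction loss, prototype-related and commitment losses, the POT loss, and a cross-entropy loss for expert switching.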
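
The nearest-prototype replacement at the core of the method can be illustrated with a minimal NumPy sketch. It assumes a standard VQ-VAE-style lookup with squared Euclidean distance; the function name `quantize`, the commitment weight `beta`, and the toy values are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def quantize(features, codebook, beta=0.25):
    """Replace each feature vector with its nearest prototype.

    features: (N, D) continuous latent features.
    codebook: (K, D) prototypes (assumed to be learned elsewhere).
    beta:     commitment weight (hypothetical default).
    Returns (quantized, indices, vq_loss).
    """
    # Squared Euclidean distance between every feature and every prototype: (N, K).
    diff = features[:, None, :] - codebook[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # Index of the closest prototype for each feature.
    indices = np.argmin(dist, axis=1)
    quantized = codebook[indices]
    # In a forward pass the codebook term and the commitment term have the
    # same value; they differ only in which side would receive gradients
    # in an autodiff framework.
    mse = np.mean((quantized - features) ** 2)
    vq_loss = (1.0 + beta) * mse
    return quantized, indices, vq_loss

# Toy example: two prototypes, three feature vectors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
features = np.array([[0.1, -0.1], [0.9, 1.2], [0.4, 0.4]])
quantized, indices, vq_loss = quantize(features, codebook)
print(indices, vq_loss)
```

Because every output vector is drawn from the codebook, a feature that lies far from all prototypes cannot be copied through unchanged, which is the intuition behind using quantization against the identical shortcut.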

Figure 1 : By replacing the continuous latent features with the normal iconic prototypes of corresponding category, the normal regions are reconstructed as normal patterns (shown in yellow boxes), while the anomalies are also reconstructed as normal (shown in red boxes).
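
The behavior described in Figure 1 suggests a simple way to read anomalies off a reconstruction: locations where the input differs strongly from its "normalized" reconstruction are suspicious. The sketch below assumes a plain squared-error map with max pooling for the image-level score; both choices and the names `anomaly_map` and `image_score` are illustrative assumptions, and the paper's own scoring also involves the POT module.

```python
import numpy as np

def anomaly_map(features, reconstruction):
    """Per-location anomaly score as squared reconstruction error.

    features, reconstruction: (H, W, D) arrays.
    Returns an (H, W) score map.
    """
    return np.sum((features - reconstruction) ** 2, axis=-1)

def image_score(score_map):
    """Image-level score as the maximum location score (an assumed pooling rule)."""
    return float(score_map.max())

# Toy 2x2 feature map with one anomalous location; the reconstruction
# is assumed to have mapped every location back to a normal pattern (zeros).
features = np.zeros((2, 2, 1))
features[1, 1, 0] = 3.0
reconstruction = np.zeros((2, 2, 1))

score_map = anomaly_map(features, reconstruction)
print(score_map)
print(image_score(score_map))
```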

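Experimental results

Research questions

RQ1: Can a unified multi-class model outperform per-class models at unsupervised anomaly detection and localization?
RQ2: Does hierarchical vector quantization alleviate codebook collapse and reduce the identical shortcut relative to a continuous latent space?
RQ3: How effective is prototype-oriented optimal transport at learning robust prototypes and calibrating multi-level anomaly scores?
RQ4: Does the switching mechanism improve reconstruction quality and detection accuracy across diverse object categories?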

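Key findings

HVQ-Trans achieves state-of-the-art performance on MVTec-AD in the one-for-all setting, surpassing several baselines.
The hierarchical VQ layers help prevent codebook collapse and improve localization by reconstructing normal patterns at multiple feature levels.
Prototype-oriented OT calibrates anomaly scores, yielding more robust detection across categories and in complex scenes.
The switching mechanism enables per-category selection of prototypes and experts, improving the robustness of multi-class anomaly detection.
Qualitative results show improved anomaly localization, achieved by reconstructing anomalous regions as near-normal patterns.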
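
To make the idea of an OT-based score concrete, the sketch below computes an entropy-regularized transport cost between a set of features and a set of prototypes using Sinkhorn iterations. This is a stand-in technique chosen for illustration: the uniform marginals, the regularization `eps`, and the function names are assumptions, and the paper's actual POT formulation may differ.

```python
import numpy as np

def sinkhorn_plan(cost, eps=0.5, n_iters=200):
    """Entropy-regularized OT plan between uniform marginals."""
    n, k = cost.shape
    a = np.full(n, 1.0 / n)   # mass on each feature
    b = np.full(k, 1.0 / k)   # mass on each prototype
    kernel = np.exp(-cost / eps)
    u = np.ones(n)
    v = np.ones(k)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]

def ot_cost(features, prototypes, eps=0.5):
    """Transport cost from features to prototypes under squared Euclidean cost."""
    diff = features[:, None, :] - prototypes[None, :, :]
    cost = np.sum(diff ** 2, axis=-1)
    plan = sinkhorn_plan(cost, eps)
    return float(np.sum(plan * cost)), plan

prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
normal = np.array([[0.05, 0.0], [1.0, 0.95]])    # close to the prototypes
abnormal = np.array([[2.0, 2.0], [2.1, 1.9]])    # far from every prototype

normal_cost, normal_plan = ot_cost(normal, prototypes)
abnormal_cost, abnormal_plan = ot_cost(abnormal, prototypes)
print(normal_cost, abnormal_cost)
```

In this toy setting, features near the prototypes incur a much lower transport cost than features far from them, which is the property an OT-based anomaly score relies on.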

Figure 2 : (a) The overall framework of our HVQ-Trans. (b) Each VQ-based Layer replaces continuous features with iconic prototypes, equipped with the POT module to promote better learning and scoring. (c) The codebook and expert network are switched for individual image. (d) The detailed structure o

This review was generated by AI and reviewed by a human editor.