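[Paper Digest] Rethinking Reconstruction-based Graph-Level Anomaly Detection: Limitations and a Simple Remedy
This paper analyzes reconstruction flip in graph-autoencoder-based GLAD, shows the limitations of using only the mean reconstruction error as the anomaly score, and proposes MuSE, a method that uses multifaceted summaries of reconstruction errors and achieves state-of-the-art GLAD performance on 10 datasets.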
Graph autoencoders (Graph-AEs) learn representations of given graphs by aiming to accurately reconstruct them. A notable application of Graph-AEs is graph-level anomaly detection (GLAD), whose objective is to identify graphs with anomalous topological structures and/or node features compared to the majority of the graph population. Graph-AEs for GLAD regard a graph with a high mean reconstruction error (i.e., mean of errors from all node pairs and/or nodes) as an anomaly. Namely, the methods rest on the assumption that they would better reconstruct graphs with similar characteristics to the majority. We, however, report non-trivial counter-examples, a phenomenon we call reconstruction flip, and highlight the limitations of the existing Graph-AE-based GLAD methods. Specifically, we empirically and theoretically investigate when this assumption holds and when it fails. Through our analyses, we further argue that, while the reconstruction errors for a given graph are effective features for GLAD, leveraging the multifaceted summaries of the reconstruction errors, beyond just mean, can further strengthen the features. Thus, we propose a novel and simple GLAD method, named MUSE. The key innovation of MUSE involves taking multifaceted summaries of reconstruction errors as graph features for GLAD. This surprisingly simple method obtains SOTA performance in GLAD, performing best overall among 14 methods across 10 datasets.
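Research Motivation and Goals
- Investigate when reconstruction-based GLAD methods fail because of reconstruction flip.
- Characterize, both theoretically and empirically, how graph autoencoders reconstruct unseen graphs in relation to the patterns seen during training.
- Show the limitations of using only the mean reconstruction error for GLAD.
- Propose a simple, robust GLAD method (MuSE) that uses multifaceted summaries of reconstruction errors.
- Demonstrate MuSE's broad empirical gains across multiple datasets.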
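Proposed Method
- Analyze the reconstruction flip phenomenon using synthetic graphs that have a primary pattern (e.g., community structure, cycles) at different strengths.
- Run empirical experiments on GLAD benchmarks to observe reconstruction behavior on seen versus unseen patterns.
- Use theoretical results for a single-layer linear GAE to explain how generalization relates to pattern strength.
- Implement MuSE by representing each graph's reconstruction errors with multifaceted summaries (e.g., mean, standard deviation).
- During training, apply data augmentation and train the reconstruction model, with decoders for node features and for the adjacency matrix, using the losses L_X (cosine loss) and L_A (weighted BCE).
- Apply multiple aggregations Agg_t to L_X and L_A to obtain the error representation Err(G), and feed it to a one-class classifier for anomaly detection.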
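The error-representation step described in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`cosine_errors`, `bce_errors`, `error_representation`), the `pos_weight` parameter, the toy graph, and the choice of exactly two aggregations (mean and population standard deviation) are assumptions made here, and the reconstructions `X_hat` and `A_hat` are hard-coded stand-ins for the output of a trained model.

```python
import math
from statistics import mean, pstdev


def cosine_errors(X, X_hat):
    """Per-node feature error: 1 - cosine similarity of original vs. reconstructed row."""
    errs = []
    for x, x_hat in zip(X, X_hat):
        dot = sum(a * b for a, b in zip(x, x_hat))
        norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in x_hat))
        # Treat a zero vector as maximally wrong (an assumption of this sketch).
        errs.append(1.0 - dot / norm if norm > 0 else 1.0)
    return errs


def bce_errors(A, A_hat, pos_weight=1.0, eps=1e-12):
    """Per-node-pair structure error: weighted BCE over all ordered pairs with i != j."""
    errs = []
    n = len(A)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p = min(max(A_hat[i][j], eps), 1.0 - eps)  # clamp to avoid log(0)
            if A[i][j] == 1:
                errs.append(-pos_weight * math.log(p))  # edges are up-weighted
            else:
                errs.append(-math.log(1.0 - p))
    return errs


def error_representation(X, X_hat, A, A_hat, pos_weight=1.0):
    """Stack several aggregations of both error sets into one vector per graph."""
    feat = cosine_errors(X, X_hat)
    struct = bce_errors(A, A_hat, pos_weight)
    return [mean(feat), pstdev(feat), mean(struct), pstdev(struct)]


# Toy 3-node path graph with hypothetical reconstructions (illustrative numbers only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
X_hat = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_hat = [[0.0, 0.9, 0.2], [0.9, 0.0, 0.6], [0.2, 0.6, 0.0]]

err_G = error_representation(X, X_hat, A, A_hat, pos_weight=2.0)
print(err_G)
```

The resulting four-dimensional vector plays the role of Err(G). In the pipeline described above it would then be passed to a one-class classifier, which this sketch leaves out.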
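Experimental Results
Research Questions
- RQ1: When do Graph-AE reconstruction errors fail to separate anomalous graphs in GLAD because of reconstruction flip?
- RQ2: Can multifaceted summaries of reconstruction errors improve GLAD performance over a mean-only score?
- RQ3: How does MuSE compare with state-of-the-art GLAD methods across diverse datasets?
Key Findings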
| Method | DD | Protein | NCI1 | AIDS | IMDB | MUTAG | DHFR | BZR | ER | AR | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DOMINANT-G | 64.3 (4.4) | 55.9 (9.7) | 65.5 (6.1) | 80.6 (4.0) | 58.6 (5.3) | 60.8 (6.7) | 65.0 (4.2) | 56.6 (9.2) | 76.2 (7.8) | 58.7 (5.5) | 10.7 | |
| OCGTL | 74.5 (5.1) | 71.0 (8.7) | 61.2 (5.5) | 95.3 (3.7) | 69.0 (4.0) | 65.8 (5.8) | 64.9 (4.9) | 66.5 (9.9) | 71.3 (17.1) | 63.0 (3.6) | 6.9 | |
| GLocalKD | 47.8 (8.5) | 50.7 (8.5) | 51.6 (5.6) | 51.2 (1.2) | 49.8 (4.2) | 58.5 (6.7) | 55.1 (4.4) | 54.1 (8.1) | 55.8 (16.7) | 54.4 (4.4) | 17.0 | |
| GLADC | 52.1 (5.2) | 50.7 (5.6) | 51.4 (3.6) | 51.4 (1.0) | 52.2 (2.6) | 57.7 (5.2) | 53.3 (4.5) | 55.8 (4.1) | 59.0 (14.5) | 52.8 (4.2) | 16.8 | |
| GLAM | 61.6 (5.2) | 60.3 (5.6) | 58.1 (1.9) | 93.6 (2.6) | 75.6 (4.0) | 65.1 (3.5) | 63.0 (2.0) | 57.2 (2.7) | 72.6 (8.9) | 55.2 (2.9) | 9.8 | |
| HIMNET | 52.1 (3.7) | 56.9 (5.8) | 53.6 (4.6) | 64.3 (3.2) | 65.7 (2.4) | 61.8 (4.3) | 57.5 (2.9) | 63.6 (6.7) | 72.0 (9.9) | 55.7 (2.8) | 12.3 | |
| SIGNET | 64.2 (9.3) | 56.4 (6.4) | 63.1 (4.0) | 97.2 (1.6) | 78.0 (4.4) | 48.2 (4.8) | 67.5 (1.6) | 40.2 (5.8) | 66.6 (9.5) | 56.2 (4.3) | 10.4 | |
| GraphCL-1 (SSL-based) | 64.5 (3.9) | 60.7 (4.2) | 55.8 (3.1) | 71.2 (6.6) | 57.7 (5.5) | 54.2 (6.2) | 53.6 (2.3) | 57.8 (6.7) | 60.5 (9.3) | 55.5 (4.1) | 14.2 | |
| GraphMAE-1 | 64.7 (5.2) | 61.3 (7.0) | 62.5 (2.2) | 86.2 (1.4) | 74.8 (3.2) | 63.8 (7.4) | 63.2 (3.3) | 56.5 (9.6) | 68.5 (13.7) | 60.0 (3.9) | 10.3 | |
| GraphCL-2 | 66.1 (3.0) | 59.1 (5.2) | 60.3 (4.4) | 91.8 (3.5) | 77.3 (4.1) | 66.3 (5.6) | 67.4 (3.3) | 59.1 (4.6) | 71.9 (10.4) | 67.3 (3.4) | 7.2 | |
| GAE-2 | 67.2 (3.4) | 62.3 (5.0) | 62.4 (3.9) | 85.8 (1.6) | 75.3 (5.7) | 66.6 (7.6) | 67.3 (3.3) | 60.8 (5.6) | 72.0 (8.8) | 65.7 (2.0) | 7.0 | |
| GraphMAE-2 | 68.0 (4.3) | 61.2 (4.0) | 68.3 (3.6) | 90.8 (3.6) | 75.8 (4.8) | 66.7 (5.8) | 68.1 (2.4) | 61.4 (6.0) | 72.8 (6.4) | 66.2 (6.4) | 5.1 | |
| MuSE w/o L_X | 79.4 (3.7) | 75.6 (3.7) | 69.2 (3.7) | 99.6 (0.5) | 72.2 (4.0) | 65.8 (5.7) | 65.8 (3.1) | 60.4 (6.6) | 65.6 (19.4) | 66.3 (3.6) | 5.8 | |
| MuSE w/o L_A | 61.8 (7.6) | 64.7 (7.1) | 63.1 (3.3) | 89.3 (2.8) | 72.0 (4.8) | 56.9 (7.1) | 57.0 (3.5) | 58.1 (3.1) | 68.7 (14.2) | 60.7 (4.0) | 11.0 | |
| MuSE w/o AVG | 78.6 (4.0) | 68.1 (5.5) | 68.0 (2.0) | 95.0 (2.6) | 73.2 (6.6) | 66.2 (6.5) | 60.9 (3.9) | 60.1 (2.4) | 62.0 (3.5) | 7.7 | ||
| MuSE w/o STD | 74.3 (5.4) | 74.4 (5.2) | 65.2 (3.6) | 98.7 (0.5) | 70.5 (4.3) | 70.7 (3.7) | 62.0 (2.4) | 62.9 (6.4) | 71.3 (11.5) | 66.7 (2.4) | 5.6 | |
| MuSE | 80.5 (2.3) | 78.4 (2.2) | 71.1 (2.0) | 99.7 (0.5) | 78.4 (5.7) | 69.2 (3.5) | 67.5 (3.4) | 63.8 (8.6) | 69.5 (12.6) | 67.9 (3.6) | 2.2 |
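- Reconstruction flip tends to occur when the training graphs have the same primary pattern but at higher strength.
- Unseen graphs with a different primary pattern tend to incur higher reconstruction errors, which mitigates the flip.
- Using only the mean reconstruction error can misrank anomalous and normal graphs; the error distribution carries discriminative information (e.g., differences in shape).
- MuSE (using multifaceted summaries of reconstruction errors) achieves state-of-the-art GLAD performance across 10 datasets, with significant gains over the baselines.
- MuSE attains a strong average rank among 18 baselines, and the ablations indicate the importance of the X reconstruction loss, the A reconstruction loss, and the mean and STD components.
- Across the 10 datasets, MuSE achieves up to a 28.1% AUROC improvement over the best competing method in some settings.
- The method remains competitive when individual components are ablated, which highlights the contribution of the multifaceted error representation.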
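The finding that a mean-only score can misrank graphs is easy to see on a toy case. The sketch below uses made-up per-node error values (not taken from the paper) and two hypothetical helpers, `mean_score` and `multifaceted_summary`, to show a graph whose mean error is lower than that of a normal graph while the spread of its errors is clearly different.

```python
from statistics import mean, pstdev

# Hypothetical per-node reconstruction errors (illustrative numbers only).
normal_errors = [0.5, 0.5, 0.5, 0.5]          # uniformly moderate errors
anomalous_errors = [0.05, 0.85, 0.05, 0.85]   # some nodes fit well, others badly


def mean_score(errors):
    """Mean-only anomaly score: higher means more anomalous."""
    return mean(errors)


def multifaceted_summary(errors):
    """Mean plus population standard deviation of the same errors."""
    return (mean(errors), pstdev(errors))


# Under the mean-only score the anomalous graph looks *more* normal than the normal one.
print(mean_score(normal_errors), mean_score(anomalous_errors))

# Adding the standard deviation exposes the difference in the shape of the errors.
print(multifaceted_summary(normal_errors), multifaceted_summary(anomalous_errors))
```

Here the mean alone would rank the second graph as less anomalous, whereas the added standard deviation gives a downstream detector a feature on which the two graphs differ.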
This digest was generated by AI and reviewed by a human editor.