QUICK REVIEW

[Paper Review] X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection

Qi Zhang, Dian Chen|arXiv (Cornell University)|Feb 17, 2026

Spam and Phishing Detection | Cited by 0

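One-Sentence Summary

X-MAP combines SHAP explanations with non-negative matrix factorization to build topic-level profiles of reliably classified messages, and uses Jensen–Shannon divergence to detect and repair misclassifications in spam/phishing detection.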

ABSTRACT

Misclassifications in spam and phishing detection are very harmful, as false negatives expose users to attacks while false positives degrade trust. Existing uncertainty-based detectors can flag potential errors, but they can be deceived and offer limited interpretability. This paper presents X-MAP, an eXplainable Misclassification Analysis and Profiling framework that reveals topic-level semantic patterns behind model failures. X-MAP combines SHAP-based feature attributions with non-negative matrix factorization to build interpretable topic profiles for reliably classified spam/phishing and legitimate messages, and measures each message's deviation from these profiles using Jensen-Shannon divergence. Experiments on SMS and phishing datasets show that misclassified messages exhibit at least two times larger divergence than correctly classified ones. As a detector, X-MAP achieves up to 0.98 AUROC and lowers the false-rejection rate at 95% TRR to 0.089 on positive predictions. When used as a repair layer on base detectors, it recovers up to 97% of falsely rejected correct predictions with moderate leakage. These results demonstrate X-MAP's effectiveness and interpretability for improving spam and phishing detection.

Motivation and Objectives

Reduce harmful misclassifications in spam/phishing detection, where both false negatives and false positives carry serious costs.
Develop an explainable framework that identifies semantic patterns behind model failures at the topic level.
Create group-level profiles for reliably classified messages and quantify each message’s deviation from these profiles.
Demonstrate X-MAP as both a standalone detector and as a repair layer to improve existing uncertainty-based detectors.

Proposed Method

Compute SHAP values for each feature per message and separate positive (spam/phishing) and negative (legitimate) contributions.
Select top SHAP features per class using a ranking score that combines conditional mean contribution and feature presence.
Apply non-negative matrix factorization (NMF) to the SHAP matrices to derive interpretable topics and assign features to topics.
Construct group-level topic profiles for true positives (TP) and true negatives (TN) and normalize to obtain reliable topic distributions.
Measure each message’s topic distribution against the corresponding reliable group profile using Jensen–Shannon divergence to obtain a misclassification score.
Optionally use X-MAP as a repair layer by re-evaluating messages rejected by uncertainty-based detectors and re-accepting those aligning with TP/TN profiles.
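
The scoring and repair steps above can be illustrated with a minimal sketch. It assumes each message already has a vector of non-negative topic weights (for example, from the NMF step). The function names, the toy weights, and the re-acceptance threshold are hypothetical and are not taken from the paper. Logarithms are base 2, so the divergence lies in [0, 1].

```python
import math

def normalize(weights):
    """Scale non-negative topic weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        # No topic mass at all: fall back to a uniform distribution.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

def group_profile(topic_weight_rows):
    """Sum topic weights over a reliable group (e.g. TP) and normalize."""
    n_topics = len(topic_weight_rows[0])
    sums = [0.0] * n_topics
    for row in topic_weight_rows:
        for k, w in enumerate(row):
            sums[k] += w
    return normalize(sums)

def kl(p, q):
    """KL divergence in bits; terms with p_k == 0 contribute nothing."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions (base 2)."""
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def misclassification_score(message_topic_weights, profile):
    """Higher score = the message deviates more from the reliable profile."""
    return js_divergence(normalize(message_topic_weights), profile)

def repair(rejected_rows, profile, threshold):
    """Indices of rejected messages close enough to the profile to re-accept.
    The threshold is a hypothetical tuning parameter."""
    return [i for i, row in enumerate(rejected_rows)
            if misclassification_score(row, profile) <= threshold]

# Toy data: three topics, two reliably classified (TP) messages.
tp_profile = group_profile([[4.0, 1.0, 0.0], [3.0, 1.0, 0.0]])
typical = [3.5, 1.0, 0.0]    # same topic mix as the profile
atypical = [0.0, 0.0, 5.0]   # all mass on a topic the profile never uses

print(misclassification_score(typical, tp_profile))
print(misclassification_score(atypical, tp_profile))
print(repair([typical, atypical], tp_profile, threshold=0.1))
```

In this toy setup the typical message scores near zero and would be re-accepted, while the atypical one has no topic overlap with the profile and stays rejected. In practice the threshold would be tuned on held-out data.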

Experimental Results

Research Questions

RQ1: How can misclassifications in spam/phishing detection be explained in a human-interpretable, topic-level manner?
RQ2: Do SHAP-based topic patterns differ between correctly classified and misclassified messages, and can they be used to detect misclassifications?
RQ3: Can topic-based misclassification signals complement or improve existing uncertainty-based detectors, including as a repair layer?

Key Findings

Misclassified messages show substantially larger Jensen–Shannon divergence from reliable topic profiles than correctly classified ones (often 2× to 10×).
X-MAP achieves up to 0.98 AUROC as a detector and reduces the false-rejection rate at 95% true rejection rate to about 0.089 on positive predictions.
As a repair layer on top of base uncertainty detectors, X-MAP recovers a substantial fraction of falsely rejected correct predictions with moderate leakage (e.g., around 15% for certain setups).
Topic-level aleatoric uncertainty often yields the best performance for positive predictions, capturing ambiguity among suspicious topics while remaining less biased by spammy features.
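
The summary does not spell out how the false-rejection rate at 95% true rejection rate is computed. The sketch below shows one plausible reading, stated as an assumption: pick the score threshold that rejects at least the target fraction of misclassified predictions, then report the fraction of correct predictions that fall above the same threshold. The function name and the toy scores are hypothetical.

```python
import math

def frr_at_trr(scores, is_misclassified, target_trr=0.95):
    """Return (threshold, false-rejection rate) at a target true rejection rate.

    scores: misclassification scores, higher = more suspicious.
    is_misclassified: True where the base model's prediction was wrong.
    A prediction is rejected when its score >= threshold.
    """
    wrong = sorted((s for s, m in zip(scores, is_misclassified) if m),
                   reverse=True)
    correct = [s for s, m in zip(scores, is_misclassified) if not m]
    if not wrong or not correct:
        raise ValueError("need at least one wrong and one correct prediction")
    # Smallest number of misclassified predictions that must be rejected.
    k = max(1, math.ceil(target_trr * len(wrong)))
    threshold = wrong[k - 1]
    # Fraction of correct predictions that are rejected at this threshold.
    frr = sum(s >= threshold for s in correct) / len(correct)
    return threshold, frr

# Toy data: four misclassified predictions followed by four correct ones.
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05, 0.75, 0.3]
flags = [True] * 4 + [False] * 4
print(frr_at_trr(scores, flags, target_trr=0.75))
```

A lower value at a fixed true rejection rate means fewer correct predictions are discarded for the same coverage of errors.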


This review was generated by AI and reviewed by human editors.