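[Paper Walkthrough] X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection
X-MAP combines SHAP explanations with non-negative matrix factorization to build topic-level profiles of reliably classified messages, and uses Jensen–Shannon divergence to detect and repair misclassifications in spam/phishing detection.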
Misclassifications in spam and phishing detection are very harmful: false negatives expose users to attacks, while false positives degrade trust. Existing uncertainty-based detectors can flag potential errors, but they can be deceived and offer limited interpretability. This paper presents X-MAP, an eXplainable Misclassification Analysis and Profiling framework that reveals topic-level semantic patterns behind model failures. X-MAP combines SHAP-based feature attributions with non-negative matrix factorization to build interpretable topic profiles for reliably classified spam/phishing and legitimate messages, and measures each message's deviation from these profiles using Jensen-Shannon divergence. Experiments on SMS and phishing datasets show that misclassified messages exhibit at least two times larger divergence than correctly classified ones. As a detector, X-MAP achieves up to 0.98 AUROC and lowers the false-rejection rate at a 95% true rejection rate (TRR) to 0.089 on positive predictions. When used as a repair layer on base detectors, it recovers up to 97% of falsely rejected correct predictions with moderate leakage. These results demonstrate X-MAP's effectiveness and interpretability for improving spam and phishing detection.
Research Motivation and Goals
- Reduce harmful misclassifications in spam/phishing detection, where both false negatives and false positives carry serious costs.
- Develop an explainable framework that identifies semantic patterns behind model failures at the topic level.
- Create group-level profiles for reliably classified messages and quantify each message’s deviation from these profiles.
- Demonstrate X-MAP as both a standalone detector and as a repair layer to improve existing uncertainty-based detectors.
Proposed Method
- Compute SHAP values for each feature per message and separate positive (spam/phishing) and negative (legitimate) contributions.
- Select top SHAP features per class using a ranking score that combines conditional mean contribution and feature presence.
- Apply non-negative matrix factorization (NMF) to the SHAP matrices to derive interpretable topics and assign features to topics.
- Construct group-level topic profiles for true positives (TP) and true negatives (TN) and normalize to obtain reliable topic distributions.
- Measure each message’s topic distribution against the corresponding reliable group profile using Jensen–Shannon divergence to obtain a misclassification score.
- Optionally use X-MAP as a repair layer by re-evaluating messages rejected by uncertainty-based detectors and re-accepting those aligning with TP/TN profiles.
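The profiling and scoring steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`nmf`, `topic_distribution`, `misclassification_scores`), the multiplicative-update NMF solver, the hard assignment of each feature to its strongest topic, and the synthetic SHAP matrix are all hypothetical choices made for the example. It also assumes the SHAP contributions for one class have already been made non-negative (for instance by taking magnitudes), since NMF requires non-negative input.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x d) as W (n x k) @ H (k x d) with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, d = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, d)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def normalize(x, eps=1e-12):
    """Turn non-negative weights into a probability distribution along the last axis."""
    x = np.asarray(x, dtype=float) + eps
    return x / x.sum(axis=-1, keepdims=True)

def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so values lie in [0, 1]."""
    p, q = normalize(p), normalize(q)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b), axis=-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def topic_distribution(shap, H):
    """Per-message topic distribution (assumption: each feature belongs to its strongest topic)."""
    feature_topic = H.argmax(axis=0)
    k = H.shape[0]
    mass = np.stack(
        [shap[:, feature_topic == t].sum(axis=1) for t in range(k)], axis=1
    )
    return normalize(mass)

def misclassification_scores(topic_dist, reliable_mask):
    """Divergence of every message from the profile of the reliable group (e.g. TPs)."""
    profile = normalize(topic_dist[reliable_mask].sum(axis=0))
    return js_divergence(topic_dist, profile)

# Synthetic stand-in for a non-negative SHAP matrix: 6 messages x 6 features.
rng = np.random.default_rng(0)
shap_pos = np.zeros((6, 6))
shap_pos[:4, :3] = rng.random((4, 3)) + 0.5   # reliable messages load on features 0-2
shap_pos[4:, 3:] = rng.random((2, 3)) + 0.5   # atypical messages load on features 3-5
reliable = np.array([True, True, True, True, False, False])

W, H = nmf(shap_pos, k=2)
topics = topic_distribution(shap_pos, H)
scores = misclassification_scores(topics, reliable)
```

Each entry of `scores` lies between 0 and 1; a larger value means the message's topic mix sits further from the reliable group's profile. In a real pipeline the SHAP matrix would come from an explainer applied to the base classifier, and separate profiles would be built for TP and TN messages.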
Experimental Results
Research Questions
- RQ1: How can misclassifications in spam/phishing detection be explained in a human-interpretable, topic-level manner?
- RQ2: Do SHAP-based topic patterns differ between correctly classified and misclassified messages, and can they be used to detect misclassifications?
- RQ3: Can topic-based misclassification signals complement or improve existing uncertainty-based detectors, including as a repair layer?
Key Findings
- Misclassified messages show substantially larger Jensen–Shannon divergence from reliable topic profiles than correctly classified ones (often 2× to 10×).
- X-MAP achieves up to 0.98 AUROC as a detector and reduces the false-rejection rate at 95% true rejection rate to about 0.089 on positive predictions.
- As a repair layer on top of base uncertainty detectors, X-MAP recovers a substantial fraction of falsely rejected correct predictions with moderate leakage (e.g., around 15% for certain setups).
- Topic-level aleatoric uncertainty often yields the best performance for positive predictions, capturing ambiguity among suspicious topics while remaining less biased by spammy features.
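The rejection metrics quoted in these findings can be made concrete with a short sketch. The definitions used here are assumptions for illustration, since this summary does not spell them out: a message is rejected when its score is at or above a threshold, TRR is the share of misclassified predictions that get rejected, FRR is the share of correct predictions that get rejected, recovery is the share of falsely rejected correct predictions that a repair step re-accepts, and leakage is the share of rightly rejected errors that it re-accepts. The function names and the `accept_below` cut-off are hypothetical.

```python
import numpy as np

def frr_at_trr(scores, is_error, target_trr=0.95):
    """Pick the largest threshold that rejects at least target_trr of the errors."""
    scores = np.asarray(scores, dtype=float)
    is_error = np.asarray(is_error, dtype=bool)
    err_scores = np.sort(scores[is_error])
    if err_scores.size == 0:
        raise ValueError("need at least one misclassified example")
    n_must_reject = max(1, int(np.ceil(target_trr * err_scores.size)))
    threshold = err_scores[err_scores.size - n_must_reject]
    rejected = scores >= threshold
    trr = rejected[is_error].mean()     # errors that are rejected
    frr = rejected[~is_error].mean()    # correct predictions that are rejected
    return threshold, trr, frr

def repair(base_rejected, divergence, accept_below):
    """Keep a base rejection only if the message also diverges from the reliable profile."""
    base_rejected = np.asarray(base_rejected, dtype=bool)
    divergence = np.asarray(divergence, dtype=float)
    return base_rejected & (divergence >= accept_below)

def recovery_and_leakage(base_rejected, final_rejected, is_error):
    """Share of false rejections re-accepted, and share of true rejections re-accepted."""
    base_rejected = np.asarray(base_rejected, dtype=bool)
    final_rejected = np.asarray(final_rejected, dtype=bool)
    is_error = np.asarray(is_error, dtype=bool)
    reaccepted = base_rejected & ~final_rejected
    falsely_rejected = base_rejected & ~is_error
    truly_rejected = base_rejected & is_error
    recovery = reaccepted[falsely_rejected].mean() if falsely_rejected.any() else 0.0
    leakage = reaccepted[truly_rejected].mean() if truly_rejected.any() else 0.0
    return recovery, leakage
```

For example, `frr_at_trr(scores, is_error)` returns the chosen threshold together with the achieved TRR and the resulting FRR, and `recovery_and_leakage(base, repair(base, divergence, 0.2), is_error)` summarizes what a repair pass with a cut-off of 0.2 would change.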
This walkthrough was generated by AI and reviewed by human editors.