QUICK REVIEW

[Paper Review] VISER: Visually-Informed System for Enhanced Robustness in Open-Set Iris Presentation Attack Detection

Byron Dowling, Eleanor Frederick|arXiv (Cornell University)|Mar 18, 2026

Biometric Identification and Security | Cited by 0

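One-Sentence Summary

VISER compares human saliency types (hand annotations, eye tracking, segmentation) and foundation-model embeddings for open-set iris PAD, and finds that de-noised initial eye-tracking saliency yields the largest gain over the cross-entropy baseline.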

ABSTRACT

Human perceptual priors have shown promise in saliency-guided deep learning training, particularly in the domain of iris presentation attack detection (PAD). Common saliency approaches include hand annotations obtained via mouse clicks and eye gaze heatmaps derived from eye tracking data. However, the most effective form of human saliency for open-set iris PAD remains underexplored. In this paper, we conduct a series of experiments comparing hand annotations, eye tracking heatmaps, segmentation masks, and DINOv2 embeddings to a state-of-the-art deep learning-based baseline on the task of open-set iris PAD. Results for open-set PAD in a leave-one-attack-type out paradigm indicate that denoised eye tracking heatmaps show the best generalization improvement over cross entropy in terms of Area Under the ROC curve (AUROC) and Attack Presentation Classification Error Rate (APCER) at Bona Fide Presentation Classification Error Rate (BPCER) of 1%. Along with this paper, we offer trained models, code, and saliency maps for reproducibility and to facilitate follow-up research efforts.

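Research Motivation and Objectives

Evaluate which form of human saliency (hand annotations, eye tracking, or segmentation) generalizes best in open-set iris PAD.
Compare human-saliency-based models with foundation-model embeddings in open-set iris PAD.
Propose a de-noising method for eye-tracking heatmaps to improve saliency-guided training.
Release trained models, code, and saliency maps to support reproducibility and follow-up research.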

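Proposed Method

D-NetPAD, an open-set iris PAD model with a DenseNet-121 backbone.
Training combines cross-entropy loss with a saliency loss that aligns the model's class activation map (CAM) with the target saliency map (see Boyd et al.).
Multiple saliency modalities are evaluated: segmentation masks, hand annotations at different entropy levels, eye-tracking heatmaps (full and initial phases), and de-noised variants produced with HDBSCAN.
Foundation-model embeddings (DINOv2-Base) are tested with three classifiers: logistic regression, linear SVM, and RBF SVM.
Performance is evaluated with AUROC and APCER at BPCER = 1%, and reported as improvement over the cross-entropy (XENT) baseline.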
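
The saliency-guided objective described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the convex weighting `alpha`, the mean-squared-error penalty, and the min-max normalization are all assumptions, and the function names are hypothetical. Plain Python lists stand in for tensors; real training would use an autograd framework.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def class_activation_map(feature_maps, class_weights):
    """CAM as a weighted sum of K flattened feature maps (one weight per map)."""
    n = len(feature_maps[0])
    return [
        sum(w * fmap[i] for w, fmap in zip(class_weights, feature_maps))
        for i in range(n)
    ]


def min_max_normalize(values):
    """Rescale to [0, 1]; a constant map becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def saliency_guided_loss(logits, label, cam, saliency, alpha=0.5):
    """alpha * cross-entropy + (1 - alpha) * MSE(CAM, target saliency).

    The weighting and the MSE form are assumptions made for this sketch.
    """
    ce = cross_entropy(logits, label)
    cam_n = min_max_normalize(cam)
    sal_n = min_max_normalize(saliency)
    mse = sum((c - s) ** 2 for c, s in zip(cam_n, sal_n)) / len(cam_n)
    return alpha * ce + (1 - alpha) * mse
```

When the CAM already matches the target saliency up to scale, the second term vanishes and the loss reduces to the weighted cross-entropy, so the saliency term only penalizes models that look at the wrong regions.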

Figure 2 : Example of applying HDBSCAN to de-noise the eye tracking data. Clusters made up of valid fixations are distinguished by color and larger marks denote longer fixations, not fixation area. Black crosses indicate fixations marked as noise excluded from the final saliency map.
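
The de-noising step in the caption can be approximated as follows. This is a sketch under stated assumptions: fixations are taken to be `(x, y, duration)` tuples in pixel coordinates, `min_cluster_size=5` is an arbitrary choice, and the duration-weighted grid is one plausible way to build a saliency map, not necessarily the authors'. The helper names are hypothetical; `HDBSCAN` is the scikit-learn class (version 1.3 or later), which labels noise points as -1.

```python
def cluster_fixations(points, min_cluster_size=5):
    """Label each [x, y] fixation with an HDBSCAN cluster id; -1 means noise."""
    # Imported lazily so the helpers below work without scikit-learn installed.
    from sklearn.cluster import HDBSCAN  # assumes scikit-learn >= 1.3

    labels = HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(points)
    return labels.tolist()


def drop_noise(fixations, labels):
    """Keep only fixations that were assigned to a cluster (label >= 0)."""
    return [f for f, label in zip(fixations, labels) if label >= 0]


def duration_map(fixations, width, height):
    """Accumulate fixation durations into a height x width grid."""
    grid = [[0.0] * width for _ in range(height)]
    for x, y, duration in fixations:
        if 0 <= x < width and 0 <= y < height:
            grid[int(y)][int(x)] += duration
    return grid
```

A typical call would cluster the coordinates with `cluster_fixations([[x, y] for x, y, _ in fixations])`, pass the labels to `drop_noise`, and build the map from the surviving fixations only. Weighting by duration mirrors the caption's point that mark size encodes fixation length rather than area.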

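Experimental Results

Research Questions

RQ1: In open-set iris PAD, does eye-tracking saliency generalize to unseen attack types better than hand annotations or segmentation?
RQ2: Do saliency-guided training methods outperform modern PAD solutions built on foundation-model embeddings in open-set iris PAD?
RQ3: Does de-noising the eye-fixation maps improve the performance of saliency-guided training?

Main Findings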

Change in AUROC relative to the DenseNet cross-entropy (XENT) baseline, by held-out attack type:

Method	Printout	Diseased	Post Mortem	Synthetic	Contacts + Print	Textured Contact	Artificial	Average Δ
Baseline (DenseNet XENT)	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
Segmentation Masks | Baseline Saliency-based	+0.0087	-0.0052	+0.0385	-0.0511	-0.0212	+0.0345	+0.0232	+0.0039
Hand Annotations | High Entropy	-0.0184	+0.0520	+0.0073	-0.0712	-0.0123	-0.0050	+0.0175	-0.0043
Hand Annotations | Equal Entropy	-0.0010	+0.0847	-0.0098	-0.0122	-0.0021	+0.0274	+0.0277	+0.0164
Hand Annotations | Low Entropy	-0.0001	-0.0077	+0.0294	-0.0160	-0.0384	+0.0140	-0.0157	-0.0049
Eye Tracking | Full	+0.0473	+0.0282	-0.0468	+0.0941	-0.0155	+0.0474	+0.2387	+0.0562
Eye Tracking | Initial	+0.0513	+0.0192	-0.0155	+0.0236	-0.0481	+0.0650	+0.2672	+0.0518
De-noised Full ET	+0.0604	+0.0345	-0.0820	+0.0621	-0.0782	+0.0597	+0.2438	+0.0429
De-noised Initial ET	+0.0627	+0.0574	-0.0645	+0.1090	-0.0453	+0.0372	+0.2692	+0.0608
Foundation Model (No Saliency) | DINOv2 + LogReg	-0.0326	-0.1079	+0.0378	-0.0340	+0.0086	-0.0327	+0.1520	-0.0013
Foundation Model (No Saliency) | DINOv2 + SVM-Linear	-0.0523	-0.1247	-0.0033	-0.0613	+0.0017	-0.0351	+0.1737	-0.0145
Foundation Model (No Saliency) | DINOv2 + SVM-RBF	-0.0104	+0.0023	+0.1796	+0.1121	-0.0159	-0.0448	+0.1167	+0.0485
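
The Average Δ column is consistent with an unweighted mean over the seven attack types, and its values match the AUROC figures quoted in the findings. A short check, using four rows copied from the table (the helper names are hypothetical):

```python
from statistics import mean

# Per-attack deltas copied from the table, in column order:
# Printout, Diseased, Post Mortem, Synthetic, Contacts + Print,
# Textured Contact, Artificial.
ROWS = {
    "Eye Tracking | Full": [0.0473, 0.0282, -0.0468, 0.0941, -0.0155, 0.0474, 0.2387],
    "Eye Tracking | Initial": [0.0513, 0.0192, -0.0155, 0.0236, -0.0481, 0.0650, 0.2672],
    "De-noised Full ET": [0.0604, 0.0345, -0.0820, 0.0621, -0.0782, 0.0597, 0.2438],
    "De-noised Initial ET": [0.0627, 0.0574, -0.0645, 0.1090, -0.0453, 0.0372, 0.2692],
    "DINOv2 + SVM-RBF": [-0.0104, 0.0023, 0.1796, 0.1121, -0.0159, -0.0448, 0.1167],
}


def average_delta(deltas):
    """Unweighted mean across attack types, rounded like the table."""
    return round(mean(deltas), 4)


def rank_by_average(rows):
    """Method names sorted from largest to smallest average delta."""
    return sorted(rows, key=lambda name: mean(rows[name]), reverse=True)
```

Because the mean is unweighted, the large Artificial column dominates every eye-tracking row; the per-attack columns are worth reading alongside the average.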

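Eye-tracking saliency, and de-noised initial eye tracking in particular, gives the largest AUROC gain over the XENT baseline (about +0.061) and the largest APCER improvement at BPCER = 1% (about +0.1063).
De-noised initial eye tracking outperforms the other saliency types and the baseline on several attack categories; segmentation masks and hand annotations generally do not beat the baseline on APCER at BPCER = 1%.
Foundation-model results with DINOv2 are mixed: only DINOv2 + SVM-RBF modestly improves AUROC, while the other variants do not consistently exceed the baseline.
De-noising clearly helps initial eye tracking (AUROC +0.0608, APCER Δ +0.1063, versus AUROC +0.0518 without de-noising) but is less reliable for full eye tracking (AUROC +0.0429, APCER Δ +0.0857, versus AUROC +0.0562 without de-noising).
Overall, in the settings tested, eye-tracking-based saliency-guided training outperforms the foundation-model-based approaches in open-set iris PAD.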
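
For readers unfamiliar with the two metrics quoted above, the following sketch shows one way to compute them from raw classifier scores. It assumes that a higher score means "more attack-like" and that the threshold is set on the bona fide scores alone; the function names are hypothetical and the paper's exact evaluation code may differ.

```python
def auroc(bona_fide_scores, attack_scores):
    """Probability that a random attack outscores a random bona fide sample.

    Ties count as half a win (pairwise definition, O(n * m)).
    """
    wins = 0.0
    for a in attack_scores:
        for b in bona_fide_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(attack_scores) * len(bona_fide_scores))


def apcer_at_bpcer(bona_fide_scores, attack_scores, target_bpcer=0.01):
    """Return (APCER, achieved BPCER, threshold) at BPCER <= target_bpcer.

    A sample is called an attack when its score is strictly above the threshold.
    """
    ranked = sorted(bona_fide_scores, reverse=True)
    # Number of bona fide samples we may misclassify as attacks.
    allowed = int(target_bpcer * len(ranked) + 1e-9)
    allowed = min(allowed, len(ranked) - 1)
    threshold = ranked[allowed]
    bpcer = sum(s > threshold for s in bona_fide_scores) / len(ranked)
    # Attacks at or below the threshold are wrongly accepted as bona fide.
    apcer = sum(s <= threshold for s in attack_scores) / len(attack_scores)
    return apcer, bpcer, threshold
```

Fixing BPCER at 1% means at most one genuine sample in a hundred is rejected, so APCER then measures how many attacks slip through at that strict operating point; lower is better.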

(a) Segmentation Mask of the Iris Region

This review was generated by AI and reviewed by a human editor.