QUICK REVIEW

[Paper Review] Pistol: Pupil Invisible Supportive Tool to extract Pupil, Iris, Eye Opening, Eye Movements, Pupil and Iris Gaze Vector, and 2D as well as 3D Gaze

Wolfgang Fuhl, Daniel Weber|arXiv (Cornell University)|Jan 18, 2022

Gaze Tracking and Assistive Technology | Cited by 26

One-Sentence Summary

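Pistol is a feature extraction and gaze estimation toolkit that processes Pupil Invisible recordings offline to extract pupil and iris geometry, eye opening, eye movements, and markers, and to estimate 2D and 3D gaze using Levenberg–Marquardt fitting and neural networks.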
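As a rough illustration of what Levenberg–Marquardt gaze fitting can look like, the sketch below maps a 2D eye feature (for example, the pupil center) to 2D scene-camera gaze coordinates through a second-order polynomial whose coefficients are fitted with SciPy's Levenberg–Marquardt solver. The polynomial form and the helper names `design_matrix`, `fit_gaze_mapping`, and `predict_gaze` are assumptions made for illustration; they are not Pistol's actual model or API.

```python
import numpy as np
from scipy.optimize import least_squares


def design_matrix(features: np.ndarray) -> np.ndarray:
    """Hypothetical 2nd-order polynomial terms [1, x, y, xy, x^2, y^2] per 2D eye feature."""
    x, y = features[:, 0], features[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])


def fit_gaze_mapping(features: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Fit one coefficient row for gaze x and one for gaze y with Levenberg-Marquardt."""
    A = design_matrix(features)
    n_terms = A.shape[1]

    def residuals(params: np.ndarray) -> np.ndarray:
        coeffs = params.reshape(2, n_terms)
        # Stacked x and y errors (in scene-camera pixels) for every calibration point.
        return (A @ coeffs.T - targets).ravel()

    # method="lm" needs at least as many residuals (2 * N) as parameters (12): N >= 6 points.
    result = least_squares(residuals, x0=np.zeros(2 * n_terms), method="lm")
    return result.x.reshape(2, n_terms)


def predict_gaze(coeffs: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Map 2D eye features to 2D gaze points using fitted coefficients."""
    return design_matrix(features) @ coeffs.T
```

In use, one would fit on calibration pairs (eye feature, known target position) and then call `predict_gaze` on every frame; normalizing the eye features to roughly [-1, 1] keeps the squared terms well conditioned.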

ABSTRACT

This paper describes a feature extraction and gaze estimation software, named Pistol, that can be used with Pupil Invisible projects and other eye trackers in the future. In offline mode, our software extracts multiple features from the eye, including the pupil and iris ellipse, eye aperture, pupil vector, iris vector, eye movement types from pupil and iris velocities, marker detection, marker distance, and 2D gaze estimation for the pupil center, iris center, pupil vector, and iris vector using Levenberg–Marquardt fitting and neural networks. The gaze signal is computed in 2D for each eye and each feature separately, and in 3D for both eyes, again for each feature separately. We hope this software helps other researchers extract state-of-the-art features from their recordings for their research. Link: https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FPISTOL&mode=list
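The abstract derives eye movement types from pupil and iris velocities. A minimal sketch of the velocity part, assuming 3D pupil or iris vectors per frame and timestamps in seconds, computes the angular speed between consecutive vectors. Pistol classifies movements with a neural network; the sketch swaps in the much simpler velocity-threshold (I-VT) rule purely for illustration, and the 30 deg/s threshold and 200 Hz sampling in the example are assumptions.

```python
import numpy as np


def angular_velocity_deg_s(vectors: np.ndarray, timestamps_s: np.ndarray) -> np.ndarray:
    """Angular speed (deg/s) between consecutive 3D pupil or iris vectors."""
    a, b = vectors[:-1], vectors[1:]
    # atan2(|a x b|, a . b) stays accurate for both tiny and large angles
    # and does not require the vectors to be normalized.
    cross_norm = np.linalg.norm(np.cross(a, b), axis=1)
    dot = np.sum(a * b, axis=1)
    angles_deg = np.degrees(np.arctan2(cross_norm, dot))
    return angles_deg / np.diff(timestamps_s)


def classify_ivt(velocities_deg_s: np.ndarray, saccade_threshold: float = 30.0) -> np.ndarray:
    """Label each inter-frame interval 'saccade' or 'fixation' (hypothetical I-VT stand-in)."""
    return np.where(velocities_deg_s > saccade_threshold, "saccade", "fixation")
```

Computing velocities separately from the pupil vectors and from the iris vectors gives two independent signals, which is one reason a learned classifier can outperform a single fixed threshold.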

Motivation and Goals

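Provide a free tool that extracts a wide range of eye features from Pupil Invisible recordings.
Enable 2D gaze estimation for each eye and 3D gaze estimation for both eyes, using several fitting methods.
Provide robust detection of pupil and iris landmarks, eyelids, eye opening, and eye movements to support research and industrial applications.
Support compatibility with future eye trackers as well as integration with Pupil Invisible projects.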

Proposed Method

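Small deep neural networks, trained with tensor normalization and full distribution training, detect pupil, iris, and eyelid landmarks.
Pupil and iris ellipses are fitted with OpenCV's ellipse fitting; the eyelids are fitted with cubic splines.
Eye opening is estimated through a constrained optimization in which the opening is measured orthogonally to the vector between the eye corners.
A neural network estimates the eyeball radius and center, and the optical vectors are derived as center-to-center vectors (from the eyeball center to the pupil center and to the iris center).
A neural network classifies eye movements using the angles between successive pupil vectors and between successive iris vectors, together with differences in eye opening.
Calibration markers are detected with two DNN detectors, a coarse one and a fine one, to support depth estimation and gaze calibration.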
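The eyelid splines and the eye-opening measurement can be illustrated together. This sketch, assuming upper and lower eyelid landmarks plus the two eye corners in image pixels, rotates everything into a frame whose x axis is the eye-corner vector, fits SciPy cubic splines to each lid, and takes the largest lid-to-lid distance along the orthogonal axis. Pistol solves a constrained optimization; the dense sampling here is a simpler stand-in, and all function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def to_corner_frame(points, corner_a, corner_b):
    """Express points in a frame whose x axis runs from corner_a to corner_b."""
    axis = (corner_b - corner_a) / np.linalg.norm(corner_b - corner_a)
    normal = np.array([-axis[1], axis[0]])  # orthogonal to the eye-corner vector
    rel = np.asarray(points, dtype=float) - corner_a
    return np.column_stack([rel @ axis, rel @ normal])


def eye_opening(upper_lid, lower_lid, corner_a, corner_b, samples=201):
    """Largest upper-to-lower eyelid distance measured orthogonally to the corner vector."""
    corner_a = np.asarray(corner_a, dtype=float)
    corner_b = np.asarray(corner_b, dtype=float)
    length = np.linalg.norm(corner_b - corner_a)
    up = to_corner_frame(upper_lid, corner_a, corner_b)
    lo = to_corner_frame(lower_lid, corner_a, corner_b)
    # CubicSpline needs strictly increasing x, so sort landmarks along the corner axis.
    up = up[np.argsort(up[:, 0])]
    lo = lo[np.argsort(lo[:, 0])]
    upper_spline = CubicSpline(up[:, 0], up[:, 1])
    lower_spline = CubicSpline(lo[:, 0], lo[:, 1])
    xs = np.linspace(0.0, length, samples)  # sample between the two eye corners
    return float(np.max(np.abs(upper_spline(xs) - lower_spline(xs))))
```

Measuring in the corner-aligned frame makes the result independent of head roll: rotating and shifting the whole eye leaves the opening unchanged.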
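One way to turn an estimated eyeball center and radius into an optical vector is sketched below. It assumes the network's outputs are in eye-image pixels and uses an orthographic sphere model (an assumption, not necessarily Pistol's geometry): the 2D offset from the eyeball center to the pupil or iris center is lifted onto the sphere surface facing the camera. The function name `optical_vector` is hypothetical.

```python
import numpy as np


def optical_vector(eyeball_center, eyeball_radius, feature_center):
    """Unit vector from the eyeball center to a pupil or iris center on the sphere.

    Assumes an orthographic eye camera looking along +z, so the visible feature
    lies on the camera-facing hemisphere (negative z).
    """
    d = np.asarray(feature_center, dtype=float) - np.asarray(eyeball_center, dtype=float)
    # Clamp so a feature projected slightly outside the sphere outline does not give NaN.
    dz2 = max(eyeball_radius**2 - float(d @ d), 0.0)
    v = np.array([d[0], d[1], -np.sqrt(dz2)])
    return v / np.linalg.norm(v)
```

Applying the same function to the pupil center and to the iris center yields the two separate vectors (pupil vector and iris vector) that the abstract lists.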

Experimental Results

Research Questions

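RQ1: How effectively does Pistol extract comprehensive eye features (pupil and iris ellipses, eyelids, eyeball, and vectors) from Pupil Invisible data?
RQ2: How accurate is 2D/3D gaze estimation with Levenberg–Marquardt fitting and neural networks in per-eye and combined-eye configurations?
RQ3: How reliable is marker-based calibration for depth estimation and gaze computation across different head and eye poses?
RQ4: What are the runtime characteristics and hardware requirements for real-time offline processing on older GPUs?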

Key Findings

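On annotated data, Pistol reaches a pupil-center RMSE of about 0.93 px and an iris-landmark RMSE of about 1.12 px; the IoU values for the pupil, iris, and eyelid regions range from about 0.84 to 0.91 (Table 3).
Mean 2D gaze errors range from 21.8 to 32.3 px depending on the method and the eye; binocular 2D estimates range from about 18.5 to 22.9 px (Table 5).
Mean 3D gaze errors range from about 19.6 to 20.5 px across methods (Table 5).
In marker detection, the coarse stage has a mean landmark error of 6.70 px and a false detection rate of 2.34%; the fine detector reduces the error to 0.82 px and the false detection rate to 0.001% (Table 7).
On a GTX 1050Ti, feature detection takes about 17.21 ms per frame and marker detection about 57 ms per frame (Table 8).
KNN-based depth estimation is usable in practice, with most errors below 50 cm, although near-field depth remains challenging (Figure 10).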
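For readers reproducing the Table 3 style metrics on their own annotations, the common definitions of landmark RMSE and region IoU are sketched below. Whether Pistol averages per landmark, per image, or per recording is not stated here, so the averaging in this sketch is an assumption, and the function names are hypothetical.

```python
import numpy as np


def landmark_rmse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Root-mean-square Euclidean distance (px) between predicted and annotated landmarks."""
    return float(np.sqrt(np.mean(np.sum((pred - gt) ** 2, axis=-1))))


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two region masks (e.g., pupil, iris, or eyelid area)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    # Two empty masks agree perfectly; avoid dividing by zero.
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0
```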
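Binocular 3D gaze is often computed by intersecting the two eyes' gaze rays. Because real rays rarely meet exactly, a standard geometric choice is the midpoint of the shortest segment between them, as in the sketch below. This is a common technique shown for illustration, not confirmed as Pistol's 3D method; the eye positions, millimeter units, and the name `vergence_point` are assumptions.

```python
import numpy as np


def vergence_point(origin_l, dir_l, origin_r, dir_r):
    """Midpoint of the shortest segment between the left and right gaze rays."""
    p1, p2 = np.asarray(origin_l, float), np.asarray(origin_r, float)
    d1 = np.asarray(dir_l, float) / np.linalg.norm(dir_l)
    d2 = np.asarray(dir_r, float) / np.linalg.norm(dir_r)
    w0 = p1 - p2
    b = d1 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = 1.0 - b * b  # a*c - b^2 with a = c = 1 for unit directions
    if denom < 1e-12:
        raise ValueError("gaze rays are (nearly) parallel; the 3D point is undefined")
    s = (b * e - d) / denom  # parameter of the closest point on the left ray
    t = (e - b * d) / denom  # parameter of the closest point on the right ray
    return 0.5 * ((p1 + s * d1) + (p2 + t * d2))
```

The near-parallel guard reflects why far targets are hard for vergence: as the rays approach parallel, small angular errors move the estimated point a long way in depth.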
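The KNN depth estimate can be sketched as a plain k-nearest-neighbor regression: calibration samples with known depth (for instance, from the measured marker distance) are stored, and a query's depth is the mean depth of its k closest samples in feature space. Which features Pistol feeds to its KNN is not given here, so the scalar "vergence angle" feature, the centimeter units, and the name `knn_depth` are hypothetical.

```python
import numpy as np


def knn_depth(train_features, train_depths_cm, query_features, k=3):
    """Predict depth (cm) as the mean depth of the k nearest calibration samples."""
    y = np.asarray(train_depths_cm, dtype=float)
    X = np.asarray(train_features, dtype=float).reshape(len(y), -1)
    Q = np.asarray(query_features, dtype=float).reshape(-1, X.shape[1])
    # Pairwise Euclidean distances, shape (n_queries, n_train).
    dists = np.linalg.norm(Q[:, None, :] - X[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y[nearest].mean(axis=1)
```

Since KNN can only return averages of depths it has seen, sparse calibration at close range would be one plausible reason near-field depth stays harder.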
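To put the Table 8 timings in throughput terms, per-frame latency converts to frames per second as follows; the combined figure assumes, hypothetically, that feature and marker detection run one after the other on every frame.

```latex
\text{fps} = \frac{1000\ \text{ms/s}}{t_{\text{frame}}}, \qquad
\frac{1000}{17.21} \approx 58.1\ \text{fps (features)}, \quad
\frac{1000}{57} \approx 17.5\ \text{fps (markers)}, \quad
\frac{1000}{17.21 + 57} = \frac{1000}{74.21} \approx 13.5\ \text{fps (both)}
```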

This summary was generated by AI and reviewed by human editors.