QUICK REVIEW

[Paper Review] A Comprehensive Survey on Machine Learning Techniques and User Authentication Approaches for Credit Card Fraud Detection

Niloofar Yousefi, Marie Alaghband|arXiv (Cornell University)|Dec 2, 2019

Imbalanced Data Classification Techniques | Cited by 23

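One-Sentence Summary

This survey reviews machine learning techniques and behavioral biometric methods for credit card fraud detection, covering both classical models and advanced user authentication approaches. It finds that under data-scarce conditions, a random forest model combined with noise filtering reaches a low equal error rate (EER) of 3.5% in keystroke-based authentication, outperforming deep learning models.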

ABSTRACT

With the increase in credit card usage, the volume of credit card misuse has also increased significantly. As a result, financial organizations are working hard to develop and deploy credit card fraud detection methods. Their goals are to adapt to ever-evolving, increasingly sophisticated fraud strategies and to identify illicit transactions as quickly as possible, protecting themselves and their customers. Compounding the complexity of such adversarial strategies, fraudulent credit card transactions are rare events compared with the number of legitimate transactions. Hence, the challenge of developing accurate and efficient fraud detection methods is substantially intensified, and, as a consequence, credit card fraud detection has lately become a very active area of research. In this work, we provide a survey of current techniques most relevant to the problem of credit card fraud detection. We carry out our survey in two main parts. In the first part, we focus on studies using classical machine learning models, which mostly employ traditional transactional features to make fraud predictions. These models typically rely on static characteristics, such as what the user knows (knowledge-based methods) or what he/she has access to (object-based methods). In the second part of our survey, we review more advanced techniques of user authentication, which use behavioral biometrics to identify an individual based on his/her unique behavior while interacting with electronic devices. These approaches rely on how people behave (instead of what they do), and such behavior cannot be easily forged. By providing an overview of current approaches and the results reported in the literature, this survey aims to drive the community's future research agenda toward more accurate, reliable and scalable models of credit card fraud detection.
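
The abstract notes that fraudulent transactions are rare relative to legitimate ones. The minimal sketch below shows why this complicates evaluation. It uses hypothetical toy counts (20 frauds in 10,000 transactions), which are not figures from the survey. A "detector" that never flags fraud still scores very high accuracy while catching no fraud at all.

```python
# Hypothetical toy data: 10,000 transactions, 20 of them fraudulent (label 1).
labels = [1] * 20 + [0] * 9_980

# A trivial "detector" that labels every transaction as legitimate (0).
predictions = [0] * len(labels)

# Accuracy: fraction of transactions labeled correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the fraud class: fraction of fraudulent transactions that were flagged.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy = {accuracy:.3f}")    # accuracy = 0.998
print(f"fraud recall = {recall:.3f}")  # fraud recall = 0.000
```

This is why work on imbalanced fraud data usually reports class-sensitive metrics such as recall, precision, or error rates per class instead of plain accuracy.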

Research Motivation and Objectives

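Provide a comprehensive survey of machine learning and user authentication techniques for credit card fraud detection.
Analyze how effectively classical machine learning models built on transaction features detect fraudulent transactions.
Evaluate how behavioral biometrics (such as keystroke dynamics and touch interaction) perform in user authentication and fraud prevention.
Identify the limitations of current lab-based evaluation methods and the gap between lab results and real-world deployment performance.
Explore the potential of synthetic data generation for improving the robustness of anomaly detection models.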

Proposed Methods

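Surveyed the literature on supervised and unsupervised machine learning models applied to credit card fraud detection.
Evaluated supervised learning methods (including random forests and deep neural networks) on the CMU keystroke dynamics dataset.
Applied noise reduction to improve model robustness by removing data points lying more than 3 standard deviations from each user's mean vector.
Used the equal error rate (EER) as the primary performance metric for comparing authentication models.
Explored the feasibility of generating synthetic biometric data that preserves the statistical properties of real data, to address data scarcity.
Proposed a multimodal behavioral feature set combining touch, motion, orientation, and gesture data to strengthen authentication.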
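
The noise-reduction step can be sketched as follows. This is an illustrative reading, not the authors' code. It assumes each user's samples are fixed-length numeric feature vectors (for example, keystroke timing features). For each user it computes the per-feature mean and population standard deviation, then drops any sample whose value on some feature lies more than 3 standard deviations from that user's mean. The function names `filter_outliers` and `filter_by_user` and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def filter_outliers(samples, k=3.0):
    """Keep samples whose every feature is within k standard deviations of the
    per-feature mean over this user's samples (an assumed reading of the rule)."""
    n_features = len(samples[0])
    # Regroup the user's samples by feature so statistics are computed per column.
    columns = [[s[j] for s in samples] for j in range(n_features)]
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population standard deviation
    return [
        s
        for s in samples
        # A zero-variance feature cannot flag anything, so it is skipped.
        if all(std == 0 or abs(s[j] - m) <= k * std
               for j, (m, std) in enumerate(zip(means, stds)))
    ]


def filter_by_user(samples_by_user, k=3.0):
    """Apply the filter separately to each user's own samples."""
    return {user: filter_outliers(samples, k)
            for user, samples in samples_by_user.items()}


# Hypothetical toy data: 20 ordinary two-feature samples plus one extreme value.
samples = [(0.10, 0.20), (0.12, 0.22)] * 10 + [(0.90, 0.21)]
cleaned = filter_by_user({"user_a": samples})
print(len(cleaned["user_a"]))  # 20
```

With the population standard deviation, no sample can lie strictly more than 3 standard deviations from the mean unless the user has more than 10 samples (Samuelson's inequality). The filter therefore only has an effect on reasonably large per-user sets.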

Experimental Results

Research Questions

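RQ1: How do classical machine learning models compare with advanced behavioral biometric systems in credit card fraud detection?
RQ2: How does data preprocessing (especially noise filtering) affect the performance of user authentication models?
RQ3: Why do lab-based evaluation results for behavioral biometric systems often fail to generalize to real-world deployment?
RQ4: Can synthetic biometric data with realistic statistical properties improve the robustness of anomaly detection models?
RQ5: With limited training data, which machine learning algorithm performs better: random forests or deep neural networks?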

Key Findings

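When combined with noise filtering, the random forest model achieved the lowest EER, approximately 3.5%, outperforming deep learning models.
Deep learning models underperformed because they have many parameters and too little training data, which highlights data scarcity as a key limiting factor.
Noise filtering, which removes data points more than 3 standard deviations from each user's mean vector, substantially lowered the EER of all models.
Lab-based evaluations typically overestimate performance: EERs on real-world data are markedly higher than those reported under controlled conditions.
Combining multiple behavioral modalities (such as touch, motion, and gestures) has the potential to reduce error rates further compared with single-modality systems.
Generating synthetic data that preserves statistical properties can make the training of anomaly detection models more robust, especially for modeling rare fraud patterns.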
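
The findings above are all stated as EERs, so a short sketch of how an EER can be estimated may help. It assumes the authentication model outputs an anomaly score per attempt, where higher means more likely an impostor. It sweeps a decision threshold over the observed scores and reports the point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest. The function name `equal_error_rate` and the toy scores are hypothetical. Published evaluations may interpolate the ROC curve instead of using this discrete approximation.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER from anomaly scores (higher = more likely impostor).

    An attempt is rejected when its score is >= the threshold.
    FRR = fraction of genuine attempts rejected.
    FAR = fraction of impostor attempts accepted.
    Returns (eer, threshold) at the threshold where |FAR - FRR| is smallest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    thresholds.append(float("inf"))  # a threshold that rejects nothing
    best = None
    for t in thresholds:
        frr = sum(s >= t for s in genuine_scores) / len(genuine_scores)
        far = sum(s < t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR as the EER estimate.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


genuine = [0.1, 0.2, 0.3, 0.4, 0.9]     # scores for the legitimate user's attempts
impostor = [0.35, 0.6, 0.7, 0.8, 0.95]  # scores for other users' attempts
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.1%} at threshold {threshold}")  # EER = 20.0% at threshold 0.6
```

A lower EER means the operating point where the two error types balance is better. This is why a single number such as 3.5% can summarize and compare authentication models.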

This review was generated by AI and checked by human editors.