QUICK REVIEW

[Paper Review] A Comprehensive Survey on Machine Learning Techniques and User Authentication Approaches for Credit Card Fraud Detection

Niloofar Yousefi, Marie Alaghband | arXiv (Cornell University) | Dec 2, 2019

Imbalanced Data Classification Techniques | Cited by 23

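One-line summary

This survey comprehensively reviews machine learning techniques and behavioral biometrics for credit card fraud detection, evaluating both classical models and advanced user authentication methods. Because training data were scarce, a random forest model with noise filtering outperformed deep learning in keystroke-based authentication, achieving a low equal error rate (EER) of about 3.5%.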

ABSTRACT

With the increase of credit card usage, the volume of credit card misuse also has significantly increased. As a result, financial organizations are working hard on developing and deploying credit card fraud detection methods, in order to adapt to ever-evolving, increasingly sophisticated defrauding strategies and identifying illicit transactions as quickly as possible to protect themselves and their customers. Compounding on the complex nature of such adverse strategies, credit card fraudulent activities are rare events compared to the number of legitimate transactions. Hence, the challenge to develop fraud detection methods that are accurate and efficient is substantially intensified and, as a consequence, credit card fraud detection has lately become a very active area of research. In this work, we provide a survey of current techniques most relevant to the problem of credit card fraud detection. We carry out our survey in two main parts. In the first part, we focus on studies utilizing classical machine learning models, which mostly employ traditional transactional features to make fraud predictions. These models typically rely on some static physical characteristics, such as what the user knows (knowledge-based method), or what he/she has access to (object-based method). In the second part of our survey, we review more advanced techniques of user authentication, which use behavioral biometrics to identify an individual based on his/her unique behavior while he/she is interacting with his/her electronic devices. These approaches rely on how people behave (instead of what they do), which cannot be easily forged. By providing an overview of current approaches and the results reported in the literature, this survey aims to drive the future research agenda for the community in order to develop more accurate, reliable and scalable models of credit card fraud detection.

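Research motivation and objectives

Provide a comprehensive review of machine learning and user authentication techniques for credit card fraud detection.
Analyze the effectiveness of classical machine learning models that use transaction features.
Assess the effectiveness of behavioral biometrics, such as keystroke dynamics and touch interaction, for user authentication and fraud prevention.
Identify the limitations of current lab-based evaluations and the resulting gap to real-world performance.
Examine whether synthetic data generation can improve the robustness of anomaly detection models.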

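Proposed method

Conducted a literature survey of supervised and unsupervised machine learning models applied to credit card fraud detection.
Evaluated supervised learning methods (such as random forests and deep neural networks) on the CMU keystroke dynamics dataset.
Reduced noise by removing data points that lie more than 3 standard deviations from each user's mean vector, improving model robustness.
Adopted the equal error rate (EER) as the main performance metric for comparing authentication models.
Examined the feasibility of generating synthetic biometric data that reproduces the statistical properties of real data, with the aim of addressing data scarcity.
Proposed a multimodal behavioral feature set that integrates touch, movement, posture, and gesture data to strengthen authentication.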
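
The noise-reduction step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the review does not say how distance from the mean vector is measured, so the sketch assumes a per-feature rule (a sample is dropped if any single feature lies more than 3 standard deviations from that user's mean for that feature). The function names `filter_user_samples` and `filter_all_users` and the dictionary layout are hypothetical.

```python
from statistics import mean, pstdev


def filter_user_samples(samples, k=3.0):
    """Return the samples of ONE user that pass a k-sigma check.

    samples: list of equal-length feature vectors (lists of floats).
    Assumption: the check is applied per feature; a sample is kept only
    if every feature is within k population standard deviations of that
    user's mean for the feature.
    """
    if not samples:
        return []
    columns = list(zip(*samples))            # one tuple per feature
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    kept = []
    for row in samples:
        within = all(abs(x - m) <= k * s for x, m, s in zip(row, means, stds))
        if within:
            kept.append(row)
    return kept


def filter_all_users(data, k=3.0):
    """Apply the filter independently to each user's samples.

    data: dict mapping a user id to that user's list of feature vectors.
    """
    return {user: filter_user_samples(rows, k) for user, rows in data.items()}
```

Filtering each user separately matters here because typing rhythm differs between people: a timing that is an outlier for one user may be typical for another. Note also that with very few samples per user a 3-sigma rule can never fire, since a single point among n samples cannot sit more than sqrt(n - 1) population standard deviations from the mean.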

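Experimental results

Research questions

RQ1: Is there a performance difference in credit card fraud detection between classical machine learning models and advanced behavioral biometric systems?
RQ2: How does data preprocessing, noise filtering in particular, affect the performance of user authentication models?
RQ3: Why do lab-based evaluations of behavioral biometric systems generalize poorly to real-world deployment?
RQ4: Can synthetic biometric data that reproduces real statistical properties improve the robustness of anomaly detection models?
RQ5: With limited training data, which machine learning algorithm performs better: random forests or deep neural networks?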

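Key findings

A random forest model with noise filtering achieved an equal error rate (EER) of about 3.5%, the lowest EER recorded, outperforming the deep learning models.
The deep learning models performed worse because they have many parameters and the training data were insufficient, which indicates that data scarcity was the main limiting factor.
Noise filtering, which removes data points more than 3 standard deviations from each user's mean vector, markedly reduced the EER of every model.
Lab-based evaluations tend to overestimate performance: on real-world data, the observed EERs were markedly higher than those reported in controlled environments.
Integrating multiple behavioral modalities (e.g., touch, movement, gesture) may reduce error rates compared with single-modality systems.
Synthetic data generation that preserves real statistical properties has the potential to make the training of anomaly detection models more robust, particularly by strengthening the learning of rare fraud patterns.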
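
For readers unfamiliar with the metric behind these findings, the EER is the error rate at the decision threshold where the false acceptance rate (impostors accepted) equals the false rejection rate (genuine users rejected). The sketch below shows one simple way to estimate it from two lists of scores. It is illustrative only and is not taken from the paper: it assumes that a higher score means "more likely the genuine user", and the function name `equal_error_rate` is hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from similarity scores.

    genuine:  scores from the legitimate user's attempts.
    impostor: scores from other users' attempts.
    Assumption: an attempt is accepted when its score >= threshold.
    Every observed score is tried as a threshold; the one where FAR and
    FRR are closest is used, and their average is returned.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap = None
    eer = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Example with made-up scores: one impostor scores above one genuine attempt.
scores_genuine = [0.9, 0.8, 0.4, 0.7]
scores_impostor = [0.1, 0.5, 0.3, 0.2]
print(equal_error_rate(scores_genuine, scores_impostor))
```

Because the scores are finite, FAR and FRR may never be exactly equal. Averaging them at the closest threshold is a common approximation, and interpolating between neighboring thresholds is an alternative.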


This review was generated by AI and checked by a human editor.