QUICK REVIEW

[Paper Review] A Comprehensive Survey on Machine Learning Techniques and User Authentication Approaches for Credit Card Fraud Detection

Niloofar Yousefi, Marie Alaghband|arXiv (Cornell University)|2019. 12. 02.

Imbalanced Data Classification Techniques | Citations: 23

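One-line summary

This survey reviews machine learning techniques and behavioral biometrics for credit card fraud detection, evaluating both classical models and advanced user authentication methods. It finds that, because training data is scarce, a random forest with noise filtering outperforms deep learning on keystroke-based authentication, reaching a low equal error rate (EER) of 3.5%.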

ABSTRACT

With the increase of credit card usage, the volume of credit card misuse also has significantly increased. As a result, financial organizations are working hard on developing and deploying credit card fraud detection methods, in order to adapt to ever-evolving, increasingly sophisticated defrauding strategies and identifying illicit transactions as quickly as possible to protect themselves and their customers. Compounding on the complex nature of such adverse strategies, credit card fraudulent activities are rare events compared to the number of legitimate transactions. Hence, the challenge to develop fraud detection methods that are accurate and efficient is substantially intensified and, as a consequence, credit card fraud detection has lately become a very active area of research. In this work, we provide a survey of current techniques most relevant to the problem of credit card fraud detection. We carry out our survey in two main parts. In the first part, we focus on studies utilizing classical machine learning models, which mostly employ traditional transactional features to make fraud predictions. These models typically rely on some static physical characteristics, such as what the user knows (knowledge-based method), or what he/she has access to (object-based method). In the second part of our survey, we review more advanced techniques of user authentication, which use behavioral biometrics to identify an individual based on his/her unique behavior while he/she is interacting with his/her electronic devices. These approaches rely on how people behave (instead of what they do), which cannot be easily forged. By providing an overview of current approaches and the results reported in the literature, this survey aims to drive the future research agenda for the community in order to develop more accurate, reliable and scalable models of credit card fraud detection.

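Research motivation and objectives

Provide a comprehensive review of machine learning and user authentication techniques applied to credit card fraud detection.
Analyze the effectiveness of classical machine learning models built on transaction features to assess how well they detect fraudulent transactions.
Evaluate how behavioral biometrics, such as keystroke dynamics and touch interaction, contribute to user authentication and fraud prevention.
Identify the limitations of current laboratory-based evaluations and the performance gap that appears in real-world deployment.
Explore how synthetic data generation could improve the robustness of anomaly detection models.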

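Proposed method

Conduct a literature survey of supervised and unsupervised machine learning models applied to credit card fraud detection.
Evaluate supervised methods (including random forests and deep neural networks) on the CMU keystroke dynamics dataset.
Apply noise reduction to improve model robustness by removing data points that lie more than 3 standard deviations from each user's mean vector.
Use the equal error rate (EER) as the primary evaluation metric for comparing models.
Explore the feasibility of generating synthetic biometric data that reflects the statistical properties of real data, as a way to address data scarcity.
Propose a multimodal behavioral feature set combining touch, movement, posture, and gesture data to improve authentication performance.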
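
The noise-reduction step above can be illustrated with a minimal sketch. The review does not say whether the 3-standard-deviation rule is applied per feature or to a distance from the mean vector, so this example assumes a per-feature check: a sample is dropped if any one of its features deviates from that user's mean by more than k standard deviations. The function name `filter_user_samples` and the parameter `k` are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def filter_user_samples(samples, k=3.0):
    """Drop samples with any feature more than k std devs from the user's mean.

    samples: list of equal-length feature vectors (lists of floats) that all
    belong to ONE user. The per-feature rule is an assumption; the source only
    states "more than 3 standard deviations from each user's mean vector".
    """
    if not samples:
        return []

    # Transpose rows into per-feature columns to get the mean and std of each feature.
    columns = list(zip(*samples))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]

    kept = []
    for row in samples:
        # A feature with zero spread can never flag an outlier.
        is_outlier = any(
            std > 0 and abs(x - mu) > k * std
            for x, mu, std in zip(row, means, stds)
        )
        if not is_outlier:
            kept.append(row)
    return kept
```

The filter would be run separately on each user's samples before training. Because the mean and standard deviation are computed from the same data being filtered, a single extreme value can only exceed 3 standard deviations when the user has more than about ten samples.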

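Experimental results

Research questions

RQ1: How does credit card fraud detection performance compare between classical machine learning models and advanced behavioral biometric systems?
RQ2: How does data preprocessing, in particular noise filtering, affect the performance of user authentication models?
RQ3: Why do laboratory-based evaluations of behavioral biometric systems often fail to generalize to real-world deployments?
RQ4: Can synthetic biometric data that reflects real statistical properties help make anomaly detection models more robust?
RQ5: With limited training data, which algorithm performs better: random forests or deep neural networks?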

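Key findings

A random forest with noise filtering achieved the lowest equal error rate (EER), about 3.5%, on keystroke-based authentication, outperforming the deep learning models.
The deep learning models performed worse because they have many parameters and too little training data, which points to data scarcity as the key constraint.
Noise filtering, which removes data points more than 3 standard deviations from each user's mean, substantially reduced the EER for every model.
Laboratory-based evaluations tend to overestimate performance: real-world data yields a much higher EER than the figures reported in controlled settings.
Combining behavioral modalities such as touch, movement, and gesture has the potential to reduce error rates compared with single-modality systems.
Generating synthetic data that preserves the statistical properties of real data can make anomaly detection models more robust, especially by supporting training on rare fraud patterns.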
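
Since the findings above are reported as equal error rates, a short sketch of how an EER can be computed may help. The EER is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine users rejected). This example assumes that higher scores mean "more likely genuine" and approximates the EER by sweeping the observed scores as thresholds; the function name `equal_error_rate` is hypothetical, and the surveyed studies may use a different procedure, such as interpolation.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A sample is accepted when its score is >= the threshold.
    FAR = fraction of impostor scores accepted.
    FRR = fraction of genuine scores rejected.
    Returns the average of FAR and FRR at the threshold where they are closest.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap = None
    best_eer = None
    for t in thresholds:
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer


# Illustrative scores only, not data from the paper.
genuine = [0.9, 0.8, 0.7, 0.3]
impostor = [0.1, 0.2, 0.4, 0.6]
eer = equal_error_rate(genuine, impostor)
```

With few scores the two error rates may never be exactly equal, which is why the sketch returns their average at the closest point instead of searching for an exact crossing.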


This review was generated by AI and reviewed by a human editor.