QUICK REVIEW

[Paper Review] Ensemble Methodology: Innovations in Credit Default Prediction Using LightGBM, XGBoost, and LocalEnsemble

Mengran Zhu, Ye Zhang | arXiv (Cornell University) | 2024-02-28

Financial Distress and Bankruptcy Prediction · Citations: 20

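One-Line Summary

This paper proposes an Ensemble Methods framework that combines LightGBM, XGBoost, and LocalEnsemble to improve the accuracy of credit card default prediction, and validates it on the American Express dataset. The ensemble outperforms the individual models in both the public and private evaluations.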

ABSTRACT

In the realm of consumer lending, accurate credit default prediction stands as a critical element in risk mitigation and lending decision optimization. Extensive research has sought continuous improvement in existing models to enhance customer experiences and ensure the sound economic functioning of lending institutions. This study responds to the evolving landscape of credit default prediction, challenging conventional models and introducing innovative approaches. By building upon foundational research and recent innovations, our work aims to redefine the standards of accuracy in credit default prediction, setting a new benchmark for the industry. To overcome these challenges, we present an Ensemble Methods framework comprising LightGBM, XGBoost, and LocalEnsemble modules, each making unique contributions to amplify diversity and improve generalization. By utilizing distinct feature sets, our methodology directly tackles limitations identified in previous studies, with the overarching goal of establishing a novel standard for credit default prediction accuracy. Our experimental findings validate the effectiveness of the ensemble model on the dataset, signifying substantial contributions to the field. This innovative approach not only addresses existing obstacles but also sets a precedent for advancing the accuracy and robustness of credit default prediction models.

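Research Motivation and Goals

Address the challenges of credit default prediction in consumer lending.
Develop an ensemble framework built on model diversity that improves generalization and robustness.
Improve accuracy by using distinct feature sets and a local ensemble technique.
Demonstrate effectiveness on a large-scale, anonymized American Express dataset.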

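Proposed Method

Preprocess the data with noise removal, type conversion, and outlier handling.
Engineer features, including aggregates, lag features, and meta-features.
Train the three modules (LightGBM, XGBoost, LocalEnsemble) on different feature sets to promote diversity.
Feed the out-of-fold predictions from cross-validating the initial models back in as meta-features.
Combine the module predictions with a weighted ensemble, $\hat{y}_e = \sum_i w_i \hat{y}_i$.
Evaluate with a composite metric that combines the normalized Gini coefficient and the default rate captured at 4%.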
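The preprocessing and feature-engineering steps can be sketched for a single customer's statement history. This is a minimal illustration, not the paper's pipeline: the row layout, the column name `balance`, the clipping bounds, and the particular statistics (mean, min, max, last value, one-step lag difference) are assumptions chosen for clarity.

```python
from statistics import mean


def clip(value, lower, upper):
    """Cap a value to [lower, upper] (a simple form of outlier handling)."""
    return float(min(max(value, lower), upper))


def customer_features(rows, col, bounds=None):
    """Aggregate and lag features for one customer's statements.

    rows:   list of dicts, sorted oldest statement first (assumed layout).
    col:    numeric column name (hypothetical, e.g. "balance").
    bounds: optional (lower, upper) pair used to clip outliers.
    """
    # Type conversion: coerce strings/ints to float and skip missing values
    values = [float(r[col]) for r in rows if r.get(col) is not None]
    if not values:
        return {}
    if bounds is not None:
        values = [clip(v, *bounds) for v in values]

    feats = {
        f"{col}_mean": mean(values),  # aggregates over the full history
        f"{col}_min": min(values),
        f"{col}_max": max(values),
        f"{col}_last": values[-1],    # most recent statement
    }
    # Lag feature: change from the previous statement to the latest one
    feats[f"{col}_lag1_diff"] = values[-1] - values[-2] if len(values) > 1 else None
    return feats


history = [{"balance": "100"}, {"balance": 150}, {"balance": None}, {"balance": 130}]
print(customer_features(history, "balance", bounds=(0, 140)))
```

In a real pipeline the same per-customer logic would typically be vectorized over all customers and many columns; the loop form here only makes each feature's definition explicit.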
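Meta-features from cross-validation outputs can be illustrated with plain k-fold cross-validation: each row's meta-feature comes from a model trained only on the other folds, so the downstream model never sees a prediction built from that row's own label. The review does not say which first-stage models or how many folds the paper used; the contiguous (unshuffled) folds and the toy mean-per-category model below are hypothetical stand-ins for a real first-stage learner such as LightGBM or XGBoost.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for clarity)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(X, y, fit, predict, k=5):
    """Return one prediction per row, each made by a model that never saw that row."""
    oof = [None] * len(X)
    for val_idx in kfold_indices(len(X), k):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in val_idx:
            oof[i] = predict(model, X[i])
    return oof


# Hypothetical toy first-stage model: mean target per category
def fit_category_mean(X, y):
    sums, counts = {}, {}
    for x, t in zip(X, y):
        sums[x] = sums.get(x, 0.0) + t
        counts[x] = counts.get(x, 0) + 1
    global_mean = sum(y) / len(y)
    return {x: sums[x] / counts[x] for x in sums}, global_mean


def predict_category_mean(model, x):
    means, global_mean = model
    return means.get(x, global_mean)  # unseen category falls back to the global mean


X = ["a", "a", "b", "b"]
y = [1, 0, 1, 1]
meta_feature = out_of_fold_predictions(X, y, fit_category_mean, predict_category_mean, k=2)
print(meta_feature)
```

The resulting `meta_feature` list would then be appended as an extra column for the second-stage models; in practice shuffled or stratified folds are the usual choice.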
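The weighted combination $\hat{y}_e = \sum_i w_i \hat{y}_i$ takes only a few lines. The weights and predictions below are hypothetical (the review does not report the paper's weights), and the sketch assumes non-negative weights that sum to 1, so blended probabilities stay in [0, 1].

```python
import math


def weighted_ensemble(predictions, weights):
    """Blend per-model predictions: y_hat_e[j] = sum_i w_i * y_hat_i[j].

    predictions: dict of model name -> list of predicted default probabilities.
    weights:     dict of model name -> non-negative weight (assumed to sum to 1).
    """
    if set(predictions) != set(weights):
        raise ValueError("every model needs exactly one weight")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights should sum to 1")
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same rows")
    n = lengths.pop()
    return [sum(weights[m] * predictions[m][j] for m in predictions) for j in range(n)]


# Hypothetical per-module predictions and weights (not the paper's values)
preds = {
    "LightGBM": [0.2, 0.8],
    "XGBoost": [0.4, 0.6],
    "LocalEnsemble": [0.3, 0.9],
}
w = {"LightGBM": 0.5, "XGBoost": 0.25, "LocalEnsemble": 0.25}
print(weighted_ensemble(preds, w))
```

Weights like these are usually tuned on out-of-fold predictions rather than on the evaluation set, to avoid overfitting the blend.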
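The review describes the evaluation metric only as a composite of the normalized Gini coefficient and a 4% default rate. The sketch below assumes it is the metric from the Kaggle American Express Default Prediction competition, $M = 0.5\,(G + D)$, where $G$ is a normalized Gini coefficient with non-defaults weighted 20× and $D$ is the share of all defaults captured in the highest-scored 4% of the weighted rows. Tie handling may differ slightly from other implementations.

```python
def _weight(label):
    # Non-defaults (label 0) are weighted 20x (assumed, following the Kaggle
    # Amex metric, which compensates for subsampled non-defaults)
    return 20 if label == 0 else 1


def _rank(y_true, scores):
    # Pair labels with scores and sort highest score first
    return sorted(zip(y_true, scores), key=lambda p: p[1], reverse=True)


def top_four_percent_captured(y_true, y_pred):
    """D: share of all defaults found within the top 4% of weighted rows."""
    pairs = _rank(y_true, y_pred)
    cutoff = int(0.04 * sum(_weight(t) for t, _ in pairs))
    cum_weight, captured = 0, 0
    for t, _ in pairs:
        cum_weight += _weight(t)
        if cum_weight > cutoff:
            break
        captured += t
    return captured / sum(y_true)  # assumes at least one default


def weighted_gini(y_true, scores):
    pairs = _rank(y_true, scores)
    total_weight = sum(_weight(t) for t, _ in pairs)
    total_pos = sum(t * _weight(t) for t, _ in pairs)
    random_share, found, gini = 0.0, 0.0, 0.0
    for t, _ in pairs:
        w = _weight(t)
        random_share += w / total_weight  # cumulative share under a random ranking
        found += t * w                    # weighted defaults found so far
        gini += (found / total_pos - random_share) * w
    return gini


def normalized_weighted_gini(y_true, y_pred):
    """G: Gini of the predictions divided by the Gini of a perfect ranking."""
    return weighted_gini(y_true, y_pred) / weighted_gini(y_true, y_true)


def amex_metric(y_true, y_pred):
    """M = 0.5 * (G + D)."""
    return 0.5 * (normalized_weighted_gini(y_true, y_pred)
                  + top_four_percent_captured(y_true, y_pred))


labels = [1, 0, 0, 0]
print(amex_metric(labels, [0.9, 0.1, 0.2, 0.3]))  # the default is ranked first
print(amex_metric(labels, [0.1, 0.9, 0.8, 0.7]))  # the default is ranked last
```

Because both components depend only on the ranking of scores, any monotonic rescaling of the predictions leaves the metric unchanged.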

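Experimental Results

Research Questions

RQ1: Can an ensemble of LightGBM, XGBoost, and LocalEnsemble achieve higher credit default prediction accuracy than the individual models?
RQ2: Do diverse feature sets and the LocalEnsemble component improve generalization and robustness on large-scale time-series credit data?
RQ3: How does the proposed ensemble model perform on the public and private subsets of the American Express dataset compared with neural networks and other boosting models?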

Key Results

Model	Public score (49%)	Private score (51%)
GRU	0.78877	0.79832
Transformer	0.78916	0.79832
TabTransformer	0.78271	0.79236
Neural Networks	0.78705	0.79698
XGBoost	0.79982	0.80757
LightGBM	0.80006	0.80809
CatBoost (Local)	0.79804	0.80629
LightGBM (Local)	0.79967	0.80697
Local Ensemble	0.80094	0.80842
Ensemble Model	0.80128	0.80872

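The Ensemble Model achieved the highest scores: 0.80128 on the public set and 0.80872 on the private set.
The LocalEnsemble and LightGBM+XGBoost components contributed to the performance gains through feature diversity.
XGBoost and LightGBM provided strong baseline performance, and LocalEnsemble further improved generalization.
In the feature importance analysis, the top features account for more than 90% of the predictive power of XGBoost and LightGBM.
The proposed fusion of the three modules outperformed the various deep learning and traditional models evaluated in the study.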
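The "highest score" claim can be checked directly against the results table. This short script uses only the numbers reported in the table, ranks the entries on each split, and reports the gap between the top entry and the runner-up.

```python
# (public, private) scores copied from the results table
SCORES = {
    "GRU": (0.78877, 0.79832),
    "Transformer": (0.78916, 0.79832),
    "TabTransformer": (0.78271, 0.79236),
    "Neural Networks": (0.78705, 0.79698),
    "XGBoost": (0.79982, 0.80757),
    "LightGBM": (0.80006, 0.80809),
    "CatBoost (Local)": (0.79804, 0.80629),
    "LightGBM (Local)": (0.79967, 0.80697),
    "Local Ensemble": (0.80094, 0.80842),
    "Ensemble Model": (0.80128, 0.80872),
}


def margin_over_runner_up(scores, split):
    """Return (best model, runner-up, score gap) for split 0 = public, 1 = private."""
    ranked = sorted(scores, key=lambda m: scores[m][split], reverse=True)
    best, second = ranked[0], ranked[1]
    return best, second, scores[best][split] - scores[second][split]


for split, name in [(0, "public"), (1, "private")]:
    best, second, gap = margin_over_runner_up(SCORES, split)
    print(f"{name}: {best} leads {second} by {gap:.5f}")
```

On both splits the Ensemble Model leads Local Ensemble by roughly 0.0003, so the gain from the full ensemble over LocalEnsemble alone is small in absolute terms.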
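The statement that the top features account for more than 90% of predictive power can be made concrete as a cumulative share of feature importance. The review does not say which importance measure (for example, split count or gain) or which features were involved; the function below assumes summed importance as the measure, and the feature names and values are hypothetical.

```python
def features_for_share(importances, share=0.9):
    """Smallest set of top features whose summed importance reaches `share` of the total."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, value in ranked:
        selected.append(name)
        running += value
        if running / total >= share:
            break
    return selected


# Hypothetical importances (e.g. total gain per feature); not values from the paper
imp = {"feat_a": 50.0, "feat_b": 30.0, "feat_c": 12.0, "feat_d": 5.0, "feat_e": 3.0}
print(features_for_share(imp, 0.9))
```

Importance shares from gradient-boosted trees can differ noticeably between split-count and gain measures, so the measure should be stated alongside any such percentage.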


This review was generated by AI and checked by a human editor.