QUICK REVIEW

[Paper Review] Analyzing Machine Learning Models for Credit Scoring with Explainable AI and Optimizing Investment Decisions

Swati Tyagi | arXiv (Cornell University) | Sep 19, 2022

Financial Distress and Bankruptcy Prediction | Cited by 24

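One-Sentence Summary

The paper compares several machine learning models for credit scoring, applies LIME and SHAP for explainability analysis, and studies ML-based investment strategies that aim to maximize profitability while reducing risk.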

ABSTRACT

This paper examines two different yet related questions related to explainable AI (XAI) practices. Machine learning (ML) is increasingly important in financial services, such as pre-approval, credit underwriting, investments, and various front-end and back-end activities. Machine Learning can automatically detect non-linearities and interactions in training data, facilitating faster and more accurate credit decisions. However, machine learning models are opaque and hard to explain, which are critical elements needed for establishing a reliable technology. The study compares various machine learning models, including single classifiers (logistic regression, decision trees, LDA, QDA), heterogeneous ensembles (AdaBoost, Random Forest), and sequential neural networks. The results indicate that ensemble classifiers and neural networks outperform. In addition, two advanced post-hoc model agnostic explainability techniques - LIME and SHAP are utilized to assess ML-based credit scoring models using the open-access datasets offered by US-based P2P Lending Platform, Lending Club. For this study, we are also using machine learning algorithms to develop new investment models and explore portfolio strategies that can maximize profitability while minimizing risk.

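Motivation and Objectives

Motivate the use of explainable AI in financial machine learning applications such as credit underwriting and investment decisions.
Evaluate a range of ML models for credit scoring, from single classifiers to ensemble methods and neural networks.
Assess post-hoc explainability methods (LIME, SHAP) on credit scoring models.
Develop ML-based investment models and portfolio strategies that maximize profitability while minimizing risk.
Highlight the suitability of the open-access Lending Club data for model evaluation.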

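Proposed Method

Compare single classifiers (logistic regression, decision trees, LDA, QDA), heterogeneous ensembles (AdaBoost, Random Forest), and sequential neural networks on the credit scoring task.
Apply post-hoc, model-agnostic explainability techniques (LIME and SHAP) to assess model explanations.
Use Lending Club's open-access datasets to evaluate the credit scoring models.
Develop and test ML-based investment models and portfolio strategies that target profitability and risk minimization.
Report qualitative and quantitative observations on model performance and explainability.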
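
The model comparison described above can be sketched roughly as follows. This is a minimal illustration, not the paper's pipeline: it assumes scikit-learn is installed, uses synthetic data from `make_classification` in place of the Lending Club dataset, and picks cross-validated ROC AUC as the metric, which this summary does not confirm. The function name `compare_models` and all hyperparameters are hypothetical choices.

```python
# Sketch: compare classifier families on SYNTHETIC data (not Lending Club).
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(n_samples=600, seed=0):
    """Return the mean 5-fold ROC AUC per model on an imbalanced synthetic task."""
    # Roughly 20% positives, loosely mimicking a minority "default" class.
    X, y = make_classification(
        n_samples=n_samples, n_features=10, n_informative=5, n_redundant=0,
        weights=[0.8, 0.2], random_state=seed,
    )
    models = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=seed),
        "lda": LinearDiscriminantAnalysis(),
        "qda": QuadraticDiscriminantAnalysis(),
        "adaboost": AdaBoostClassifier(n_estimators=50, random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "neural_network": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=seed)),
    }
    return {
        name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Print models from best to worst mean AUC.
    for name, auc in sorted(compare_models().items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} AUC = {auc:.3f}")
```

Rankings on synthetic data say nothing about the paper's results; the sketch only shows how the listed model families can be put through one common evaluation loop.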

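Experimental Results

Research Questions

RQ1: Which ML models perform best for credit scoring on the Lending Club data?
RQ2: How do ensemble methods and neural networks compare with single classifiers in this setting?
RQ3: How effective are LIME and SHAP at explaining credit scoring models?
RQ4: Compared with baseline strategies, can ML-based investment models increase profitability while reducing risk?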
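
For RQ3, it may help to see the quantity that SHAP approximates. The sketch below computes exact Shapley values by enumerating feature coalitions in plain Python. It is not the `shap` library and not the paper's setup: the scoring function `toy_risk_score`, the feature names, and the all-zero baseline are hypothetical, chosen only to show how an interaction term gets split between features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance` relative to `baseline`.

    Features outside a coalition take their baseline value. The cost grows
    as 2^n, so this is only practical for a handful of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Model output when only the coalition's features are "switched on".
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
        return predict(x)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk_score(x):
    """Hypothetical risk score with one interaction term (illustrative only)."""
    return 0.5 * x["dti"] - 0.25 * x["income"] + 0.25 * x["dti"] * x["revol_util"]


instance = {"dti": 2.0, "income": 4.0, "revol_util": 2.0}
baseline = {"dti": 0.0, "income": 0.0, "revol_util": 0.0}
phi = shapley_values(toy_risk_score, instance, baseline)
for feature, contribution in phi.items():
    print(f"{feature:12s} {contribution:+.3f}")
```

The contributions sum to the difference between the score at the instance and the score at the baseline, which is the additivity property that makes SHAP values readable as a per-feature breakdown of a single prediction.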

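Key Findings

Ensemble classifiers and neural networks outperform single classifiers in credit scoring.
LIME and SHAP are used to assess the explainability of the credit scoring models.
The study uses Lending Club's open-access datasets for evaluation.
The work also develops ML-based investment models and portfolio strategies that optimize profitability while accounting for risk.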
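
This summary does not describe the paper's actual investment models, so the following is only one plausible shape such a strategy could take. It assumes each loan comes with an interest rate and a model-predicted default probability, uses a simple one-period expected-return formula with a fixed loss given default, and selects loans under a risk cap. The `Loan` class, the 0.6 loss fraction, the 0.15 cap, and the sample loans are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loan:
    loan_id: str
    interest_rate: float  # annual rate, e.g. 0.12 for 12%
    default_prob: float   # model-predicted probability of default


def expected_return(loan, loss_given_default=0.6):
    """One-period expected return: earn the rate if repaid, lose a fraction if not."""
    repaid = (1 - loan.default_prob) * loan.interest_rate
    lost = loan.default_prob * loss_given_default
    return repaid - lost


def select_portfolio(loans, k, max_default_prob=0.15):
    """Drop loans above the risk cap, then keep the k with highest expected return."""
    eligible = [loan for loan in loans if loan.default_prob <= max_default_prob]
    return sorted(eligible, key=expected_return, reverse=True)[:k]


def summarize(loans):
    """Equal-weight mean expected return and mean default probability."""
    if not loans:
        raise ValueError("cannot summarize an empty portfolio")
    n = len(loans)
    mean_return = sum(expected_return(loan) for loan in loans) / n
    mean_risk = sum(loan.default_prob for loan in loans) / n
    return mean_return, mean_risk


# Made-up loans for illustration; not Lending Club records.
loans = [
    Loan("A", 0.07, 0.02),
    Loan("B", 0.11, 0.05),
    Loan("C", 0.16, 0.12),
    Loan("D", 0.22, 0.30),
    Loan("E", 0.09, 0.03),
]

portfolio = select_portfolio(loans, k=3)
print("selected:", [loan.loan_id for loan in portfolio])
print("portfolio (return, risk):", summarize(portfolio))
print("baseline  (return, risk):", summarize(loans))
```

Here the baseline is simply buying every loan at equal weight. On these made-up numbers the selected portfolio has both a higher mean expected return and a lower mean default probability than the baseline, which is the kind of comparison RQ4 asks about.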

This review was generated by AI and reviewed by a human editor.