The Ensemble Supervised Machine Learning for Credit Scoring Model in Digital Banking Institution

Narita Ayu Prahastiwi; Muharman Lubis; Hanif Fakhrurroja

doi:10.24014/ijaidm.v8i2.37677

Abstract

The digital transformation of the banking industry requires credit scoring systems that are both accurate and adaptable to complex, diverse data. This study develops and evaluates a credit scoring model that uses ensemble supervised learning to predict credit risk for a consumer loan service (Product X) at Bank XYZ. Ensemble algorithms, including Random Forest, AdaBoost, LightGBM, CatBoost, and XGBoost, were compared with a single classifier, Decision Tree. Model performance was assessed using precision, recall, F1-score, and ROC-AUC. XGBoost outperformed the other models, achieving the highest ROC-AUC (0.803), which indicates strong generalization and a low risk of overfitting. SHAP analysis identified the features that most influence the model: loan tenor, loan amount (plafond), income, and Days Past Due (DPD) history. Compared with the baseline Decision Tree model (ROC-AUC 0.573), XGBoost markedly improved classification performance. It also showed the potential to reduce the Non-Performing Loan (NPL) rate from 4% to below 3% and to increase the approval rate from 65% to over 70%, in line with Product X’s KPIs. These findings confirm that ensemble learning models, especially XGBoost, offer strategic value in enhancing credit portfolio quality and decision-making in digital banking.
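The comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic (generated with scikit-learn's `make_classification`), `GradientBoostingClassifier` is used as a stand-in for XGBoost, LightGBM, and CatBoost so that the example depends only on scikit-learn, and the 0.5 cutoff, the `evaluate` helper, and the `default_rate_approved` figure (a count-based proxy, not a balance-weighted NPL rate) are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for loan data (label 1 = default).
# This is NOT the Product X data set used in the study.
X, y = make_classification(
    n_samples=4000, n_features=12, n_informative=6,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Assumption: scikit-learn gradient boosting stands in for XGBoost here.
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}


def evaluate(model, threshold=0.5):
    """Fit one model and report the metrics named in the abstract."""
    model.fit(X_train, y_train)
    p_default = model.predict_proba(X_test)[:, 1]   # predicted default probability
    y_pred = (p_default >= threshold).astype(int)   # 1 = predicted default
    approved = p_default < threshold                # hypothetical approval rule
    return {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, p_default),
        # Share of applicants that would be approved at this cutoff.
        "approval_rate": approved.mean(),
        # Share of approved applicants that defaulted (count-based NPL proxy).
        "default_rate_approved": (
            y_test[approved].mean() if approved.any() else float("nan")
        ),
    }


results = {name: evaluate(model) for name, model in models.items()}
for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

Moving the cutoff trades approval rate against the default rate among approved applicants, which is the mechanism behind the NPL and approval-rate targets mentioned above. The numbers printed by this sketch describe the synthetic data only and say nothing about the study's results.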

Keywords

Credit Scoring; Ensemble Supervised Learning; Machine Learning; Non-Performing Loan; XGBoost
