Early Detection of Dengue Hemorrhagic Fever Using Patient Medical Data with Ensemble Learning Methods

Achmad Saleh; Ridha Mukhtar; Rusdah Rusdah

doi:10.24014/ijaidm.v8i3.38088

Abstract

Dengue Hemorrhagic Fever (DHF) remains a major public health concern in Indonesia and worldwide, and delayed diagnosis increases the risk of severe complications and mortality. Conventional laboratory-based diagnostics are time-consuming and often less accessible in resource-limited healthcare settings. This study aims to develop an early detection model for DHF using only initial clinical symptoms and demographic data extracted from electronic medical records at RSUD Brigjend H. Hasan Basry Kandangan. A total of 649 patient records (352 DHF cases and 297 non-dengue cases) were analyzed following the CRISP-DM framework. Five ensemble learning algorithms, including Random Forest, Bagging, AdaBoost, and Gradient Boosted Tree, were evaluated across 80:20, 70:30, and 60:40 train-test splits and validated using 5-fold and 10-fold cross-validation. Random Forest consistently delivered the best and most stable performance, achieving up to 90.00% accuracy and an AUC of 0.967 in the 80:20 split, and mean accuracies of 88.91% (5-fold) and 88.29% (10-fold) in cross-validation. Further hyperparameter tuning enhanced model stability and prevented overfitting. The findings confirm that initial clinical symptoms and demographic attributes can reliably identify DHF cases early, enabling faster and more affordable screening prior to laboratory confirmation. This machine-learning-based decision-support model has the potential to significantly improve early clinical management of dengue fever.
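
The evaluation protocol described in the abstract (several ensemble classifiers compared over three hold-out splits and two cross-validation settings) can be illustrated with a minimal sketch. It is not the authors' implementation: the use of scikit-learn, the synthetic data from `make_classification`, the feature count, the estimator settings, and the helper names `make_models`, `evaluate_holdout`, and `evaluate_cv` are all assumptions made for illustration. Only the four algorithms named in the abstract are included, with scikit-learn's `GradientBoostingClassifier` standing in for Gradient Boosted Tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the hospital records (assumption): 649 rows with
# class proportions that roughly mirror 297 non-dengue vs. 352 DHF cases.
X, y = make_classification(
    n_samples=649, n_features=12, n_informative=6,
    weights=[297 / 649, 352 / 649], random_state=42,
)


def make_models():
    """Return fresh, untuned instances of the four named ensemble methods."""
    return {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "Bagging": BaggingClassifier(n_estimators=50, random_state=42),
        "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=42),
        "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    }


def evaluate_holdout(X, y, test_sizes=(0.2, 0.3, 0.4)):
    """Accuracy and AUC for each model on 80:20, 70:30, and 60:40 splits."""
    results = {}
    for test_size in test_sizes:
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=42
        )
        for name, model in make_models().items():
            model.fit(X_tr, y_tr)
            proba = model.predict_proba(X_te)[:, 1]  # probability of class 1
            results[(name, test_size)] = (
                accuracy_score(y_te, model.predict(X_te)),
                roc_auc_score(y_te, proba),
            )
    return results


def evaluate_cv(X, y, folds=(5, 10)):
    """Mean accuracy for each model under stratified k-fold cross-validation."""
    results = {}
    for k in folds:
        cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)
        for name, model in make_models().items():
            scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
            results[(name, k)] = scores.mean()
    return results


holdout_results = evaluate_holdout(X, y)
cv_results = evaluate_cv(X, y)
```

Because the data here are synthetic, the resulting scores say nothing about the figures reported in the abstract; the sketch only shows how the splits and folds fit together. Stratified splitting is a design choice of this sketch that keeps the class ratio similar in every partition.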

Keywords

CRISP-DM; Dengue Hemorrhagic Fever; Early Detection; Medical Data; Random Forest

Office and Secretariat:

Big Data Research Centre
Puzzle Research Data Technology (Predatech)
Laboratory Building 1st Floor of Faculty of Science and Technology
UIN Sultan Syarif Kasim Riau

Jl. HR. Soebrantas KM. 18.5 No. 155 Pekanbaru Riau – 28293
Website: http://predatech.uin-suska.ac.id/ijaidm
Email: ijaidm@uin-suska.ac.id
e-Journal: http://ejournal.uin-suska.ac.id/index.php/ijaidm
Phone: 085275359942
