Comparison of the Performance of K-Nearest Neighbors and Naive Bayes Algorithms for Stroke Disease Prediction
Abstract
Purpose: Stroke is a critical global health issue requiring early and accurate prediction to mitigate severe outcomes. This study aims to compare the performance of the K-Nearest Neighbors (KNN) and Naive Bayes algorithms in predicting stroke disease, addressing the challenge of imbalanced datasets and improving prediction accuracy for better clinical decision-making.
Methods/Study design/approach: The research followed the CRISP-DM process model and used a Kaggle dataset of 5,110 patient records with 12 attributes. Data preprocessing included handling missing values and normalization. The KNN and Naive Bayes algorithms were implemented in RapidMiner, and performance was evaluated with cross-validation, confusion matrices, and ROC curves with the area under the curve (AUC).
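The study built its models in RapidMiner, and the abstract does not state the exact operator settings. As a rough illustration of the pipeline, the following plain-Python sketch shows mean imputation of missing values, min-max normalization, a K-Nearest Neighbors classifier, and a Gaussian Naive Bayes classifier. Everything here is an assumption for illustration: the function names, the toy rows (which are not the Kaggle data), the choice of mean imputation, min-max scaling, Euclidean distance, `k=3`, the Gaussian likelihood for numeric attributes, and the small `var_eps` variance floor.

```python
import math
from collections import Counter


def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_fit(rows):
    """Return the (min, max) pair of every column."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_apply(rows, bounds):
    """Rescale every column to [0, 1]; a constant column maps to 0.0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(r, bounds)]
        for r in rows
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to query (Euclidean distance)."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def gnb_fit(train_x, train_y, var_eps=1e-9):
    """Per class: prior, per-feature mean and variance (var_eps avoids zero variance)."""
    model = {}
    for c in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + var_eps
            for col, m in zip(cols, means)
        ]
        model[c] = (len(rows) / len(train_y), means, variances)
    return model


def gnb_predict(model, query):
    """Pick the class maximizing log P(c) + sum_j log N(x_j; mean_cj, var_cj)."""
    def log_posterior(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(query, means, variances)
        )
    return max(model, key=log_posterior)


# Toy rows [age, avg_glucose_level, bmi]; None marks a missing value.
# Values and labels (1 = stroke) are made up for illustration.
raw = [
    [70, 210.0, 33.0], [65, 190.0, None], [75, 200.0, 30.0],
    [30, 90.0, 22.0], [25, 85.0, None], [35, 95.0, 24.0],
]
labels = [1, 1, 1, 0, 0, 0]

filled = impute_mean(raw)
bounds = min_max_fit(filled)
train_x = min_max_apply(filled, bounds)
query = min_max_apply([[72, 205.0, 31.0]], bounds)[0]

knn_label = knn_predict(train_x, labels, query, k=3)
nb_label = gnb_predict(gnb_fit(train_x, labels), query)
print(knn_label, nb_label)  # 1 1 for this toy query
```

Normalization matters mainly for KNN, because unscaled distances would be dominated by the attribute with the largest range (here, glucose level). In a cross-validated setup, the imputation means and min-max bounds would be fitted on the training folds only and then applied to the held-out fold.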
Result/Findings: The KNN algorithm achieved an accuracy of 94.50% but exhibited low precision (7.89%) and recall (1.20%) for stroke-positive cases due to class imbalance in the dataset. Naive Bayes yielded a lower accuracy of 88.83% with an AUC of 0.767, indicating better probability modeling but similar difficulty in detecting the minority (stroke) class. The results of both algorithms highlight the impact of class imbalance on predictive performance.
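The gap between high accuracy and very low recall follows directly from the confusion-matrix definitions: when positives are rare, a model can be right about almost every negative and still miss nearly every stroke. The sketch below computes accuracy, precision, and recall from (TP, FP, FN, TN), and computes AUC with its rank interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative). The counts are hypothetical and chosen only to illustrate the effect; they are not the study's confusion matrix, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) with `positive` as the minority (stroke) label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy_precision_recall(tp, fp, fn, tn):
    """Accuracy over all rows; precision and recall for the positive class (0.0 if undefined)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def roc_auc(y_true, scores, positive=1):
    """AUC as the chance that a random positive scores above a random negative (ties count 0.5).
    Quadratic in the number of rows, so only suitable for small illustrations."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test set: 1,000 patients, 50 of them stroke-positive (5%).
y_true = [1] * 50 + [0] * 950
# A classifier that flags 10 patients, only 1 of whom actually had a stroke.
y_pred = [1] * 1 + [0] * 49 + [1] * 9 + [0] * 941
# A trivial baseline that always predicts "no stroke".
baseline = [0] * 1000

model_scores = accuracy_precision_recall(*confusion_counts(y_true, y_pred))
baseline_scores = accuracy_precision_recall(*confusion_counts(y_true, baseline))
print(model_scores)     # (0.942, 0.1, 0.02)
print(baseline_scores)  # (0.95, 0.0, 0.0)

# AUC on a tiny hand-checkable example: 5 of the 6 positive/negative pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.2, 0.1])
print(auc)  # 5/6, about 0.833
```

In this toy case the always-negative baseline scores a higher accuracy than the model while detecting no strokes at all, which is why precision, recall, and AUC are more informative than accuracy on imbalanced data.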
Novelty/Originality/Value: This study provides a comparative analysis of KNN and Naive Bayes for stroke prediction, emphasizing the need for data balancing and optimization techniques. The findings underscore the potential of these algorithms in healthcare applications and suggest future improvements through ensemble methods or alternative algorithms such as Random Forest.
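The abstract does not say which balancing technique would be used. One of the simplest options is random oversampling of the minority class (alternatives include SMOTE, undersampling, or class weights). The sketch below, with hypothetical function names and toy data, pairs it with stratified k-fold cross-validation and applies oversampling to the training folds only, so that duplicated stroke cases never appear in the fold used for evaluation.

```python
import random
from collections import Counter


def stratified_folds(labels, k=5, seed=0):
    """Deal each class's shuffled indices round-robin into k folds to keep class ratios similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly drawn rows of the smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(rows, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy data with a 4:1 imbalance (1 = stroke) and one dummy feature per row.
labels = [0] * 16 + [1] * 4
rows = [[float(i)] for i in range(20)]

fold_summaries = []
for test_idx in stratified_folds(labels, k=4):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Balance the training part only; the test fold keeps its natural class ratio.
    bal_x, bal_y = random_oversample([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    # ...fit KNN or Naive Bayes on (bal_x, bal_y) and evaluate on test_idx here...
    fold_summaries.append((len(test_idx), sum(labels[i] for i in test_idx), dict(Counter(bal_y))))

print(fold_summaries)
```

Each toy fold holds out 5 rows (one of them positive), while its training part grows from 12 negatives and 3 positives to 12 of each. Oversampling before splitting would instead let copies of the same patient land in both the training and test folds and inflate the measured recall.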
DOI: http://dx.doi.org/10.24014/coreit.v11i2.37542
Jurnal CoreIT by http://ejournal.uin-suska.ac.id/index.php/coreit/ is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
