Enhancing Student Performance Classification Through Dimensionality Reduction and Feature Selection in Machine Learning
Abstract
Education plays an important role in shaping the intellect and character of a nation's next generation. Poor student academic performance, however, remains a major challenge, particularly with respect to student retention and dropout risk. This study evaluates three machine learning algorithms, namely K-Nearest Neighbor (K-NN), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGB), and analyzes the effect of dimensionality reduction with Principal Component Analysis (PCA) and of feature selection with Recursive Feature Elimination (RFE) on the accuracy of student performance prediction. The dataset comprises 395 student records with demographic, social, and academic attributes. The results show that XGB performs best, reaching 98.32% overall accuracy and predicting every class correctly, while LightGBM and K-NN achieve 94.87% and 93.88% accuracy, respectively. The attributes with the strongest influence on student performance fall into the “Highly Prioritized” category, including study time, family support, family, and health. While PCA slightly degraded model performance, feature selection with RFE improved accuracy significantly. The study concludes that appropriate algorithm selection combined with a focus on relevant attributes can improve both prediction accuracy and efficiency, contributing to the development of more effective educational prediction systems.
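The experimental comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: it uses synthetic data in place of the 395-record student dataset, and scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost and LightGBM; the feature counts and split are assumptions for the example.

```python
# Hedged sketch: baseline classifier vs. PCA dimensionality reduction
# vs. RFE feature selection, mirroring the study's comparison.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the student dataset (395 samples, mixed attributes).
X, y = make_classification(n_samples=395, n_features=30, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def score(model):
    """Fit on the training split and return held-out accuracy."""
    model.fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))

gbm = lambda: GradientBoostingClassifier(random_state=0)

# 1) Baseline: all features, no reduction or selection.
acc_base = score(gbm())
# 2) PCA: project onto 8 principal components before classifying.
acc_pca = score(make_pipeline(PCA(n_components=8), gbm()))
# 3) RFE: recursively eliminate features down to the 8 most relevant,
#    ranked by the estimator's feature importances.
acc_rfe = score(make_pipeline(RFE(gbm(), n_features_to_select=8), gbm()))

print(f"baseline={acc_base:.3f}  PCA={acc_pca:.3f}  RFE={acc_rfe:.3f}")
```

On the study's real data the reported pattern was that PCA slightly hurt accuracy while RFE improved it; results on synthetic data will vary.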
DOI: http://dx.doi.org/10.24014/ijaidm.v8i3.37783