C4.5, K-Nearest Neighbor, Naïve Bayes, and Random Forest Algorithms Comparison to Predict Students’ On Time Graduation

Gunawan Gunawan, Hanes Hanes, Catherine Catherine

Abstract


Study program performance is reflected in accreditation status, and one of the accreditation assessment instruments related to the graduate profile is the length of study. Graduating on time is one indicator of students' success in obtaining a bachelor's degree. It is an important attribute because, by predicting the period of study, universities can reduce the number of students who fail to graduate on time through more intensive planning, academic mentoring, and guidance. Data mining classification techniques can be used to predict whether students will graduate on time. Because many classification algorithms are available, they need to be compared to determine the accuracy of each. The algorithms compared in this study are C4.5, K-Nearest Neighbor, Naïve Bayes, and Random Forest. The data consist of 2,022 graduates of the Informatics Engineering and Information System study programs of STMIK Mikroskil Medan from 2011 to 2014, with the attributes gender, regional origin, time of study, Entrance Screening Examination grade, and Grade Point Average (GPA). The classification results are evaluated using cross-validation and a confusion matrix to determine the most accurate algorithm for predicting on-time graduation. The K-Nearest Neighbor and Random Forest algorithms have the highest accuracy at 72.651%, followed by C4.5 at 72.453% and Naïve Bayes at 71.860%.
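The evaluation step mentioned in the abstract (cross-validation plus a confusion matrix) can be illustrated with a minimal sketch in plain Python. This is not the authors' code or data. The labels "on_time" and "late", the toy predictions, and the helper names confusion_matrix, accuracy, and k_fold_indices are all assumptions made for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive="on_time"):
    """Count TP, FP, FN, TN for a binary task (label names are hypothetical)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts


def accuracy(cm):
    """Accuracy = correctly classified samples / all samples."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total if total else 0.0


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds, without shuffling."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Toy labels, invented for illustration only (not the study's data).
y_true = ["on_time", "on_time", "late", "late", "on_time", "late"]
y_pred = ["on_time", "late", "late", "on_time", "on_time", "late"]

cm = confusion_matrix(y_true, y_pred)
print(dict(cm), accuracy(cm))

# Each fold serves once as the test set; the remaining folds form the training set.
folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

In a full comparison, each classifier would be trained on the training indices of every fold and scored on the held-out fold, and the per-fold accuracies would then be averaged.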

Keywords


Data Mining Classification; Timely Graduation; C4.5; K-Nearest Neighbor; Naïve Bayes; Random Forest



DOI: http://dx.doi.org/10.24014/ijaidm.v4i2.10833



Office and Secretariat:

Big Data Research Centre
Puzzle Research Data Technology (Predatech)
Laboratory Building 1st Floor of Faculty of Science and Technology
UIN Sultan Syarif Kasim Riau

Jl. HR. Soebrantas KM. 18.5 No. 155 Pekanbaru Riau – 28293
Website: http://predatech.uin-suska.ac.id/ijaidm
Email: ijaidm@uin-suska.ac.id
e-Journal: http://ejournal.uin-suska.ac.id/index.php/ijaidm
Phone: 085275359942
