Data Train Reduction on Data Image With K Support Vector Nearest Neighbor (Case Study : Maize Leaf Image)

Marlinda Vasty Overbeek, Yampi R Kaesmetan


In this study, we applied the K Support Vector Nearest Neighbor algorithm to reduce the training data for an image dataset. The images are of maize leaves infected with fungi and of healthy maize leaves. The aim of the training data reduction is to obtain faster and more accurate predictions, because the support vectors that the K Support Vector Nearest Neighbor algorithm produces closely characterize the objective function of the problem. With the nearest-neighbor parameter K = 3 and K Nearest Neighbor as the model construction algorithm, the model built on the reduced training data gave a mean error of 0.20 (20%). This is smaller than the error of the model built without training data reduction, which was 0.209 (20.9%). In terms of time efficiency, working with the K Support Vector Nearest Neighbor algorithm was 24 seconds faster than working without training data reduction.
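The reduction step can be illustrated with a minimal sketch. It assumes one common formulation of K Support Vector Nearest Neighbor: each training point votes for its K nearest neighbors, each neighbor counts votes from same-class and other-class points separately, and a point is kept when the ratio of its smaller count to its larger count exceeds a threshold. The function name `ksvnn_reduce`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the study; this is not the authors' implementation.

```python
import numpy as np


def ksvnn_reduce(X, y, k=3, threshold=0.0):
    """Return indices of training points kept as support vectors (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)

    # Pairwise squared Euclidean distances between all training points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor

    same = np.zeros(n)   # votes received from same-class points
    other = np.zeros(n)  # votes received from other-class points
    for i in range(n):
        neighbors = np.argsort(dist[i], kind="stable")[:k]
        for j in neighbors:
            if y[j] == y[i]:
                same[j] += 1
            else:
                other[j] += 1

    # Significance degree: smaller count divided by larger count,
    # defined as 0 when the larger count is 0 (assumed convention).
    larger = np.maximum(same, other)
    smaller = np.minimum(same, other)
    sd = np.divide(smaller, larger, out=np.zeros(n), where=larger > 0)

    # Keep only points that sit near a class boundary.
    return np.flatnonzero(sd > threshold)


# Toy 1-D example: two classes that meet between x = 3 and x = 4.
X = [[0], [1], [2], [3], [4], [5], [6], [7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(ksvnn_reduce(X, y, k=2))  # [3 4]
```

In this toy case only the two points adjacent to the class boundary survive, which is the sense in which the retained points characterize the decision problem. A K Nearest Neighbor classifier would then be built on the reduced set instead of on all training points.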



data train reduction; data image; K Support Vector Nearest Neighbor; Maize leaf image




