Robust K-Fold Cross Validation Method with Partial Least Square Regression on Near Infrared Spectroscopy Data

Nuraini Sibuea, Syamsudhuha Syamsudhuha, Arisman Adnan

Abstract


This study evaluates the performance of Partial Least Square Regression (PLSR) models on data with and without outliers. To handle data that contain outliers, k-fold cross validation is applied to Near Infrared Spectroscopy (NIRS) data from oil palm plantation soil with respect to nitrogen (N) fertilizer. Before analysis, the data are pretreated with Standardized Normal Variate (SNV) to remove scatter effects. Outlier identification with RBF Kernel PCA flags observations 7, 8, 92, 93, and 95 as outliers. The results show that the presence of outliers significantly degrades the performance of classical PLSR, lowering R² and raising RMSE. Applying k-fold cross validation to PLSR improves the model's robustness to outliers: R² increases, although RMSE also rises slightly. The study concludes that k-fold cross validation is more effective at handling data sets that contain outliers and gives more stable predictive performance than classical PLSR.






FACULTY OF SCIENCE AND TECHNOLOGY
UIN SUSKA RIAU

Raja Ali Haji Campus
Faculty of Science & Technology Building, UIN Suska Riau
Jl.H.R.Soebrantas No.155 KM 18 Simpang Baru Panam, Pekanbaru 28293
Email: sntiki@uin-suska.ac.id