Analysis of SQL Injection and Cross-Site Scripting (XSS) Attacks on Web Server Logs Using Machine Learning

Adi Septian, Atep Aulia Rahman

Abstract


The increasing complexity of cyber threats calls for accurate detection systems that can identify attack patterns on web servers. This study detects SQL Injection and Cross-Site Scripting (XSS) attacks in Nginx access logs using machine learning algorithms. Log entries were parsed and labeled with regular expressions, yielding 1,650,615 samples. Class imbalance was addressed by combining ADASYN oversampling with Random Undersampling. Two algorithms, Random Forest and Support Vector Machine (SVM), were compared on accuracy, precision, recall, F1-score, and the ROC curve. Random Forest performed best, with 99.92% accuracy, a 99.94% F1-score, and an AUC of 0.9994, while SVM reached 96.45% accuracy. These results indicate that combining resampling with ensemble learning improves the effectiveness of log-based attack detection and provides a promising foundation for adaptive Intrusion Detection Systems (IDS) in web server environments.
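The regex-based parsing and labeling step can be illustrated with a minimal sketch. It assumes the logs use Nginx's default "combined" format. The names LOG_PATTERN, SQLI_PATTERN, XSS_PATTERN, parse_line, and label_request are hypothetical, and the attack signatures are a small illustrative subset rather than the rules used in the study.

```python
import re
from urllib.parse import unquote_plus

# Assumed layout: Nginx "combined" access log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<uri>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative signatures only (assumption, not the authors' rule set).
SQLI_PATTERN = re.compile(
    r"\bunion\b.+\bselect\b|\bor\b\s+\d+\s*=\s*\d+|\bsleep\s*\(",
    re.IGNORECASE,
)
XSS_PATTERN = re.compile(
    r"<\s*script|javascript:|\bon(?:error|load|click)\s*=",
    re.IGNORECASE,
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def label_request(uri):
    """Label a request URI as 'sqli', 'xss', or 'normal'."""
    decoded = unquote_plus(uri)  # undo URL encoding before matching
    if SQLI_PATTERN.search(decoded):
        return "sqli"
    if XSS_PATTERN.search(decoded):
        return "xss"
    return "normal"


SAMPLE_LINES = [
    '203.0.113.5 - - [18/Sep/2025:10:00:00 +0700] '
    '"GET /item?id=1%27%20OR%201=1 HTTP/1.1" 200 512 "-" "curl/8.0"',
    '203.0.113.6 - - [18/Sep/2025:10:00:01 +0700] '
    '"GET /search?q=%3Cscript%3Ealert(1)%3C/script%3E HTTP/1.1" 200 128 "-" "curl/8.0"',
    '203.0.113.7 - - [18/Sep/2025:10:00:02 +0700] '
    '"GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

# Parse each sample line and attach a label derived from its URI.
LABELS = [label_request(parse_line(line)["uri"]) for line in SAMPLE_LINES]
```

Decoding the URI before matching matters because payloads in access logs are usually percent-encoded. A labeled set like this would then feed the resampling and classification stages described in the abstract.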


Keywords


Machine Learning; Nginx; Random Forest; SQL Injection; XSS

DOI: http://dx.doi.org/10.24014/ijaidm.v8i3.38397

Office and Secretariat:

Big Data Research Centre
Puzzle Research Data Technology (Predatech)
Laboratory Building 1st Floor of Faculty of Science and Technology
UIN Sultan Syarif Kasim Riau

Jl. HR. Soebrantas KM. 18.5 No. 155 Pekanbaru Riau – 28293
Website: http://predatech.uin-suska.ac.id/ijaidm
Email: ijaidm@uin-suska.ac.id
e-Journal: http://ejournal.uin-suska.ac.id/index.php/ijaidm
Phone: 085275359942
