Uncovering Malware Families Using Convolutional Neural Networks (CNN)

Ruly Sumargo, Handri Santoso

Abstract


Malware attacks pose significant cyber threats, and security communities report a rising number of vulnerabilities as malware authors continually introduce mutations to evade detection. One of the most attractive targets for malware is an organization's email system. Mutations within a malware family complicate the development of effective machine learning-based malware analysis and classification methods. To address this challenge, this research uses an agnostic deep learning solution inspired by ImageNet's success, which efficiently classifies malware into families by analyzing visual representations of malicious software as greyscale images using a Convolutional Neural Network (CNN). Malwizard is a flexible Python tool, suitable for both organizations and end users, that enables automated and rapid malware analysis within an email system. Malwizard can be used as an Outlook email add-in and as an API service for SOAR platforms. The study evaluates this novel approach using the Microsoft Malware Classification Challenge dataset, where image representations are encrypted to address privacy concerns. Experimental results show that the proposed approach performs comparably to the best existing model on plaintext data, accomplishing the task in one-third of the time. For the encrypted dataset, adjustments to classical techniques are necessary for improved efficiency.
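
The greyscale representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe how Malwizard builds its images, so everything here is an assumption made for illustration: each byte of a file is treated as one pixel intensity (0 to 255), rows have a fixed width, and the last row is zero-padded. The function name `bytes_to_greyscale`, the row width, and the sample bytes are all hypothetical.

```python
from __future__ import annotations

import math


def bytes_to_greyscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape a raw byte string into rows of `width` pixels (0-255 each).

    Assumption for illustration: one byte maps to one greyscale pixel,
    and the final row is padded with zeros so all rows have equal length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # bytes(n) yields n zero bytes, used here as padding.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Hypothetical 10-byte sample standing in for the contents of a binary file.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00, 0x04, 0xFF])
image = bytes_to_greyscale(sample, width=4)
for row in image:
    print(row)
```

A 2D grid of this kind could then be resized to a fixed shape and passed to a CNN classifier; the network architecture and the encryption step are outside the scope of this sketch.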


Keywords


Convolutional Neural Network; Cyberattack; Email; Malware; Python




DOI: http://dx.doi.org/10.24014/ijaidm.v7i1.27243


Office and Secretariat:

Big Data Research Centre
Puzzle Research Data Technology (Predatech)
Laboratory Building 1st Floor of Faculty of Science and Technology
UIN Sultan Syarif Kasim Riau

Jl. HR. Soebrantas KM. 18.5 No. 155 Pekanbaru Riau – 28293
Website: http://predatech.uin-suska.ac.id/ijaidm
Email: ijaidm@uin-suska.ac.id
e-Journal: http://ejournal.uin-suska.ac.id/index.php/ijaidm
Phone: 085275359942
