Spam Classification on 2019 Indonesian President Election Youtube Comments Using Multinomial Naïve-Bayes

Jonathan Radot Fernando; Raymond Budiraharjo; Emeraldi Haganusa

doi:10.24014/ijaidm.v2i1.6445

Abstract

Text classification is used in many areas of technology, such as spam classification, news categorization, and auto-correct texting. One of the most popular algorithms for text classification today is Multinomial Naïve-Bayes. This paper explains how the Naïve-Bayes assumption is used to classify YouTube comments about the 2019 Indonesian presidential election. The algorithm predicts whether a comment is spam or not spam, where spam is defined as racist comments, advertising comments, and unsolicited comments. Text is represented with the bag-of-words method, which treats a text as the multiset of its words. The algorithm then calculates the probability of each word given the class (spam or not spam). The main difference between the standard Naïve-Bayes algorithm and Multinomial Naïve-Bayes is how the data are treated: Multinomial Naïve-Bayes treats the data as word-frequency (count) data, which makes it well suited to text classification tasks.
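As a rough illustration of the bag-of-words and class-conditional probability steps described in the abstract, the sketch below implements a small Multinomial Naïve-Bayes classifier in Python. It assumes the common formulation with additive (Laplace) smoothing, where n_w(d) is the count of word w in comment d, N_{c,w} is the count of w in training comments of class c, N_c is the total word count of class c, and |V| is the vocabulary size:

$$\hat{c} = \arg\max_{c} \Big[ \log P(c) + \sum_{w \in V} n_w(d)\, \log P(w \mid c) \Big], \qquad P(w \mid c) = \frac{N_{c,w} + \alpha}{N_c + \alpha |V|}$$

The smoothing parameter alpha = 1, the regex tokenizer, the class name `MultinomialNB`, and the toy comments are hypothetical choices for illustration only; they are not details taken from the paper or its dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Bag-of-words: lowercase, split into word tokens, keep counts, discard order."""
    return re.findall(r"\w+", text.lower())


class MultinomialNB:
    """Hypothetical minimal Multinomial Naive-Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; alpha=1.0 is an assumption, not from the paper

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)                        # documents per class, for P(c)
        self.word_counts = {c: Counter() for c in self.classes}  # N_{c,w}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}  # N_c
        return self

    def log_scores(self, doc):
        """Return log P(c) + sum_w n_w(d) * log P(w | c) for every class c."""
        n_docs = sum(self.doc_counts.values())
        bag = Counter(w for w in tokenize(doc) if w in self.vocab)  # unseen words are ignored
        scores = {}
        for c in self.classes:
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[c] / n_docs)
            for word, count in bag.items():
                # word frequency multiplies the log-likelihood: the "multinomial" part
                score += count * math.log((self.word_counts[c][word] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Made-up toy comments for illustration only; they are not from the paper's dataset.
train_docs = [
    "subscribe to my channel for free followers",
    "click this link to win free credit",
    "I support the debate tonight, good arguments",
    "both candidates explained their programs well",
]
train_labels = ["spam", "spam", "not spam", "not spam"]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("free followers, click my channel"))             # → spam
print(model.predict("good debate, both candidates explained well"))  # → not spam
```

Because each word's log-probability is scaled by its count, a comment that repeats a word such as "free" several times is pushed further toward the class in which that word is frequent. A Bernoulli-style Naïve-Bayes would only record whether the word occurs, which is the frequency-versus-presence distinction the abstract draws.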
