Identifying Twitter Topics Using K-Means Clustering and Association Rule Mining for Improved Insights

Cristiany Gunu Lengari, Ira Puspitasari

Abstract


As the number of social media users grows each year, businesses increasingly use these platforms for marketing, promotion, and handling public complaints. Twitter, now known as X, is one of the most widely used social media platforms and serves as a forum for opinions and complaints about the services businesses provide. This study analyzes public opinions about Indihome services as expressed on the @indihomecare Twitter account. These opinions range from expressions of support to complaints about internet services and about Indihome's responses to those complaints. The study applies text clustering with the K-means algorithm to the Twitter data, complemented by association rules, to identify topics in Indihome customer complaints. The optimal number of clusters is determined with the Elbow method, and Word Cloud visualizations illustrate the words that occur frequently within each cluster. The association rules revealed that the most frequently appearing words, with a support value of 0.057, were "indihome," "account," "whatsapp," and "channel." These findings provide insight into the primary concerns of Indihome customers on Twitter and the communication channels they use.
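To make the support value reported above concrete: under the standard definition, the support of a set of words is the fraction of tweets that contain all of them. The following is a minimal sketch of that calculation only, not the authors' pipeline. The function name `itemset_support` and the toy tweets are hypothetical and chosen purely for illustration; they are not the study's data.

```python
from collections import Counter
from itertools import combinations


def itemset_support(transactions, size):
    """Return {itemset: support} for every word set of the given size.

    Support is the share of transactions (here, tokenized tweets)
    that contain all words in the itemset.
    """
    counts = Counter()
    for tokens in transactions:
        # De-duplicate and sort so each itemset has one canonical form.
        unique_tokens = sorted(set(tokens))
        for itemset in combinations(unique_tokens, size):
            counts[itemset] += 1
    total = len(transactions)
    return {itemset: count / total for itemset, count in counts.items()}


# Hypothetical, hand-written toy tweets (already tokenized); NOT study data.
toy_tweets = [
    ["indihome", "account", "whatsapp", "channel"],
    ["indihome", "internet", "slow"],
    ["indihome", "account", "whatsapp"],
    ["internet", "slow", "complaint"],
]

support = itemset_support(toy_tweets, size=2)

# Keep only word pairs that meet an assumed minimum support of 0.5.
frequent = {pair: s for pair, s in support.items() if s >= 0.5}
for pair, s in sorted(frequent.items()):
    print(pair, s)
```

On real data the minimum support threshold would be far lower than the 0.5 assumed here, since even the most frequent word set in the study has a support of only 0.057.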

Keywords


Association Rules; Clustering; Improved Insights; K-Means; Twitter

DOI: http://dx.doi.org/10.24014/ijaidm.v8i1.31720
