Evaluation of the Latent Dirichlet Allocation for Modeling News Topics of Nusantara Capital City
DOI: https://doi.org/10.24014/coreit.v11i2.33397
Keywords: IKN, LDA, log-likelihood, model evaluation, perplexity
Abstract
Research on topic modeling of news coverage of the Nusantara Capital City (IKN) in national mass media remains limited. This study aims not only to model IKN-related topics but also to rigorously evaluate the Latent Dirichlet Allocation (LDA) model to ensure its robustness for future implementation. The dataset comprises 1,498 news articles gathered from two prominent Indonesian online media outlets, Detik (1,050 articles) and Kompas (448 articles). The methodology involves experimental variations of LDA parameters, namely document volume, maximum features, and topic count, using the Scikit-learn library. The results indicate that as data volume and feature dimensions increase, computation time and the number of epochs required for convergence increase significantly. Furthermore, increasing the number of variables and the data volume resulted in more negative log-likelihood values and higher perplexity, suggesting that greater model complexity reduces predictive precision. A convergence threshold of $10^{-2}$ (1e-2) was applied to determine the point at which training stops. While this study establishes a baseline for static topic modeling, future research should consider Dynamic Topic Modeling (DTM) to capture the temporal evolution of topics, a dimension not addressed by the standard LDA model.
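The experimental design described in the abstract can be illustrated with a minimal sketch using Scikit-learn's `CountVectorizer` and `LatentDirichletAllocation`. This is not the authors' code: the toy corpus, the helper name `evaluate_lda`, the parameter grid, the batch learning method, and the mapping of the 1e-2 convergence threshold to the `perp_tol` argument are all assumptions made for illustration. The sketch varies document volume, maximum features, and topic count, then records training time, epochs, log-likelihood, and perplexity. In Scikit-learn, perplexity is the exponential of the negative per-word log-likelihood bound, which is why a more negative log-likelihood and a higher perplexity tend to move together.

```python
import time

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder corpus (assumption): stands in for the news articles, not the study's data.
docs = [
    "capital city relocation budget government plan",
    "new capital city construction toll road airport",
    "investors funding capital city infrastructure project",
    "forest environment impact capital city development",
    "government budget infrastructure construction project",
    "palace ceremony president capital city visit",
    "land acquisition local community environment forest",
    "president investors funding government plan visit",
]


def evaluate_lda(docs, n_docs, max_features, n_topics, perp_tol=1e-2, max_iter=50, seed=0):
    """Fit LDA on the first n_docs documents and return evaluation metrics (hypothetical helper)."""
    vectorizer = CountVectorizer(max_features=max_features)
    X = vectorizer.fit_transform(docs[:n_docs])

    lda = LatentDirichletAllocation(
        n_components=n_topics,
        learning_method="batch",
        max_iter=max_iter,
        evaluate_every=1,   # evaluate each iteration so perp_tol can stop training early
        perp_tol=perp_tol,  # assumed mapping of the abstract's 1e-2 threshold
        random_state=seed,
    )

    start = time.perf_counter()
    lda.fit(X)
    elapsed = time.perf_counter() - start

    return {
        "n_docs": n_docs,
        "max_features": max_features,
        "n_topics": n_topics,
        "epochs": lda.n_iter_,             # passes over the data before stopping
        "seconds": elapsed,
        "log_likelihood": lda.score(X),    # approximate log-likelihood bound
        "perplexity": lda.perplexity(X),   # exp(-log_likelihood / n_words)
        "n_words": int(X.sum()),
    }


# Small illustrative grid (assumption): two document volumes and two feature limits.
results = [
    evaluate_lda(docs, n_docs=n, max_features=f, n_topics=2)
    for n in (4, 8)
    for f in (10, 50)
]

for r in results:
    print(r)
```

On a real corpus, the same loop could be extended to several topic counts, and the recorded values compared across configurations in the way the abstract describes.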
License
The Authors submitting a manuscript do so on the understanding that if accepted for publication, copyright of the article shall be assigned to CoreIT journal and published by Informatics Engineering Department Universitas Islam Negeri Sultan Syarif Kasim Riau as publisher of the journal.
Authors who publish with this journal agree to the following terms:
Authors automatically transfer the copyright to the journal and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike (CC BY-SA) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate permission for non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).