Sentiment Analysis and Topic Modelling on Crowdsourced Data

Maria Angelika H Siallagan, Arie Wahyu Wijayanto

Abstract


Data analysis supports decision-making by uncovering hidden patterns in data. User reviews of applications are a valuable form of crowdsourced data because they capture how satisfied users are, and developers can use them to identify parts of an application that need evaluation or improvement. This study classifies application reviews by sentiment using classification algorithms including logistic regression, Support Vector Machines, and Random Forest. Reviews labeled as negative are then analyzed with topic modelling using Latent Dirichlet Allocation (LDA). Logistic regression is the best sentiment classification model, achieving an average accuracy of 0.925 and an average F1-score of 0.763. The LDA analysis of the negative reviews yields three key topics: price-related issues, accessibility concerns, and application accuracy, all of which call for reevaluation and potential improvement.
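The gap between the reported accuracy and F1-score is typical when sentiment classes are imbalanced. The abstract does not state how the F1-score was averaged (over folds, over classes, or both), so the following is only a minimal sketch that assumes macro-averaging over classes. The labels are invented toy data, not the study's reviews, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Invented toy labels: 90 positive reviews and 10 negative reviews.
y_true = ["pos"] * 90 + ["neg"] * 10
# Hypothetical classifier output: 88 of the positives and 5 of the
# negatives are predicted correctly.
y_pred = ["pos"] * 88 + ["neg"] * 2 + ["neg"] * 5 + ["pos"] * 5

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the minority (negative) class has a much lower F1 than the majority class, which pulls the macro average well below the accuracy. Whether the same effect explains the figures reported in the study cannot be determined from the abstract alone.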

Keywords


crowdsourced data; sentiment analysis; topic modelling

DOI: http://dx.doi.org/10.24014/ijaidm.v7i1.24777

Office and Secretariat:

Big Data Research Centre
Puzzle Research Data Technology (Predatech)
Laboratory Building 1st Floor of Faculty of Science and Technology
UIN Sultan Syarif Kasim Riau

Jl. HR. Soebrantas KM. 18.5 No. 155 Pekanbaru Riau – 28293
Website: http://predatech.uin-suska.ac.id/ijaidm
Email: ijaidm@uin-suska.ac.id
e-Journal: http://ejournal.uin-suska.ac.id/index.php/ijaidm
Phone: 085275359942
