Named Entity Recognition Using Conditional Random Fields for Flood Detection In Gerbang Kertosusila Based Twitter Data

Ikrimatul Ulumiyyah, Dwi Rolliawati, Andik Izzuddin, Khalid Khalid, Anang Khunaefi, Mujib Ridwan

Abstract


Gerbang Kertosusila, a national strategic area in East Java, is prone to flooding. One existing mitigation effort places flood sensors at several flood-prone points, but this approach is limited by the amount of equipment needed to cover the many affected areas. Technology for disseminating flood information is therefore needed. Such information can be obtained quickly from the social media platform Twitter: tweet text can serve as the data source for a Named Entity Recognition (NER) model that helps detect flood events and their locations. In this research, the NER model is built with the Conditional Random Fields (CRFs) method. The research adds slang-word handling at the preprocessing stage to improve model performance, uses the BIO format in the labeling process, and uses POS tagging in the feature extraction process. Evaluation with five-fold cross-validation (80% training data and 20% test data) shows that the CRF-based NER model performs very well, with a precision of 0.981, a recall of 0.926, and an F-measure of 0.950. These results can help the community and the government obtain information on the distribution of floods.
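
The pipeline described in the abstract (slang normalization, BIO labeling, POS-based features, and precision/recall/F-measure scoring) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slang lexicon, the entity types `EVENT` and `LOC`, the POS tags, and the feature names are all assumptions chosen for illustration, and the scores are computed at token level, which may differ from the evaluation used in the research. The per-token feature dictionaries follow the shape that common CRF toolkits accept, but no CRF is trained here.

```python
# Hypothetical slang lexicon; the lexicon used in the research is not shown here.
SLANG_LEXICON = {"jln": "jalan", "bgt": "banget", "yg": "yang"}


def normalize_slang(tokens):
    """Replace known slang tokens with their standard form."""
    return [SLANG_LEXICON.get(tok.lower(), tok) for tok in tokens]


def bio_labels(n_tokens, spans):
    """Convert (start, end_exclusive, entity_type) spans to BIO labels."""
    labels = ["O"] * n_tokens
    for start, end, entity_type in spans:
        labels[start] = f"B-{entity_type}"
        for i in range(start + 1, end):
            labels[i] = f"I-{entity_type}"
    return labels


def token_features(tokens, pos_tags, i):
    """Build a feature dict for token i from the word, its POS tag, and neighbors."""
    feats = {
        "word.lower": tokens[i].lower(),
        "word.istitle": tokens[i].istitle(),
        "pos": pos_tags[i],
    }
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
        feats["-1:pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
        feats["+1:pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def precision_recall_f(gold, pred):
    """Token-level precision, recall, and F-measure over non-"O" labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Invented example tweet: "banjir di jln Ahmad Yani Surabaya"
tokens = normalize_slang(["banjir", "di", "jln", "Ahmad", "Yani", "Surabaya"])
pos_tags = ["NN", "IN", "NN", "NNP", "NNP", "NNP"]  # assumed tags, not tagger output
gold = bio_labels(len(tokens), [(0, 1, "EVENT"), (2, 6, "LOC")])
features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

In a full experiment, the `features` and `gold` sequences of every tweet would be passed to a CRF trainer, and `precision_recall_f` (or an entity-level equivalent) would be applied to the held-out 20% of each of the five folds.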

Keywords


Conditional Random Fields; Flood Detection; Gerbang Kertosusila; Natural Language Processing; Named Entity Recognition


DOI: http://dx.doi.org/10.24014/ijaidm.v7i2.27062
