Combining BERT and Graph-Based Ranking for Extractive Summarization of Indonesian News Articles

Authors

I Nyoman Prayana Trisna, Udayana University
Wayan Oger Vihikan, Udayana University
Anis Zahra Nur Azizah, Udayana University

DOI:

https://doi.org/10.24014/coreit.v11i2.37929

Keywords:

Extractive Summarization, News Article, BERT, LexRank, TextRank

Abstract

Automatic text summarization helps manage the vast amount of information available in the digital age. This study develops an extractive text summarization system for Indonesian news articles that uses sentence embeddings generated by IndoBERT and mBERT, combined with the TextRank and LexRank algorithms for sentence ranking. The dataset is Indonesian Text Summarization (IndoSum), which contains thousands of manually summarized articles. The research comprises data collection, cleaning, preprocessing, embedding extraction, sentence similarity calculation, and ranking with graph-based methods. Model performance was evaluated using ROUGE and BERTScore. The combination of IndoBERT and LexRank achieved the highest performance, with a ROUGE-1 score of 0.7018 and a BERTScore of 0.8696. The model was then deployed in a web prototype built with Streamlit so that users can summarize texts interactively. This study contributes to the advancement of automatic summarization technology for the Indonesian language.
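The ranking stage described in the abstract (sentence embeddings, pairwise similarity, graph-based scoring) can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors are hypothetical stand-ins for IndoBERT or mBERT sentence embeddings, the function names and the damping factor of 0.85 are assumptions, and the code shows only the PageRank-style power iteration that TextRank and continuous LexRank share, without any similarity threshold.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_sentences(embeddings, damping=0.85, iters=100, tol=1e-8):
    """Score sentences by power iteration over a cosine-similarity graph."""
    n = len(embeddings)
    # Edge weights: similarity between distinct sentences, negatives clipped to 0.
    sim = [[max(cosine(embeddings[i], embeddings[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalise into transition probabilities; isolated nodes jump uniformly.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([w / total for w in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n
               + damping * sum(scores[i] * trans[i][j] for i in range(n))
               for j in range(n)]
        converged = sum(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores

def summarize(sentences, embeddings, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(embeddings)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

# Hypothetical toy data: three related sentences and one off-topic sentence.
sentences = ["Sentence A", "Sentence B", "Sentence C", "Off-topic sentence"]
embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.0, 1.0],
]
scores = rank_sentences(embeddings)
summary = summarize(sentences, embeddings, k=2)
print(summary)
```

In this toy graph the off-topic sentence is only weakly connected to the others, so it receives the lowest score and is left out of the two-sentence summary. In a real pipeline the vectors would come from the encoder, and the number of selected sentences would depend on the target summary length.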

Published

2025-12-30

Issue

Vol. 11 No. 2 (2025): December 2025

Section

Articles

License

The Authors submitting a manuscript do so on the understanding that if accepted for publication, copyright of the article shall be assigned to CoreIT journal and published by Informatics Engineering Department Universitas Islam Negeri Sultan Syarif Kasim Riau as publisher of the journal.

Authors who publish with this journal agree to the following terms:

Authors automatically transfer the copyright to the journal and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike (CC BY-SA) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

Authors are able to enter into separate permission for non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.

Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).