Pattern-Based Stemmer Analysis and Implementation on Arabic Text

Ananda Wulandari, Kemas Rahmat S.W, Ade Romadhony

Abstract


Pattern-based Stemmer is an implementation of a search algorithm that finds the stem of an Arabic word by combining a morphological analysis technique with an affix removal technique. In this research, once stemming is complete, the word class is determined in three steps. First, the system matches the input word against the fixed words stored in the system. If the word is not found, word-class determination rules based on prefixes, suffixes, and infixes are applied. If the system still cannot determine the word class, the class is assigned based on the word's position in the sentence.
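
The stemming step and the three-step word-class cascade described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pattern list (`PATTERNS`), affix lists (`PREFIXES`, `SUFFIXES`), fixed-word dictionary (`FIXED_WORDS`), the English class labels, and every rule inside `classify_by_affix` and `classify_by_position` are hypothetical placeholders chosen only to show the control flow. A pattern uses the letters ف, ع, ل as slots for the three root letters; every other pattern letter must match the word literally.

```python
from typing import List, Optional

# Hypothetical data: a tiny subset for illustration, not the system's stored patterns/rules.
RADICALS = "فعل"                                    # slot letters for the 3 root radicals
PATTERNS = ["مفعول", "فاعل", "فعال", "مفعل", "فعل"]  # affix-free patterns, longest first
PREFIXES = ["وال", "بال", "ال", "و", "ب"]            # hypothetical prefix list
SUFFIXES = ["ون", "ين", "ات", "ة"]                   # hypothetical suffix list
FIXED_WORDS = {"في": "particle", "من": "particle", "على": "particle", "الله": "noun"}


def match_pattern(word: str, pattern: str) -> Optional[str]:
    """Return the root if `word` fits `pattern` letter by letter, else None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w_ch, p_ch in zip(word, pattern):
        if p_ch in RADICALS:
            root.append(w_ch)      # radical slot: capture the root letter
        elif w_ch != p_ch:
            return None            # fixed pattern letter must match exactly
    return "".join(root)


def candidates(word: str) -> List[str]:
    """The word plus forms with one prefix and/or one suffix removed (affix removal)."""
    forms = [word]
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            forms.append(word[len(p):])
    for f in list(forms):          # iterate over a snapshot of the prefix-stripped forms
        for s in SUFFIXES:
            if f.endswith(s) and len(f) - len(s) >= 3:
                forms.append(f[: -len(s)])
    return forms


def stem(word: str) -> Optional[str]:
    """Return the root of the first candidate form that fits any pattern."""
    for form in candidates(word):
        for pattern in PATTERNS:
            root = match_pattern(form, pattern)
            if root is not None:
                return root
    return None


def classify_by_affix(word: str) -> Optional[str]:
    """Step 2 (hypothetical rules): prefix, suffix and infix (pattern) cues."""
    if word.startswith("ال") or word.endswith("ة") or word.endswith("ات"):
        return "noun"
    for form in candidates(word):
        if match_pattern(form, "مفعول") is not None or match_pattern(form, "فاعل") is not None:
            return "noun"          # participle patterns
    if len(word) == 4 and word[0] in "يتن":
        return "verb"              # present-tense prefix on a 3-letter root
    return None


def classify_by_position(prev: Optional[str]) -> str:
    """Step 3 (hypothetical rule): a word after a particle is a noun, else a verb."""
    if prev is not None and FIXED_WORDS.get(prev) == "particle":
        return "noun"
    return "verb"


def word_class(tokens: List[str], i: int) -> str:
    """Three-step cascade: fixed-word lookup, affix rules, then sentence position."""
    word = tokens[i]
    if word in FIXED_WORDS:
        return FIXED_WORDS[word]
    by_affix = classify_by_affix(word)
    if by_affix is not None:
        return by_affix
    return classify_by_position(tokens[i - 1] if i > 0 else None)
```

For example, `stem("والكاتبون")` strips the prefix وال and the suffix ون, then matches كاتب against the pattern فاعل to return the root كتب, while `word_class(["من", "بيت"], 1)` falls through the first two steps and is classified as a noun only because it follows a particle. Trying longer patterns first is a design choice of this sketch that keeps the three-letter pattern فعل from matching before more specific ones.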

Testing was conducted to determine how the number of tokens, patterns, and rules stored in the system influences the system's performance. The test data are the 37 surahs of Juz 30 of Al-Qur'an, grouped into three categories according to the number of rows (verses) in each surah: long, medium, and short surahs. The test results show that the best performance is obtained by storing more affix-free patterns, storing more word-class determination rules, and adding an affix-removal check to the system.

Keywords: Arabic text, stemming, stem, word class.




FAKULTAS SAINS DAN TEKNOLOGI
UIN SUSKA RIAU

Kampus Raja Ali Haji
Gedung Fakultas Sains & Teknologi UIN Suska Riau
Jl.H.R.Soebrantas No.155 KM 18 Simpang Baru Panam, Pekanbaru 28293
Email: sntiki@uin-suska.ac.id