Collaboration of Nazief & Adriani Stemming Algorithm with PostgreSQL Queries Parsing Method to Search for New Study Program Names

Indra Chaidir

doi:10.24114/cess.v8i2.48212

Authors

Indra Chaidir, Universitas Bina Sarana Informatika

Keywords:

Natural Language Processing, Stemming, Parsing Queries, Full Text Search, PostgreSQL, New Study Program, Nomenclature

Abstract

Proposals for new vocational study program names submitted through the Silemkerma application at the Directorate General of Vocational Higher Education, Ministry of Education, Culture, Research, and Technology, are often rejected because the proposed name resembles that of a study program already in the database. Many existing records go undetected because the data filter relies on a conventional method, the ILIKE operator with the % (percent) wildcard pattern, even though the data sought is available in the database. This happens because the ILIKE operator cannot recognize word forms derived from a lexeme (root word), such as "pengelolaan", which carries a prefix and a suffix around the root word "kelola". To overcome this problem, the author uses the Nazief & Adriani algorithm for stemming to obtain the lexemes of the sentence entered. The output of the algorithm is then processed with the Parsing Queries method, one of the Full Text Search methods available in the PostgreSQL database. The results of this research can be implemented in the application.
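The idea in the abstract can be illustrated with a minimal Python sketch. It is not the full Nazief & Adriani algorithm and not the author's implementation: the root-word dictionary, the affix rules, the table `study_program`, and the column `stemmed_name` are all hypothetical and chosen only to reproduce the "pengelolaan" to "kelola" example. The sketch strips one suffix and one prefix with a dictionary lookup, then joins the stems into a string suitable for PostgreSQL's `to_tsquery`.

```python
import re

# Hypothetical toy dictionary; the real algorithm relies on a full
# Indonesian root-word dictionary.
ROOT_WORDS = {"kelola", "teknologi", "informasi", "bangun"}

# Simplified affix rules (assumption, far from the complete rule set).
SUFFIXES = ("kan", "an", "i")
# Each prefix maps to letters that may have been elided from the root,
# e.g. peng + kelola -> pengelola (the initial "k" is dropped).
PREFIX_RULES = [
    ("peng", ("", "k")),
    ("meng", ("", "k")),
    ("pem", ("", "p")),
    ("mem", ("", "p")),
    ("pe", ("",)),
    ("me", ("",)),
    ("di", ("",)),
]


def stem(word: str) -> str:
    """Return the dictionary root of `word`, or the word itself if none is found."""
    word = word.lower()
    if word in ROOT_WORDS:
        return word
    # Candidates: the word as is, then the word with one suffix removed.
    candidates = [word] + [
        word[: -len(s)] for s in SUFFIXES
        if word.endswith(s) and len(word) - len(s) >= 3
    ]
    for cand in candidates:
        if cand in ROOT_WORDS:
            return cand
        for prefix, restored in PREFIX_RULES:
            if cand.startswith(prefix):
                rest = cand[len(prefix):]
                for first in restored:
                    if first + rest in ROOT_WORDS:
                        return first + rest
    return word


def build_tsquery(phrase: str) -> str:
    """Stem every word and join the stems with the tsquery AND operator."""
    tokens = re.findall(r"[A-Za-z]+", phrase)
    return " & ".join(stem(t) for t in tokens)


# Hypothetical query: assumes stored names were stemmed the same way
# into a `stemmed_name` column before indexing.
SQL = (
    "SELECT name FROM study_program "
    "WHERE to_tsvector('simple', stemmed_name) @@ to_tsquery('simple', %s)"
)

if __name__ == "__main__":
    # A substring pattern such as ILIKE '%kelola%' cannot match
    # "pengelolaan", because the root's initial "k" is elided.
    print("kelola" in "pengelolaan")  # False
    print(build_tsquery("Pengelolaan Teknologi Informasi"))
    # kelola & teknologi & informasi
```

The resulting string would be passed as the parameter of `SQL` through a database driver. Stemming both the stored names and the search phrase with the same function is a design assumption of this sketch, made so that derived forms and their roots meet on the same lexeme.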
