AUTOMATIC KEYWORD EXTRACTION FOR INDONESIAN-LANGUAGE DOCUMENTS. CASE STUDY: SCIENTIFIC JOURNAL ARTICLES IN THE PDII LIPI COLLECTION

Diana Permata Sari, Ayu Purwarianti

Abstract


Determining keywords from a controlled vocabulary is not a difficult task for information analysts. However, specifying keywords for hundreds or even thousands of articles takes considerable time and effort. To ease this work, an automatic keyword extraction system is needed. The system is built in three stages: preprocessing, translation, and identification of keyword candidates using a keyword list. The research used 33 articles taken from the PDII LIPI journal collection and compared three term-weighting methods: TF, TF x IDF, and WIDF. The best result was obtained with the TF x IDF method. To refine the results, the extracted keywords were corrected and the Levenshtein algorithm was applied.
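The abstract names the three weighting schemes and the Levenshtein refinement but gives no formulas. The Python sketch below illustrates them under stated assumptions; it is not the authors' implementation. It assumes documents have already been preprocessed into token lists, uses raw counts for TF and log(N/df) for IDF, and follows the WIDF definition of Tokunaga and Iwayama (1994): a term's frequency in one document divided by its total frequency across the collection. Using Levenshtein distance to snap a candidate onto the closest controlled-vocabulary term within a threshold is a hypothetical reading of the refinement step; `snap_to_vocabulary` and `max_dist` are illustrative names.

```python
import math
from collections import Counter
from typing import Optional


def tf(doc: list[str]) -> Counter:
    """Raw term frequency: how often each term occurs in one document."""
    return Counter(doc)


def tf_idf(doc: list[str], corpus: list[list[str]]) -> dict[str, float]:
    """TF x IDF with IDF = log(N / df).

    Assumes `doc` is one of the documents in `corpus`, so df >= 1.
    """
    n_docs = len(corpus)
    weights = {}
    for term, freq in tf(doc).items():
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        weights[term] = freq * math.log(n_docs / df)
    return weights


def widf(doc: list[str], corpus: list[list[str]]) -> dict[str, float]:
    """WIDF: frequency in this document / total frequency in the corpus."""
    return {
        term: freq / sum(d.count(term) for d in corpus)
        for term, freq in tf(doc).items()
    }


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def snap_to_vocabulary(candidate: str, vocabulary: list[str],
                       max_dist: int = 2) -> Optional[str]:
    """Hypothetical refinement: return the closest vocabulary term if it is
    within max_dist edits of the candidate, else None. Vocabulary must be non-empty."""
    best = min(vocabulary, key=lambda v: levenshtein(candidate, v))
    return best if levenshtein(candidate, best) <= max_dist else None


if __name__ == "__main__":
    corpus = [
        ["sistem", "temu", "kembali", "informasi", "sistem"],
        ["kata", "kunci", "informasi"],
        ["sistem", "kata", "kunci"],
    ]
    scores = tf_idf(corpus[0], corpus)
    print(sorted(scores, key=scores.get, reverse=True)[:2])  # top-2 candidates
    print(snap_to_vocabulary("informasl", ["informasi", "sistem"]))
```

With this toy corpus, "sistem" has the highest raw TF in the first document, but because it also appears in another document TF x IDF ranks the document-specific terms "temu" and "kembali" above it. This is the usual reason TF x IDF outperforms plain TF for keyword selection.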

 


Keywords


Automatic indexing; Controlled vocabularies; Keyword searching






DOI: https://doi.org/10.14203/j.baca.v35i2.192



Copyright (c) 2014 BACA: Jurnal Dokumentasi dan Informasi

 
