Analysis of the Effectiveness of the Cosine Similarity and Boyer-Moore Algorithms in a Digital Document Search System

Authors

  • Wawan Ade Saputra, Politeknik Negeri Bengkalis
  • Lidya Wati, Politeknik Negeri Bengkalis

DOI:

https://doi.org/10.46880/jmika.Vol10No1.pp269-277

Keywords:

Cosine Similarity, Boyer-Moore, Information Retrieval, TF-IDF, Precision, Recall

Abstract

This study analyzes the effectiveness of the cosine similarity and Boyer-Moore algorithms for digital document retrieval in SIPANDOK, a system developed for the Bengkalis Regency Health Office. Cosine similarity measures semantic document relevance using TF-IDF vector weighting, while Boyer-Moore performs direct pattern matching using string heuristics. The system was built with the Rapid Application Development (RAD) methodology and evaluated on 56 documents and 55 test queries using precision, recall, F1-score, accuracy, and execution time. The results indicate that Boyer-Moore achieves a higher average recall (66.7%) and F1-score (33.3%), making it the stronger of the two at retrieving relevant documents, whereas cosine similarity executes faster (0.31 seconds on average, compared with 0.91 seconds for Boyer-Moore). Each algorithm has distinct advantages, and the better choice depends on whether a retrieval scenario prioritizes precision or recall.
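
As an illustration only (not the SIPANDOK implementation, which the abstract does not describe), the two retrieval approaches and the set-based evaluation metrics could be sketched in Python as follows. The whitespace tokenizer, the smoothed IDF formula, and all function names are assumptions made for this example, and the Boyer-Moore routine uses only the bad-character rule, a simplification of the full algorithm.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopwords.
    return text.lower().split()


def build_idf(docs):
    # Smoothed IDF (log(N / df) + 1) so that no seen term gets zero weight.
    n = len(docs)
    df = Counter(term for d in docs for term in set(tokenize(d)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    # Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped.
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    # Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_scores(query, docs):
    # Score every document against the query using corpus-level IDF weights.
    idf = build_idf(docs)
    q = tfidf(query, idf)
    return [cosine(q, tfidf(d, idf)) for d in docs]


def boyer_moore_find(text, pattern):
    # Simplified Boyer-Moore (bad-character rule only).
    # Returns the index of the first match, or -1 if there is none.
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    last = {ch: i for i, ch in enumerate(pattern)}  # rightmost position of each char
    shift = 0
    while shift <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[shift + j]:
            j -= 1  # compare right to left
        if j < 0:
            return shift
        # Align the mismatched text character with its last occurrence in pattern.
        shift += max(1, j - last.get(text[shift + j], -1))
    return -1


def precision_recall_f1(retrieved, relevant):
    # Set-based metrics for a single query.
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy corpus, not the study's 56 documents.
docs = ["health office annual report", "vaccination schedule report", "budget memo"]
scores = cosine_scores("annual report", docs)
exact_hits = [i for i, d in enumerate(docs) if boyer_moore_find(d, "report") != -1]
print(scores, exact_hits, precision_recall_f1(exact_hits, [0]))
```

In this sketch, cosine similarity yields a graded ranking that can be cut off at a threshold, whereas the pattern matcher returns every document containing the literal string. Which behavior is preferable depends on the weighting of precision versus recall in a given setting.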

Published

2026-06-11

Issue

Section

METHOMIKA: Jurnal Manajemen Informatika & Komputerisasi Akuntansi