Comparison of the Naive Bayes and K-Nearest Neighbors Algorithms in Sentiment Analysis of Gojek App Reviews

Authors

  • Rousyati Rousyati Universitas Bina Sarana Informatika
  • Dany Pratmanto Universitas Bina Sarana Informatika
  • Fandhilah Fandhilah Universitas Bina Sarana Informatika

DOI:

https://doi.org/10.46880/jmika.Vol10No1.pp186-191

Keywords:

Sentiment Analysis, Naive Bayes, K-Nearest Neighbors, TF-IDF, PySastrawi, Gojek, Google Play Store

Abstract

Sentiment analysis of mobile app reviews helps reveal how the public perceives the quality of digital services. This study compares two machine learning algorithms, Naive Bayes (NB) and K-Nearest Neighbors (KNN), for classifying the sentiment of Gojek app reviews collected from the Google Play Store. The dataset comprises 5,000 reviews, with 1-star reviews labeled negative and 5-star reviews labeled positive. The Indonesian text is preprocessed with case folding, tokenization, stopword removal, and stemming with PySastrawi, followed by TF-IDF feature extraction over unigrams and bigrams. After cleaning, 4,685 valid reviews remained; these were split into 80% training and 20% testing data and yielded 3,322 features. Naive Bayes (MultinomialNB, α = 1.0) outperforms KNN, achieving 89.43% accuracy, 90.09% precision, 89.43% recall, and an 89.34% F1-score, with a 5-fold cross-validation score of 91.22%. KNN (k = 7, cosine metric) achieves 86.77% accuracy, 86.78% precision, 86.77% recall, and an 86.75% F1-score, with a cross-validation score of 87.83%. On this dataset, Naive Bayes is therefore the more effective classifier for high-dimensional Indonesian text represented with TF-IDF.
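The pipeline summarized above can be illustrated with a minimal from-scratch sketch in Python. It is not the authors' implementation: the abstract names MultinomialNB, which suggests scikit-learn, whereas this sketch uses only the standard library. The six toy reviews, all function names, and the choice of k = 3 for the toy data are assumptions made for illustration; stopword removal and PySastrawi stemming are omitted, and the smoothed IDF formula is assumed to match scikit-learn's default.

```python
import math
from collections import Counter, defaultdict

def ngrams(text):
    """Case-fold, split on whitespace, and emit unigrams plus bigrams."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def fit_idf(docs):
    """Smoothed inverse document frequency per term, learned from training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(ngrams(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    tf = Counter(t for t in ngrams(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes with additive smoothing over TF-IDF weights."""
    vocab = {t for v in vectors for t in v}
    totals = defaultdict(lambda: defaultdict(float))
    for v, y in zip(vectors, labels):
        for t, w in v.items():
            totals[y][t] += w
    model = {}
    for y, n_y in Counter(labels).items():
        denom = sum(totals[y].values()) + alpha * len(vocab)
        log_prior = math.log(n_y / len(labels))
        log_like = {t: math.log((totals[y][t] + alpha) / denom) for t in vocab}
        model[y] = (log_prior, log_like)
    return model

def predict_nb(model, vec):
    """Return the class with the highest log posterior for a TF-IDF vector."""
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(w * log_like[t] for t, w in vec.items() if t in log_like)
    return max(model, key=score)

def predict_knn(train_vecs, labels, vec, k=7):
    """Majority vote among the k training vectors with highest cosine similarity."""
    def cosine(a, b):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(w * b.get(t, 0.0) for t, w in a.items())
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: cosine(vec, pair[0]), reverse=True)
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]

# Invented toy reviews (NOT from the study's dataset).
train_docs = [
    "aplikasi bagus sekali", "driver ramah dan cepat", "sangat membantu dan cepat",
    "aplikasi error terus", "driver lambat dan kasar", "sangat mengecewakan dan lambat",
]
train_labels = ["positive"] * 3 + ["negative"] * 3

idf = fit_idf(train_docs)
train_vecs = [tfidf(d, idf) for d in train_docs]
nb_model = train_nb(train_vecs, train_labels, alpha=1.0)

query = tfidf("driver cepat dan ramah", idf)
print(predict_nb(nb_model, query))
print(predict_knn(train_vecs, train_labels, query, k=3))  # k=3 only because the toy set is tiny
```

The two classifiers differ in where the work happens: Naive Bayes condenses the training set into per-class term statistics once, while KNN keeps every training vector and compares each new review against all of them at prediction time.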

Published

2026-04-28

METHOMIKA: Jurnal Manajemen Informatika & Komputerisasi Akuntansi