Comparison of K-Means and DBSCAN Methods for Clustering Menfess Message Texts on Telegram

Authors

  • Oktama Pangestu, Universitas Raharja
  • Abdul Hamid Arribathi, Universitas Raharja
  • Nur Azizah, Universitas Raharja
  • Rahmat Hidayat, Universitas Riau

DOI:

https://doi.org/10.46880/tamika.Vol6No1.pp20-29

Keywords:

Text Clustering, K-Means, DBSCAN, Telegram, Menfess, Sentence-BERT

Abstract

Menfess messages on Telegram are a form of anonymous communication that generates large volumes of informal text. This text has linguistic characteristics such as slang, abbreviations, and spelling variations, which pose challenges for computational text analysis. This study compares the performance of K-Means and DBSCAN in clustering menfess messages represented as Sentence-BERT embeddings from the paraphrase-multilingual-MiniLM-L12-v2 model. The comparison covers three text-length scenarios derived by clustering the messages' word counts: short (0–20 words), medium (55–128 words), and long (129–283 words). Clustering quality is evaluated with the Silhouette Score (higher is better) and the Davies-Bouldin Index (lower is better). For short texts, K-Means achieves a Silhouette Score of 0.0804 and a Davies-Bouldin Index of 3.8451, while DBSCAN produces 2 clusters with a Silhouette Score of 0.3186, a Davies-Bouldin Index of 1.3714, and 71.60% noise. For medium texts, K-Means obtains a Silhouette Score of 0.1403 and a Davies-Bouldin Index of 3.6490, while DBSCAN forms 1 cluster with a Silhouette Score of 0.0450, a Davies-Bouldin Index of 3.3405, and 74.60% noise. For long texts, K-Means obtains a Silhouette Score of 0.0593 and a Davies-Bouldin Index of 3.4552, while DBSCAN produces 2 clusters with a Silhouette Score of 0.5492, a Davies-Bouldin Index of 1.1069, and 79.67% noise. The results show that DBSCAN scores better on both metrics for short and long texts but labels a very high proportion of messages as noise in every scenario. K-Means, by contrast, performs more stably, assigning every message to a cluster in all scenarios.
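The comparison above hinges on how DBSCAN's noise points (label -1) interact with the two metrics. The following is a minimal sketch, not the authors' code, of how such a comparison could be scored with scikit-learn. It makes several assumptions. Synthetic 384-dimensional blobs plus scattered points stand in for real message embeddings, which would instead come from something like `SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2").encode(texts)` in the sentence-transformers library. The `eps`, `min_samples`, and `n_clusters` values are tuned only to this toy data. Euclidean distance is assumed for both metrics. The helper name `evaluate` is hypothetical. The abstract does not say how noise was handled during scoring. If noise is dropped, a single-cluster DBSCAN result (as in the medium-text scenario) leaves both metrics undefined, so the helper exposes both options.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score


def evaluate(X, labels, exclude_noise=True):
    """Hypothetical helper: return (n_clusters, noise_pct, silhouette, dbi).

    DBSCAN labels noise points -1; K-Means never does. Dropping noise before
    scoring (exclude_noise=True) is an ASSUMPTION, not the paper's confirmed setup.
    """
    labels = np.asarray(labels)
    noise = labels == -1
    noise_pct = 100.0 * noise.mean()
    n_clusters = len(np.unique(labels[~noise]))
    if exclude_noise:
        X_eval, y_eval = X[~noise], labels[~noise]
    else:
        X_eval, y_eval = X, labels  # noise scored as one extra "cluster"
    # Both metrics require 2 <= number of distinct labels <= n_samples - 1.
    n_labels = len(np.unique(y_eval))
    if not 2 <= n_labels <= len(y_eval) - 1:
        return n_clusters, noise_pct, float("nan"), float("nan")
    sil = silhouette_score(X_eval, y_eval)  # Euclidean distance (assumed)
    dbi = davies_bouldin_score(X_eval, y_eval)
    return n_clusters, noise_pct, sil, dbi


# Synthetic stand-in for 384-dim Sentence-BERT embeddings: 3 dense groups
# plus 15 scattered points that play the role of "noise" messages.
X_dense, _ = make_blobs(n_samples=300, centers=3, n_features=384, random_state=0)
X_sparse = np.random.default_rng(0).uniform(-10, 10, size=(15, 384))
X = np.vstack([X_dense, X_sparse])

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=35.0, min_samples=5).fit_predict(X)  # eps tuned to toy data

km = evaluate(X, km_labels)
db = evaluate(X, db_labels)
print("K-Means: clusters=%d noise=%.2f%% silhouette=%.4f DBI=%.4f" % km)
print("DBSCAN:  clusters=%d noise=%.2f%% silhouette=%.4f DBI=%.4f" % db)
```

On this toy data, K-Means must place the scattered points into some cluster, which lowers its Silhouette Score and raises its Davies-Bouldin Index. DBSCAN sets those points aside as noise and, when noise is excluded from scoring, looks better on both metrics. This mirrors the trade-off the abstract reports between metric scores and the proportion of data left unclustered.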

Published

2026-05-17

Section

TAMIKA: Jurnal Tugas Akhir Manajemen Informatika & Komputerisasi Akuntansi