Comparison of the K-Means and DBSCAN Methods for Clustering Menfess Message Texts on Telegram
DOI: https://doi.org/10.46880/tamika.Vol6No1.pp20-29
Keywords: Text Clustering, K-Means, DBSCAN, Telegram, Menfess, Sentence-BERT
Abstract
Menfess messages on Telegram are a form of anonymous communication that produces large volumes of informal text marked by slang, abbreviations, and spelling variations, which makes computational text analysis difficult. This study compares K-Means and DBSCAN for clustering menfess messages represented as Sentence-BERT embeddings from the paraphrase-multilingual-MiniLM-L12-v2 model. The comparison covers three text-length scenarios derived by clustering the messages' word counts: short (0–20 words), medium (55–128 words), and long (129–283 words). Clustering quality is evaluated with the Silhouette Score (higher is better) and the Davies-Bouldin Index (lower is better). For short texts, K-Means achieves a Silhouette Score of 0.0804 and a Davies-Bouldin Index of 3.8451, while DBSCAN produces 2 clusters with 0.3186 and 1.3714, respectively, and labels 71.60% of the data as noise. For medium texts, K-Means obtains 0.1403 and 3.6490, while DBSCAN forms a single cluster with 0.0450 and 3.3405 and 74.60% noise. For long texts, K-Means obtains 0.0593 and 3.4552, while DBSCAN produces 2 clusters with 0.5492 and 1.1069 and 79.67% noise. DBSCAN scores better on both metrics for short and long texts but labels a very large share of the data as noise in every scenario, whereas K-Means is more stable, assigning every message to a cluster with no noise in all three scenarios.
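The abstract compresses a full pipeline into a few clauses: split messages by length, embed them, cluster with two algorithms, then score each result. A minimal sketch of that comparison protocol with scikit-learn follows, under assumptions the study does not confirm: the library choice, 1-D K-Means as the word-count clustering method, the values of K, eps, and min_samples (placeholders here), L2 normalization of embeddings, and excluding DBSCAN noise points before scoring. The helpers `length_groups`, `evaluate`, and `compare` are hypothetical. Synthetic vectors stand in for the embeddings so the sketch runs offline; with the sentence-transformers library they would instead come from `SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2").encode(texts)`.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score


def length_groups(texts, k=3, seed=0):
    """Split texts into k length scenarios by clustering word counts.

    Assumption: the word-count clustering is a 1-D K-Means.
    Groups are returned ordered from shortest to longest.
    """
    counts = np.array([len(t.split()) for t in texts], dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(counts)
    order = np.argsort(km.cluster_centers_.ravel())  # cluster ids sorted short -> long
    return [[t for t, lab in zip(texts, km.labels_) if lab == cid] for cid in order]


def evaluate(X, labels):
    """Cluster count, Silhouette, Davies-Bouldin, and noise share for one labeling.

    Assumption (not stated in the study): DBSCAN noise (label -1) is excluded
    before scoring, so the metrics describe only the points that were clustered.
    """
    labels = np.asarray(labels)
    kept = labels != -1  # K-Means never emits -1, so all its points are kept
    noise_pct = 100.0 * (1.0 - kept.mean())
    n_clusters = len(np.unique(labels[kept]))
    result = {"clusters": n_clusters, "noise_pct": noise_pct,
              "silhouette": None, "dbi": None}
    if n_clusters >= 2:  # both metrics are undefined for fewer than two clusters
        result["silhouette"] = float(silhouette_score(X[kept], labels[kept]))
        result["dbi"] = float(davies_bouldin_score(X[kept], labels[kept]))
    return result


def compare(X, k=2, eps=0.3, min_samples=5, seed=0):
    """Run K-Means and DBSCAN on the same L2-normalized embeddings and score both.

    k, eps, and min_samples are placeholder values, not the study's settings.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    db_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return {"kmeans": evaluate(X, km_labels), "dbscan": evaluate(X, db_labels)}


if __name__ == "__main__":
    # Synthetic stand-in for sentence embeddings: two tight groups plus scattered outliers
    rng = np.random.default_rng(0)
    dim = 16
    X = np.vstack([
        np.eye(dim)[0] + rng.normal(0, 0.02, (40, dim)),
        np.eye(dim)[1] + rng.normal(0, 0.02, (40, dim)),
        rng.normal(0, 1, (20, dim)),
    ])
    for name, scores in compare(X).items():
        print(name, scores)
```

Excluding noise before scoring is a design choice rather than a given. If the metrics are computed this way, DBSCAN is scored only on the points it kept (roughly 20 to 28 percent of each dataset, going by the noise rates above), whereas K-Means is scored on every point, so the two sets of numbers are not measured over the same data.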
License
Copyright (c) 2026 Oktama Pangestu, Abdul Hamid Arribathi, Nur Azizah, Rahmat Hidayat

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.