Comparison of the K-Means and DBSCAN Methods for Clustering Menfess Message Texts on Telegram

Oktama Pangestu; Abdul Hamid Arribathi; Nur Azizah; Rahmat Hidayat

doi:10.46880/tamika.Vol6No1.pp20-29

Authors

Oktama Pangestu Universitas Raharja
Abdul Hamid Arribathi Universitas Raharja
Nur Azizah Universitas Raharja
Rahmat Hidayat Universitas Riau

Keywords:

Text Clustering, K-Means, DBSCAN, Telegram, Menfess, Sentence-BERT

Abstract

Menfess messages on Telegram are a form of anonymous communication that produces large volumes of informal text marked by slang, abbreviations, and spelling variations, which makes computational text analysis difficult. This study compares K-Means and DBSCAN for clustering menfess messages represented as Sentence-BERT embeddings from the paraphrase-multilingual-MiniLM-L12-v2 model. The comparison covers three text-length scenarios derived by clustering word counts: short (0–20 words), medium (55–128 words), and long (129–283 words). Clustering quality is evaluated with the Silhouette Score (higher is better) and the Davies-Bouldin Index (lower is better). For short texts, K-Means achieves a Silhouette Score of 0.0804 and a Davies-Bouldin Index of 3.8451, while DBSCAN produces 2 clusters with a Silhouette Score of 0.3186, a Davies-Bouldin Index of 1.3714, and 71.60% noise. For medium texts, K-Means obtains 0.1403 and 3.6490, while DBSCAN forms 1 cluster with 0.0450 and 3.3405 and 74.60% noise. For long texts, K-Means obtains 0.0593 and 3.4552, while DBSCAN produces 2 clusters with 0.5492 and 1.1069 and 79.67% noise. The results show that DBSCAN scores better on both metrics for short and long texts but labels a very high share of messages as noise in every scenario, whereas K-Means performs more consistently and assigns every message to a cluster, with no noise, in all three scenarios.
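To make the reported figures easier to interpret, here is a minimal, illustrative sketch (not the authors' code) of how the Silhouette Score, the Davies-Bouldin Index, and the DBSCAN noise percentage can be computed from a list of embedding vectors and cluster labels. It is pure Python so the formulas stay visible. The abstract does not state the distance metric or how noise points enter the scores, so the following choices are assumptions: Euclidean distance, the scikit-learn convention that DBSCAN labels noise as -1, and exclusion of noise points before scoring. Both scores return None when fewer than two clusters remain, because neither metric is defined for a single cluster.

```python
import math
from collections import defaultdict

NOISE = -1  # assumed noise label (scikit-learn's DBSCAN convention)


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def group_by_label(points, labels):
    """Map each non-noise label to the list of points carrying it."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        if lab != NOISE:
            groups[lab].append(p)
    return groups


def noise_percentage(labels):
    """Percentage of points labelled as noise."""
    return 100.0 * sum(1 for lab in labels if lab == NOISE) / len(labels)


def silhouette(points, labels):
    """Mean silhouette over non-noise points; None if fewer than 2 clusters."""
    groups = group_by_label(points, labels)
    if len(groups) < 2:
        return None
    scores = []
    for lab, members in groups.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # singleton clusters score 0 by convention
                continue
            # a: mean distance to the other members of p's own cluster
            # (p itself adds 0 to the sum, hence the divisor len - 1)
            a = sum(euclidean(p, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(euclidean(p, q) for q in other) / len(other)
                for other_lab, other in groups.items()
                if other_lab != lab
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def centroid(members):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(coords) / len(members) for coords in zip(*members)]


def davies_bouldin(points, labels):
    """Davies-Bouldin Index over non-noise points; None if fewer than 2 clusters."""
    clusters = list(group_by_label(points, labels).values())
    if len(clusters) < 2:
        return None
    cents = [centroid(m) for m in clusters]
    # scatter: mean distance from each member to its own centroid
    scatter = [
        sum(euclidean(p, c) for p in m) / len(m) for m, c in zip(clusters, cents)
    ]
    total = 0.0
    for i in range(len(clusters)):
        # worst-case similarity ratio of cluster i against every other cluster
        worst = 0.0
        for j in range(len(clusters)):
            d = euclidean(cents[i], cents[j])
            if j != i and d > 0:  # skip coincident centroids to avoid dividing by 0
                worst = max(worst, (scatter[i] + scatter[j]) / d)
        total += worst
    return total / len(clusters)


# Tiny hand-checkable example: two tight pairs plus one noise point.
pts = [(0, 0), (0, 1), (10, 0), (10, 1), (50, 50)]
labs = [0, 0, 1, 1, NOISE]
print(noise_percentage(labs))                # → 20.0
print(round(davies_bouldin(pts, labs), 4))   # → 0.1
print(round(silhouette(pts, labs), 4))       # → 0.9002
```

In a full pipeline the vectors would typically come from `SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2").encode(texts)` in the sentence-transformers library, and the labels from scikit-learn's `KMeans(n_clusters=k).fit_predict(X)` or `DBSCAN(eps=..., min_samples=...).fit_predict(X)`; the abstract does not report `k`, `eps`, or `min_samples`. scikit-learn's `silhouette_score` and `davies_bouldin_score` in `sklearn.metrics` compute the same quantities on NumPy arrays, but they do not drop noise points on their own, so filtering (or keeping) label -1 is a separate, explicit choice.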
