Lifestyle-Based Early Detection of Diabetes: A Machine Learning Approach

Authors

  • Egga Asoka, Sriwijaya State Polytechnic
  • Sulistiyanto Sulistiyanto, Sriwijaya State Polytechnic
  • Sony Oktapriandi, Sriwijaya State Polytechnic
  • Yulia Hapsari, Sriwijaya State Polytechnic

Keywords:

Diabetes Detection, Lifestyle, XGBoost, Class Imbalance, Scale_Pos_Weight, Machine Learning

Abstract

Early detection of diabetes based on lifestyle factors plays a vital role in preventing long-term complications. This study proposes a machine learning classification approach that uses an optimized XGBoost algorithm to identify diabetes status from ten lifestyle-related variables. The three class labels of the original dataset were merged into two categories (Non-Diabetic and Diabetic), and a Euclidean distance analysis was then used to compute the gap between the class centroids. After feature selection and data scaling, two strategies for handling class imbalance were compared: SMOTE oversampling and XGBoost's scale_pos_weight parameter. The scale_pos_weight method performed better, yielding an accuracy of 0.68, a recall of 0.73 for the minority class, and a weighted F1-score of 0.73. The minority-class recall means the model correctly flags roughly three out of four at-risk individuals. These findings highlight the effectiveness of combining lifestyle-based features with appropriate imbalance handling techniques for the early detection of diabetes using machine learning.
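The label merging, centroid-gap analysis, scale_pos_weight setting, and evaluation metrics summarized in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the mapping of the three original labels to two classes (`MERGE_MAP`) is hypothetical, since the abstract does not say which classes were merged, and all helper names are illustrative. The scale_pos_weight value uses the negative-to-positive count ratio that the XGBoost documentation suggests as a starting point; in practice it would be passed to `xgboost.XGBClassifier(scale_pos_weight=...)`, which is not called here. Feature selection, scaling, and SMOTE are omitted.

```python
import math
from collections import Counter

# HYPOTHETICAL mapping: the abstract does not state which of the three original
# labels were merged. Here label 0 is assumed to be "Non-Diabetic" and labels
# 1 and 2 are both folded into "Diabetic".
MERGE_MAP = {0: 0, 1: 1, 2: 1}


def to_binary(labels):
    """Collapse the three original labels into 0 (Non-Diabetic) / 1 (Diabetic)."""
    return [MERGE_MAP[y] for y in labels]


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def centroid_gap(X, y):
    """Euclidean distance between the class-0 and class-1 centroids."""
    c0 = centroid([x for x, label in zip(X, y) if label == 0])
    c1 = centroid([x for x, label in zip(X, y) if label == 1])
    return math.dist(c0, c1)


def scale_pos_weight(y):
    """XGBoost heuristic: (# negative samples) / (# positive samples)."""
    counts = Counter(y)
    return counts[0] / counts[1]


def evaluate(y_true, y_pred):
    """Return accuracy, minority-class recall, and support-weighted F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    support = Counter(y_true)
    recall, f1 = {}, {}
    for c in (0, 1):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall[c] = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall[c]
        f1[c] = 2 * precision * recall[c] / denom if denom else 0.0
    # Weighted F1: per-class F1 averaged with class support as the weight.
    weighted_f1 = sum(f1[c] * support[c] for c in (0, 1)) / len(pairs)
    minority = min(support, key=support.get)
    return accuracy, recall[minority], weighted_f1


if __name__ == "__main__":
    # Toy data for illustration only; this is not the study's dataset.
    X = [[0, 0], [2, 0], [0, 2], [2, 2], [3, 5], [5, 5]]
    y = to_binary([0, 0, 0, 0, 1, 2])
    print("centroid gap:", centroid_gap(X, y))
    print("scale_pos_weight:", scale_pos_weight(y))
    print("acc, minority recall, weighted F1:", evaluate(y, [0, 0, 1, 0, 1, 0]))
```

On the toy predictions, minority-class recall (0.5) and weighted F1 (about 0.67) differ, which is why the abstract reports recall separately: a support-weighted F1 is dominated by the majority class and does not by itself measure sensitivity toward at-risk individuals.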


Published

2025-08-05