Lifestyle-Based Early Detection of Diabetes: A Machine Learning Approach
Keywords:
Diabetes Detection, Lifestyle, XGBoost, Class Imbalance, Scale_Pos_Weight, Machine Learning
Abstract
Early detection of diabetes from lifestyle factors plays a vital role in preventing long-term complications. This study proposes a machine learning classification approach that uses an optimized XGBoost algorithm to identify diabetes status from ten lifestyle-related variables. The original dataset, which had three class labels, was reduced to two categories (Non-Diabetic and Diabetic) by merging classes, and a Euclidean distance analysis was then used to compute the gap between class centroids. After feature selection and data scaling, two strategies were applied to address class imbalance: SMOTE and the scale_pos_weight parameter. In the experiments, the scale_pos_weight method performed better, yielding an accuracy of 0.68 and a recall of 0.73 for the minority class. The model also reached a weighted F1-score of 0.73, indicating high sensitivity toward at-risk individuals. These findings highlight the effectiveness of combining lifestyle-based features with appropriate imbalance handling techniques for robust and reliable early detection of diabetes using machine learning.
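The two quantities named in the abstract, the centroid gap and the scale_pos_weight value, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy rows, and the label encoding (1 for Diabetic, 0 for Non-Diabetic) are assumptions made for illustration, and the weight uses the common negative-to-positive count ratio rather than any tuned value reported by the study.

```python
import math


def centroid(rows):
    """Return the per-feature mean of a list of equal-length feature rows."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def centroid_gap(rows_a, rows_b):
    """Euclidean distance between the centroids of two groups of rows."""
    return math.dist(centroid(rows_a), centroid(rows_b))


def scale_pos_weight(labels, positive=1):
    """Ratio of negative to positive samples, a common starting value
    for XGBoost's scale_pos_weight parameter (an assumed heuristic here)."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive samples in labels")
    return n_neg / n_pos


# Hypothetical toy data: two scaled lifestyle features per person.
non_diabetic = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
diabetic = [[4.0, 5.0]]
labels = [0, 0, 0, 1]  # assumed encoding: 1 = Diabetic (minority class)

gap = centroid_gap(non_diabetic, diabetic)
weight = scale_pos_weight(labels)
print(f"centroid gap: {gap:.2f}, scale_pos_weight: {weight:.2f}")
```

A weight computed this way would typically be passed to the classifier, for example as `xgboost.XGBClassifier(scale_pos_weight=weight)`, so that errors on the minority class count more during training without generating synthetic samples as SMOTE does.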
License
Copyright (c) 2025 Egga Asoka, Sulistiyanto Sulistiyanto, Sony Oktapriandi, Yulia Hapsari

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
