TEXT MINING AND MULTI-LABEL CLASSIFICATION USING XGBOOST
Keywords: Extreme Gradient Boosting, Logistic Regression, Multi-Label Classification
Conventional classification assigns a single criterion or label to each instance. Multi-label classification is more complex because a large number of labels results in more classes, and because mutual dependencies between labels must be taken into account. Traditional binary classification only determines whether a text carries a given label, positive or negative. This approach is sub-optimal because the relationship between labels cannot be determined. Multi-label classification is one solution to this weakness: in multi-label text classification, a document may carry many labels, and those labels may be semantically correlated. This research performs multi-label classification on research article texts using an ensemble classifier, namely XGBoost (Extreme Gradient Boosting). Classification performance is evaluated with the confusion matrix, accuracy, and F1 score. The model is also evaluated by comparing the performance of XGBoost with that of Logistic Regression. Using a train-test split and cross-validation, Logistic Regression obtained an average training and testing accuracy of 0.81 and an average F1 score of 0.47, whereas XGBoost obtained an average accuracy of 0.88 and an average F1 score of 0.78. The results show that the XGBoost classifier can be applied to produce good classification performance.
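The evaluation criteria named above (confusion matrix, accuracy, F1 score) can be illustrated with a minimal sketch for multi-label output. The abstract does not state how the per-label scores are averaged, so the macro (unweighted per-label) averaging below is an assumption, and the function names and toy data are hypothetical and not taken from the study.

```python
from typing import Dict, List, Sequence


def label_confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count the four confusion-matrix cells for one binary label column."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def accuracy(c: Dict[str, int]) -> float:
    """Share of documents for which this label was predicted correctly."""
    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    return (c["tp"] + c["tn"]) / total if total else 0.0


def f1_score(c: Dict[str, int]) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when there are no positives."""
    denom = 2 * c["tp"] + c["fp"] + c["fn"]
    return 2 * c["tp"] / denom if denom else 0.0


def evaluate(Y_true: List[List[int]], Y_pred: List[List[int]]) -> Dict[str, float]:
    """Macro-average accuracy and F1 over label columns (assumed averaging scheme)."""
    n_labels = len(Y_true[0])
    accs, f1s = [], []
    for j in range(n_labels):
        c = label_confusion([row[j] for row in Y_true], [row[j] for row in Y_pred])
        accs.append(accuracy(c))
        f1s.append(f1_score(c))
    return {"accuracy": sum(accs) / n_labels, "f1": sum(f1s) / n_labels}


# Toy data: 4 documents, 2 labels. Rows are documents, columns are labels.
Y_true = [[1, 0], [0, 0], [0, 1], [0, 0]]
Y_pred = [[1, 0], [0, 0], [0, 0], [0, 0]]  # the only positive of label 1 is missed

scores = evaluate(Y_true, Y_pred)
print(scores)
```

On this toy data the second label is rare and its single positive is missed, so accuracy stays high while F1 drops. This is one general way accuracy and F1 can diverge on sparse labels; the abstract does not say whether the same effect explains the gap reported for Logistic Regression.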