Optimizing Data Splitting for Robust Performance of the EfficientNetV2-B0 Model in Pneumonia Detection

Authors

  • Syanti Irviantina, Universitas Mikroskil
  • M. Daffa Rizaldi Siregar, Universitas Mikroskil

DOI:

https://doi.org/10.46880/methoda.Vol16No2.pp83-89

Keywords:

Data Splitting, EfficientNetV2-B0, Pneumonia Detection, Stratified Cross-Validation, Deep Learning

Abstract

How a dataset is split is a fundamental yet frequently overlooked methodological decision in deep learning research on medical image classification. This study investigates how different data splitting scenarios affect the robustness of the EfficientNetV2-B0 model for pneumonia detection from chest X-ray images. Using the Kaggle Chest X-ray Pneumonia dataset, seven experimental scenarios were designed that vary the train-validation-test allocation ratio (70/15/15, 70/10/20, 80/10/10, 85/15, 70/30), the partition strategy (stratified vs. random), and the validation method (holdout vs. 5-fold stratified cross-validation). The results show that 5-fold stratified cross-validation produces the most stable performance estimates, with the lowest variance (Accuracy: 97.4% ± 0.3%, AUC: 0.993 ± 0.002), whereas random partitioning without stratification yields significantly inferior results (Accuracy: 95.1%, AUC: 0.973). Among the holdout scenarios, the stratified 70/15/15 split achieved the best performance (Accuracy: 97.2%, AUC: 0.991). Statistical analysis confirms that the differences between stratified and non-stratified scenarios are significant (p < 0.05). These findings provide empirical guidance for researchers designing more valid and replicable machine learning experiments in the medical domain.
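The two partition strategies contrasted in the abstract, a stratified holdout split and stratified k-fold cross-validation, can be illustrated with a minimal sketch. This is not the authors' code: the function names `stratified_split` and `stratified_kfold`, the seed, and the synthetic 700/300 label list are assumptions made for illustration only. It uses only the Python standard library; in practice, scikit-learn's `train_test_split(..., stratify=y)` and `StratifiedKFold` are the usual tools for the same task.

```python
import random
from collections import defaultdict


def group_by_class(labels):
    """Map each class label to the list of sample indices carrying it."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    return by_class


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that preserve class proportions."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in group_by_class(labels).values():
        rng.shuffle(indices)
        n_train = round(ratios[0] * len(indices))
        n_val = round(ratios[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def stratified_kfold(labels, k=5, seed=42):
    """Yield (train, val) index lists for k folds with balanced class shares."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in group_by_class(labels).values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets an equal share of it.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


# Synthetic, imbalanced labels (assumed counts, not the real dataset's).
labels = ["PNEUMONIA"] * 700 + ["NORMAL"] * 300

train, val, test = stratified_split(labels)
test_share = sum(labels[i] == "PNEUMONIA" for i in test) / len(test)

cv_folds = list(stratified_kfold(labels, k=5))
fold_shares = [
    sum(labels[i] == "PNEUMONIA" for i in fold_val) / len(fold_val)
    for _, fold_val in cv_folds
]
```

In this sketch every partition keeps the same class ratio as the full label list, which is the property that a purely random split does not guarantee on an imbalanced dataset.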

Published

2026-05-30

Section

Majalah Ilmiah METHODA