COMPARATIVE PERFORMANCE EVALUATION OF MACHINE LEARNING MODELS FOR DIABETES PREDICTION: A CASE STUDY OF XGBOOST, RANDOM FOREST, AND SVM
Abstract
This study evaluates and compares three widely used machine learning (ML) models, XGBoost, Random Forest, and Support Vector Machine (SVM), for diabetes risk prediction on the Pima Indians Diabetes Dataset. The problem addressed is the need for accurate early detection of diabetes in order to mitigate serious complications such as cardiovascular disease and kidney failure. Each model was trained and evaluated on the pre-processed dataset using accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC). Random Forest performed best, with the highest accuracy (0.76) and AUC (0.82). The confusion matrix analysis also showed that Random Forest was better at detecting positive (diabetic) cases, which is critical in a medical context where a missed diagnosis is costly. Glucose and BMI were identified as the most important predictive features across the models. The key finding is that Random Forest is the most effective and stable of the three models, offering the best discriminative ability for clinical decision support in early diabetes risk prediction.
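As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes accuracy, precision, recall, F1-score, and AUC from binary labels and predicted scores. It is not the study's code: the function names are hypothetical, the 0.5 decision threshold is an assumption, and the toy labels and scores are invented for demonstration and unrelated to the Pima Indians Diabetes Dataset.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is quadratic in the sample
    size, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data, NOT results from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(y_true, y_pred))
    print("auc:", roc_auc(y_true, scores))
```

Recall is the metric most closely tied to the abstract's emphasis on detecting positive cases, since it measures the fraction of truly diabetic patients that a model flags. AUC, by contrast, does not depend on any threshold, which is why it is commonly reported alongside accuracy when comparing models.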
Copyright (c) 2024 INFOKOM (Informatika & Komputer)

This work is licensed under a Creative Commons Attribution 4.0 International License.




