Stroke Prediction Using SMOTE for Data Balancing, XGBoost and KNN Ensemble Algorithms
Oladunjoye John Abiodun
Department of Computer Science, Federal University Wukari, Nigeria.
Andrew Ishaku Wreford *
Department of Computer Science, Federal University Wukari, Nigeria.
*Author to whom correspondence should be addressed.
Abstract
Stroke is a pathological condition in which the delivery of blood and essential nutrients to part of the brain is disrupted, either by a blocked or by a ruptured blood vessel, damaging brain tissue. According to the World Health Organization (WHO), stroke is one of the leading causes of death and disability worldwide. Early identification of stroke symptoms is therefore important: it provides vital information for predicting the likelihood that a stroke will occur and encourages the adoption of a healthier lifestyle. This study combined two machine learning (ML) algorithms, k-nearest neighbors (KNN) and XGBoost, in a stacked ensemble to build and evaluate the models, with the primary goal of establishing a robust framework for predicting long-term stroke risk. The main contributions of the study are the use of the synthetic minority over-sampling technique (SMOTE) to balance the data and, more importantly, the stacking of KNN and XGBoost, whose performance is evaluated using precision, recall, F-measure and accuracy. Experimental results show that the stacked model outperforms the other ensemble methods applied, achieving an accuracy of 97%, per-class recall of 95% and 98%, per-class precision of 98% and 95%, and an F1 score of 97%.
Keywords: XGBoost, KNN, middlebrooks, vascular dementia, hypercholesterolemia
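As a rough illustration, and not the authors' code, two of the ideas named in the abstract can be sketched in plain Python. The first is SMOTE, which balances the classes by creating new minority-class samples on the line segment between an existing minority sample and one of its nearest minority-class neighbors. The second is the per-class precision, recall and F1 computation that produces paired figures such as "recall of 95% and 98%". The function names `smote` and `per_class_report`, the parameters `k` and `seed`, and the toy data are hypothetical; the sketch assumes numeric feature vectors and binary labels 0/1, and it leaves out the KNN and XGBoost stacking step, which would normally rely on library implementations.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new minority-class points using SMOTE interpolation.

    Hypothetical helper: each new point lies on the line segment between a
    randomly chosen minority sample and one of its k nearest minority-class
    neighbors (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # For every minority sample, store the indices of its k nearest
    # minority-class neighbors, excluding the sample itself.
    neighbors = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(x, minority[j]))
        neighbors.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbors[i])
        gap = rng.random()  # uniform in [0, 1)
        base, nn = minority[i], minority[j]
        # Interpolate feature by feature: base + gap * (neighbor - base).
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nn)])
    return synthetic


def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Hypothetical helper: per-class precision, recall, F1 and overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return report, accuracy


if __name__ == "__main__":
    # Made-up toy data: 6 majority-class points and 3 minority-class points.
    majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
    minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
    new_points = smote(minority, len(majority) - len(minority), k=2)
    print(len(minority) + len(new_points))  # 6, matching the majority class

    report, acc = per_class_report([0, 0, 1, 1], [0, 1, 1, 1])
    print(report, acc)
```

The brute-force neighbor search is quadratic in the number of minority samples, which is acceptable for a sketch but not for large datasets. In practice SMOTE is applied only to the training split, so that synthetic points do not leak into the evaluation data. A stacked KNN and XGBoost model could then be trained on the balanced data, for example with scikit-learn's `StackingClassifier` using `KNeighborsClassifier` and `xgboost.XGBClassifier` as base learners; the abstract does not state which implementation or meta-learner the authors used.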