Stroke Prediction Using Smote for Data Balancing, XGBoost and KNN Ensemble Algorithms


Published: 2023-08-22

DOI: 10.56557/japsi/2023/v15i18349

Page: 42-53

Oladunjoye John Abiodun

Department of Computer Science, Federal University Wukari, Nigeria.

Andrew Ishaku Wreford *

Department of Computer Science, Federal University Wukari, Nigeria.

*Author to whom correspondence should be addressed.


Stroke is a pathological condition characterized by the rupture of blood vessels within the cerebral region, resulting in detrimental effects on the brain. Stroke symptoms may arise when the delivery of blood and essential nutrients to the brain is disrupted. According to the World Health Organization (WHO), stroke is a leading cause of mortality and disability worldwide. Early identification of stroke symptoms is of utmost importance, as it provides vital information for predicting the likelihood of a stroke and encourages the adoption of a healthy lifestyle. This study used two machine learning (ML) algorithms, KNN and XGBoost, combined in a stacked ensemble to build and evaluate the models. The primary goal was to establish a robust framework for predicting long-term stroke risk. The major contributions of this study are data balancing with the SMOTE algorithm and, more importantly, the stacking of the KNN and XGBoost algorithms, whose high performance is validated by several metrics, including precision, recall, F-measure, and accuracy. Experimental results demonstrate that the stacked algorithm surpasses the other ensemble methods applied, achieving an accuracy of 97%, recall of 95% and 98%, precision of 98% and 95%, and an F1 score of 97%.
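The data balancing step named in the abstract can be illustrated with a minimal sketch of SMOTE-style oversampling. This is not the authors' implementation: the function names `smote` and `balance`, the neighbour count `k`, the two-class assumption, and the toy feature values are all assumptions made for illustration. The sketch covers only the balancing step, not the stacked KNN and XGBoost model.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class of a two-class dataset until both
    classes have the same number of samples."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)


# Toy data: (age, average glucose level). Values are made up for illustration.
X = [(45, 90), (50, 95), (38, 85), (60, 100), (55, 92), (42, 88),
     (70, 210), (75, 190), (68, 230)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data. Each synthetic point lies on a line segment between two real minority samples, which is why the method adds variety rather than duplicating existing rows.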

Keywords: XGBoost, KNN, middlebrooks, vascular dementia, hypercholesterolemia

How to Cite

Abiodun, O. J., & Wreford, A. I. (2023). Stroke Prediction Using Smote for Data Balancing, XGBoost and KNN Ensemble Algorithms. Journal of Applied Physical Science International, 15(1), 42–53.



