Alzheimer’s Disease Prediction using ANOVA with t-SNE Feature Selection Techniques and Ensemble Learning

Ferdib Al-Islam; Mostofa Shariar Sanim; Kah Ong Michael Goh; S M Hasan Mahmud; Dip Nandi

doi:10.37934/ard.144.1.123147

Authors

Ferdib Al-Islam, Department of Computer Science and Engineering, Northern University of Business and Technology Khulna, Khulna, 9100, Bangladesh
Mostofa Shariar Sanim, Department of Computer Science and Engineering, Northern University of Business and Technology Khulna, Khulna, 9100, Bangladesh
Kah Ong Michael Goh, Faculty of Information Science & Technology (FIST), Multimedia University, Jalan Ayer Keroh Lama, 75450, Melaka, Malaysia
S M Hasan Mahmud, Department of Computer Science, American International University-Bangladesh (AIUB), 408/1, Kuratoli, Khilkhet, Dhaka 1229, Bangladesh
Dip Nandi, Department of Computer Science, American International University-Bangladesh (AIUB), 408/1, Kuratoli, Khilkhet, Dhaka 1229, Bangladesh

Keywords:

Alzheimer’s Disease, Neurodegenerative Disorder, t-SNE, ANOVA, Machine Learning, SHAP

Abstract

Alzheimer’s disease is one of the most common neurodegenerative disorders, and there is currently no cure. Early identification is pivotal for delaying disease progression. Current approaches to early detection of Alzheimer’s disease rely on handwriting tasks, which generate a large quantity of data. Because the resulting data are high-dimensional, the significance of the pertinent features is obscured. This curse of dimensionality arises when there are too many features but too few samples, which makes it difficult for a model to discover patterns in the data and degrades many of the approaches used to diagnose or classify Alzheimer’s disease. In this study, we address the curse of dimensionality by applying t-SNE and improve the efficacy of early Alzheimer’s diagnosis by selecting key features using ANOVA. Seven machine learning algorithms were used as base classifiers, and their outputs were then combined in a voting classifier. The results indicate that the voting ensemble achieved the highest classification testing accuracy (approximately 94.28%). Our proposed technique surpasses the latest benchmarks. To understand how different features influence the model’s outcome, we used Explainable AI (XAI) techniques, specifically SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). Our proposed method has the potential to significantly improve the accuracy of early Alzheimer’s disease diagnosis, laying the foundation for timely interventions and better patient outcomes.
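
As a rough illustration of two of the steps named in the abstract, the sketch below ranks features by their one-way ANOVA F statistic and combines classifier outputs by hard majority vote. It is not the authors' implementation. The function names (`anova_f`, `top_k_features`, `hard_vote`), the toy data, and the choice of hard rather than soft voting are all assumptions made for illustration. The t-SNE step and the seven base classifiers are omitted, because the abstract does not specify them.

```python
from collections import Counter
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(X, y, k):
    """Indices of the k features with the largest F statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def hard_vote(predictions):
    """Majority vote over the base classifiers' predictions for one sample."""
    return Counter(predictions).most_common(1)[0][0]


# Made-up toy data (not from the study): 6 samples, 3 features, 2 classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.1],
    [3.0, 5.5, 0.8],
    [3.1, 4.5, 0.3],
    [2.9, 5.0, 0.7],
]
y = [0, 0, 0, 1, 1, 1]

selected = top_k_features(X, y, 2)           # feature indices kept for training
label = hard_vote(["AD", "HC", "AD", "AD"])  # hypothetical base-classifier outputs
print(selected, label)
```

In practice, a library such as scikit-learn provides tested equivalents of these steps. The standalone version is shown only to make the underlying arithmetic visible.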