This repository contains the code and data analysis pipeline for investigating the effect of feature selection on the classification of Alzheimer's Disease (AD), Mild Cognitive Impairment (MCI), and Normal Cognition (NC) using Support Vector Machines (SVM). The study uses regional cortical and subcortical SUVR (Standardized Uptake Value Ratio) and volume features extracted from PET-MRI data.
Early and accurate diagnosis of Alzheimer's Disease is critical for effective intervention. This project aims to evaluate the role of feature selection in improving the performance of machine learning models for differentiating between:
- AD vs. MCI
- MCI vs. NC
We apply statistical tests for feature selection and use SVM classifiers to assess the classification performance.
The dataset includes:
- Regional SUVR: From human brain FDG-PET images
- Volume Features: From human brain MRI T1-W images
This project uses data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. ADNI is a longitudinal multicenter study designed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of Alzheimer’s disease.
All the PET and MRI images were pre-processed using the SPM12 software. The pipeline included:
- Co-registration of PET to T1 space
- Normalization to the MNI standard space
- Gray matter segmentation
Python was used for:
- Segmentation of 115 ROIs (according to the Harvard-Oxford atlas)
- Calculating the average SUVR in every ROI
- Calculating the cerebral volume in every ROI
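The per-ROI calculations above can be illustrated with a minimal NumPy sketch. It assumes the SUVR image and the atlas are already loaded as arrays in the same space, that atlas label 0 is background, and that ROI volume is the voxel count times the voxel volume. The function name `roi_features` and the toy arrays are hypothetical and not taken from this repository; the actual pipeline may compute volume differently (for example, from the gray matter segmentation).

```python
import numpy as np


def roi_features(suvr_img, atlas, voxel_volume_mm3):
    """Return {roi_label: (mean_suvr, volume_mm3)} for every non-zero atlas label.

    Assumes `suvr_img` and `atlas` have the same shape and are aligned.
    """
    features = {}
    for label in np.unique(atlas):
        if label == 0:  # assumption: 0 marks background
            continue
        mask = atlas == label
        mean_suvr = float(suvr_img[mask].mean())        # average SUVR inside the ROI
        volume = float(mask.sum() * voxel_volume_mm3)   # voxel count x voxel volume
        features[int(label)] = (mean_suvr, volume)
    return features


# Toy 2D example with two ROIs (labels 1 and 2) and 8 mm^3 voxels.
atlas = np.array([[0, 1, 1],
                  [2, 2, 2]])
suvr = np.array([[0.0, 1.0, 3.0],
                 [2.0, 2.0, 5.0]])
feats = roi_features(suvr, atlas, voxel_volume_mm3=8.0)
```

With real data, the two arrays would come from the normalized PET image and the atlas label image resampled to the same grid.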
Data manipulation consisted of:
- Outlier handling: outliers were replaced by the group median
- Standard scaling: features were standardized to zero mean and unit variance so that no feature dominates because of its scale
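A minimal sketch of these two steps for a single feature within one diagnostic group is shown below. How outliers were detected is not stated here, so the 1.5 x IQR rule used in this example is an assumption, and the function names are hypothetical.

```python
import numpy as np


def replace_outliers_with_median(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the group median.

    The IQR rule is an assumed outlier criterion, chosen only for illustration.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    is_outlier = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    cleaned = values.copy()
    cleaned[is_outlier] = np.median(values)
    return cleaned


def standard_scale(values):
    """Scale to zero mean and unit variance (what StandardScaler does per feature)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


# One feature measured in one group; 5.0 is an obvious outlier.
group_values = [1.0, 1.1, 0.9, 1.0, 5.0]
cleaned = replace_outliers_with_median(group_values)
scaled = standard_scale(cleaned)
```

In practice the outlier step would be applied per group and per feature before scaling.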
Statistical tests such as Levene's test (for variance equality) and two-sample t-tests were used to identify significant regions of interest (ROIs) based on corrected p-values. These selected features were then used in the SVM model.
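As a rough illustration of this selection step, the sketch below runs Levene's test on each feature to decide between a pooled-variance and a Welch t-test, then keeps the features whose corrected p-value is below a threshold. The correction method is not specified here, so the Bonferroni correction, the 0.05 threshold, the function name `select_features`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def select_features(group_a, group_b, alpha=0.05):
    """Return (selected_indices, corrected_p_values) for two groups.

    group_a, group_b: arrays of shape (n_subjects, n_features).
    """
    n_features = group_a.shape[1]
    p_values = np.empty(n_features)
    for j in range(n_features):
        a, b = group_a[:, j], group_b[:, j]
        # Levene's test: a small p-value suggests unequal variances.
        _, p_levene = stats.levene(a, b)
        equal_var = bool(p_levene >= alpha)
        # Student's t-test if variances look equal, Welch's t-test otherwise.
        _, p_values[j] = stats.ttest_ind(a, b, equal_var=equal_var)
    # Bonferroni correction (assumed; the source only says "corrected p-values").
    p_corrected = np.minimum(p_values * n_features, 1.0)
    return np.flatnonzero(p_corrected < alpha), p_corrected


# Synthetic data: 3 features, only feature 0 differs between the groups.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(40, 3))
group_b = rng.normal(0.0, 1.0, size=(40, 3))
group_b[:, 0] += 3.0
selected, p_corr = select_features(group_a, group_b)
```

The returned indices would then be used to subset the feature matrix before training the classifier.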
Support Vector Machine (SVM) was implemented with:
- Grid search for hyperparameter optimization.
- RobustScaler and StandardScaler for preprocessing.
- Performance metrics: Accuracy, F1-score, Recall, and Confusion Matrix.
The visualizations include:
- Plots of SUVR and volume distributions across groups.
- Boxplots comparing SUVR and volume by group.
- Region-wise mean and standard deviation visualizations.
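For the classification step, the SVM setup listed earlier (grid search, the two scalers, and the reported metrics) might look like the following scikit-learn sketch. The synthetic data, the parameter grid values, the train/test split, and the 5-fold cross-validation are assumptions for illustration and are not the settings used in `classification.py`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data standing in for the selected ROI features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 5)),
               rng.normal(2.0, 1.0, size=(60, 5))])
y = np.array([0] * 60 + [1] * 60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is fit on training folds only.
pipeline = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])

# Assumed grid: the scaler itself is treated as a hyperparameter.
param_grid = {
    "scaler": [StandardScaler(), RobustScaler()],
    "svm__C": [0.1, 1, 10],
    "svm__kernel": ["linear", "rbf"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Best parameters:", search.best_params_)
print("Accuracy:", accuracy, "F1:", f1, "Recall:", recall)
print("Confusion matrix:\n", cm)
```

Putting the scaler inside the pipeline avoids leaking test-set statistics into training, which matters when the sample size is small.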
.
├── Features-2.xlsx # Input feature dataset
├── README.md # Project documentation
├── feature_selection.py # Statistical feature selection pipeline
├── classification.py # SVM implementation for classification
├── plots.py # Visualization scripts
└── utils.py # Utility functions for preprocessing and analysis
- Setup: Ensure you have Python 3.8+ and the required libraries installed:
pip install -r requirements.txt
- Prepare Data: Place the dataset (Features-2.xlsx) in the root directory.
- Run Feature Selection:
python feature_selection.py
- Train Classifier:
python classification.py
- Visualize Results:
python plots.py
- Significant ROIs were identified for both SUVR and volume features across AD vs. MCI and MCI vs. NC groups.
- Performance metrics for AD vs. MCI and MCI vs. NC:
  - Accuracy: XX%
  - F1-Score: XX%
  - Recall: XX%
- ROI-wise differences in SUVR and volume for AD, MCI, and NC groups.
- Impact of feature selection on SVM classification performance.
This project is licensed under the Apache License.
The dataset used in this study is sourced from the ADNI database. Special thanks to the research teams contributing to Alzheimer's Disease diagnostics and PET-MRI advancements.