This analysis uses the ALS dataset to study amyotrophic lateral sclerosis (ALS), a rare but devastating progressive neurodegenerative disease. A major clinically relevant question is:
What patient phenotypes can be automatically and reliably identified and used to predict the change of the ALSFRS slope over time?
Steps implemented:
- Load and prepare the data. Report brief data summaries and show some preliminary visualizations.
- Train a k-means model on the data and select k.
- Evaluate the model performance using bar and silhouette plots and summarize the results.
- Tune and plot parameters with k-means++.
- Rerun the model with the optimal parameters and interpret the clustering results.
- Apply hierarchical clustering with three different linkages and compare the corresponding silhouette plots.
- Fit a Gaussian mixture model, select the optimal model, report BIC, and display density and classification plots.
- Compare the results of the above methods.
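
The model-selection steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes Python with scikit-learn, uses synthetic data from `make_blobs` as a stand-in for the prepared ALS feature matrix, and picks `ward`, `complete`, and `average` as the three linkages, since the list does not name them. The range of candidate cluster counts is likewise an assumption.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the cleaned ALS feature matrix.
# The real analysis would load and prepare the ALS dataset here instead.
X_raw, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
X = StandardScaler().fit_transform(X_raw)  # put features on a common scale

candidate_k = range(2, 7)  # ASSUMPTION: illustrative search range for k

# k-means with k-means++ seeding; choose k by the mean silhouette width.
kmeans_scores = {}
for k in candidate_k:
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=0).fit_predict(X)
    kmeans_scores[k] = silhouette_score(X, labels)
best_k = max(kmeans_scores, key=kmeans_scores.get)

# Hierarchical (agglomerative) clustering with three linkages at the chosen k.
# ASSUMPTION: the specific linkages are illustrative choices.
linkage_scores = {}
for linkage in ("ward", "complete", "average"):
    labels = AgglomerativeClustering(n_clusters=best_k,
                                     linkage=linkage).fit_predict(X)
    linkage_scores[linkage] = silhouette_score(X, labels)

# Gaussian mixture models; choose the number of components by lowest BIC.
bic = {}
for k in candidate_k:
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          random_state=0).fit(X)
    bic[k] = gmm.bic(X)
best_gmm_k = min(bic, key=bic.get)

print("k-means silhouette by k:", kmeans_scores)
print("selected k:", best_k)
print("silhouette by linkage:", linkage_scores)
print("GMM components selected by BIC:", best_gmm_k)
```

Comparing the methods then amounts to reading the silhouette scores side by side and checking whether the BIC-selected number of mixture components agrees with the k chosen for k-means. Note that scikit-learn's `bic` is defined so that lower is better, which is the opposite sign convention from some other packages.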