Clustering samples correctly requires an accurate measure of similarity. We will explore several ways to compute similarity (or distance), and which one fits depends on your data type. Then, we will use hierarchical clustering to group data points. At the end, you will make a clustering dendrogram.
Please have a way to run a Jupyter notebook.
One option is to use Google Colab. If you have access, then you are done with the pre-work!
Another option is to run the Jupyter notebook locally. Please install Anaconda or conda. The Python packages used in this demonstration are: numpy, scipy, matplotlib, and jupyter.
You can also create a new environment named "dataanalysis" with the necessary Python packages. In your terminal:
conda create -n dataanalysis python=3.10 numpy scipy matplotlib jupyter
The terminal will show the environment being created, including the download of these Python packages. For more detailed information, see the conda documentation on managing environments.
To enter the environment:
conda activate dataanalysis
To exit the environment:
conda deactivate
If you are using Google Colab, make a copy of the linked Colab notebook so that you can edit it!
If you are running locally, please enter the environment and launch Jupyter Notebook. In the terminal:
conda activate dataanalysis
jupyter notebook
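Once the notebook is running, a quick check like the one below can confirm that the packages are visible from inside the environment. This is only a minimal sketch: the helper name `missing_packages` is made up for illustration, and it checks only the three libraries that the demonstration imports directly.

```python
import importlib.util

# Libraries named in the setup instructions that the notebook imports.
REQUIRED = ["numpy", "scipy", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment.

    Hypothetical helper for illustration; it only looks the packages up
    and does not import them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported as missing, the most likely cause is that the notebook was launched without first activating the "dataanalysis" environment.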
SciPy distance functions reference: https://docs.scipy.org/doc/scipy/reference/spatial.distance.html
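As a preview of the workflow described at the top, here is a rough sketch of how the distance functions at the link above might feed into hierarchical clustering. The toy points, the Euclidean metric, and average linkage are assumptions chosen for illustration, not necessarily what the notebook uses.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Toy data (made up for illustration): two well-separated groups of 2-D points.
points = np.array([
    [0.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
    [10.0, 10.0],
    [10.0, 11.0],
    [11.0, 10.0],
])

# Pairwise distances in condensed form (one entry per pair of points).
condensed = pdist(points, metric="euclidean")

# The same distances as a square, symmetric matrix, which is easier to read.
square = squareform(condensed)

# Agglomerative (hierarchical) clustering on the condensed distances.
Z = linkage(condensed, method="average")

# Cut the tree into at most two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")

# Compute the dendrogram layout without drawing it.
tree = dendrogram(Z, no_plot=True)

print("cluster labels:", labels)
print("leaf order:", tree["ivl"])
```

In a notebook, dropping `no_plot=True` lets `dendrogram` draw the tree with matplotlib. Changing the `metric` argument of `pdist` is one way to try the other distance measures listed in the SciPy reference.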