Download Anaconda here.
(Note: there are several Python distributions; for statistics and machine learning, we recommend Anaconda.)
As with R, there are many open-source Python packages for statistics and machine learning.
To download packages, two popular package managers are pip and conda. Both pip and conda come with the Anaconda distribution.
We recommend using virtual environments with Python. From this blog:
A Python virtual environment consists of two essential components: the Python interpreter that the virtual environment runs on and a folder containing third-party libraries installed in the virtual environment. These virtual environments are isolated from the other virtual environments, which means any changes on dependencies installed in a virtual environment don’t affect the dependencies of the other virtual environments or the system-wide libraries. Thus, we can create multiple virtual environments with different Python versions, plus different libraries or the same libraries in different versions.
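To see the two components the quoted passage describes for whichever environment is currently active, a short Python sketch can print the interpreter path and the folder where third-party libraries are installed. It uses only the standard library (sys and sysconfig). The helper name describe_environment is hypothetical, chosen for illustration.

```python
import sys
import sysconfig


def describe_environment() -> dict:
    """Return the key components of the active Python environment.

    describe_environment is a hypothetical helper for illustration.
    """
    return {
        # Path to the Python interpreter this environment runs on
        "interpreter": sys.executable,
        # Folder where third-party (pure-Python) libraries are installed
        "site_packages": sysconfig.get_paths()["purelib"],
        # Root folder of the environment; each environment has its own
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    for name, path in describe_environment().items():
        print(f"{name}: {path}")
```

Running this once inside an activated conda environment and once outside it should print different paths, which reflects the isolation described in the quote.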
We recommend creating a virtual environment for your MSDS-534 coding projects.
- Open Terminal
- Create an environment called msds534 using conda with the command:
conda create --name msds534
- To install packages in your environment, first activate your environment:
conda activate msds534
- Then, install the following packages using the command:
conda install numpy pandas matplotlib seaborn scikit-learn
- Install PyTorch by running the appropriate command for your platform from here. For example, on macOS the command is:
conda install pytorch::pytorch torchvision torchaudio -c pytorch
- To exit your environment:
conda deactivate
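After installing the packages, a quick check from Python (run while the environment is active) can confirm what is available. The sketch below reads the CONDA_DEFAULT_ENV variable, which conda activate sets to the active environment's name. It also looks up installed versions with the standard library's importlib.metadata. It assumes the packages ship standard Python metadata, which conda- and pip-installed packages normally do. The helper name check_packages and the REQUIRED tuple are hypothetical, derived from the install commands above.

```python
import os
from importlib.metadata import PackageNotFoundError, version

# Distribution names for the packages installed above (assumed list).
# scikit-learn is imported as sklearn; PyTorch is distributed as torch.
REQUIRED = (
    "numpy", "pandas", "matplotlib", "seaborn", "scikit-learn",
    "torch", "torchvision", "torchaudio",
)


def check_packages(names=REQUIRED) -> dict:
    """Map each distribution name to its installed version, or None if missing.

    check_packages is a hypothetical helper for illustration.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment
            found[name] = None
    return found


if __name__ == "__main__":
    # conda activate sets CONDA_DEFAULT_ENV to the active environment's name
    print("Active conda environment:", os.environ.get("CONDA_DEFAULT_ENV"))
    for name, ver in check_packages().items():
        print(f"{name:>12}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be added with the conda install commands above while msds534 is active.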
Here is a helpful cheatsheet for conda environment commands.