Setup

Operating System

We recommend using our toolbox with Ubuntu 20.04 LTS. Most core InnerEye functionality will be stable on other operating systems, but PyTorch's full feature set is only available on Linux. All jobs in AzureML, both training and inference, run from an Ubuntu 20.04 Docker image. This means that using Ubuntu 20.04 locally allows for maximum reproducibility between your local and AzureML environments.
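
If you are already running Linux, you can confirm your release before proceeding. For example (lsb_release ships with Ubuntu; /etc/os-release is present on most modern distributions):

# Either of these should report Ubuntu 20.04 on a matching system
lsb_release -a
cat /etc/os-release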

For Windows users, Ubuntu can be set up with Windows Subsystem for Linux (WSL). Please refer to the InnerEye WSL docs for more detailed instructions on getting WSL set up.

MacOS users can access an Ubuntu OS through VirtualBox.

Clone Repository

  1. Ensure you have Git CLI installed.

  2. Set up Git Large File Storage (LFS). Note that this assumes the git-lfs package is already installed on your system:

    git lfs install
  3. Clone the repository:

    git clone --recursive https://github.com/microsoft/InnerEye-DeepLearning
    cd InnerEye-DeepLearning
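
After cloning, you can optionally sanity-check that the recursive clone and LFS setup worked. Both commands below are standard Git / Git LFS subcommands; an empty list from git lfs ls-files simply means no LFS-tracked files are present in your checkout:

# List any submodules pulled in by --recursive
git submodule status
# List files tracked by Git LFS in this checkout
git lfs ls-files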

To view and edit the InnerEye code, we recommend using the VSCode IDE.

Set Up Conda

Conda is an open-source package management system. InnerEye uses it to manage all Python packages. Follow the instructions in this section to set it up on your machine.

Prerequisite - Install build tools

In order to create the Conda environment you will need to have the appropriate build tools installed on your machine. To do this, run the commands relevant to your operating system from the subsections below.

Windows / MacOS Users

If you are running Windows or MacOS, build tools will automatically be installed with your Conda distribution and you can safely skip this step.

Ubuntu / Debian

sudo apt-get install build-essential

CentOS / RHEL

yum install gcc gcc-c++ kernel-devel make

Install Conda

Check if you already have Conda installed by running conda --version in your shell. If you see an error such as "command not found", you will need to install Conda for your operating system using one of the following options:

  • Install Miniconda - this is the simplest and most lightweight option, and is sufficient for most use-cases.
  • Install Conda - more extensive package management features and system-wide installation.
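
If you choose Miniconda on Linux, installation typically amounts to downloading and running the official installer script. A minimal sketch (the URL below is the standard Miniconda installer location for 64-bit Linux; pick the installer matching your OS and architecture):

# Download and run the Miniconda installer, then re-open your shell
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Confirm the installation
conda --version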

Create a Conda Environment

There are two important files in this repo for creating Conda environments:

  • primary_deps.yml - This file contains the list of primary package dependencies, and can be used to create an environment on any OS.
  • environment.yml - DO NOT EDIT THIS FILE MANUALLY. This is a lockfile: it contains the fully pinned list of primary and secondary dependencies used to create the environments for AzureML jobs and for local Ubuntu machines. Because it contains Ubuntu-specific platform dependencies, it cannot be used to create environments on other operating systems.
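
For orientation, a primary dependency file is a standard Conda environment specification. The sketch below is hypothetical (the package names and versions are illustrative, not the repository's actual pins) and only shows the general shape of such a file:

# Hypothetical sketch of a primary dependency file - not the real contents
name: InnerEye
channels:
  - defaults
  - pytorch
dependencies:
  - python=3.7
  - pytorch=1.8
  - pip
  - pip:
      - azureml-sdk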

Create environment from lockfile (Ubuntu / WSL only)

To create an environment from the lockfile, run the following command:

conda env create --file environment.yml
conda activate InnerEye
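
Once activated, you can quickly confirm the environment resolved as expected. A minimal check (this assumes the environment is named InnerEye, as declared in the lockfile; torch is one of the repository's core dependencies):

# The InnerEye environment should appear in the list, and torch should import cleanly
conda env list
python -c 'import torch; print(torch.__version__)'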

Create non-locked environment (Windows / MacOS / all other operating systems)

For all other operating systems, no locked environment is provided. Instead, a new Conda environment can be created from the primary dependencies using the following commands:

conda env create --file primary_deps.yml
conda activate InnerEye

Reproducibility between local and AzureML runs is NOT guaranteed for environments created using this method. For maximum reproducibility please consider using Ubuntu 20.04 as per our operating system instructions.

Upgrade / Add Python packages in environment

If you wish to alter the packages in your local Conda environment, this can be done by editing the primary_deps.yml file with your desired changes and then following the instructions relevant to your OS given in the subsections below.

If you want to change versions of packages used in the AzureML environment, this can only be done from an Ubuntu machine, and is facilitated through the provided script create_and_lock_environment.sh; instructions are given in the Ubuntu subsection below.

Ubuntu 20.04 Users

  1. Make your desired changes in primary_deps.yml. Make sure your package names and versions are correct.

  2. To create a new environment and a valid environment.yml, run the following command:

    bash -i create_and_lock_environment.sh

This script will create/update your local Conda environment with your desired primary package versions, as well as a new environment.yml which can be ingested by AzureML to create a copy of your local environment.
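
For intuition on what the locking step does: conceptually, it resolves the primary dependencies into a concrete environment and then exports the fully pinned result. The sketch below illustrates the idea using only standard Conda commands; it is not the actual script, which performs additional platform-specific processing:

# Resolve primary dependencies into a concrete environment...
conda env update --file primary_deps.yml
# ...then export the fully pinned result as a lockfile
conda env export --no-builds > environment.yml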

All other operating systems

  1. Make your desired changes in primary_deps.yml. Make sure your package names and versions are correct.

  2. If you have already created the environment previously, run:

    conda env update --file primary_deps.yml --prune
  3. Otherwise, run:

    conda env create --file primary_deps.yml

Using GPU locally

It is possible to run the training process on a local machine. This will not be as performant as the GPU clusters that Azure ML offers, and you will not be able to take advantage of other Azure ML features such as comparing run results, creating snapshots for repeatable machine learning experiments, or keeping a history of experiment runs. It can nevertheless be useful for experimenting with code or troubleshooting locally.

The SDK uses PyTorch to compose and run DNN computations. PyTorch can leverage the underlying GPU via NVIDIA CUDA technology, which accelerates computations dramatically.

In order to enable PyTorch to use CUDA, you need to make sure that you have:

  1. A compatible graphics card with CUDA compute capability of at least 3.0 (at the time of writing). You can check the compatibility list on the NVIDIA Developer site.
  2. Recent NVIDIA drivers installed.

A quick way to check whether PyTorch can use the underlying GPU for computation is to run the following line from your Conda environment with all InnerEye packages installed:

python -c 'import torch; print(torch.cuda.is_available())'

It will output True if CUDA computation is available and False if it is not.
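
For a little more troubleshooting detail, you can also print the detected device name and the CUDA version your PyTorch build targets; torch.cuda.get_device_name and torch.version.cuda are standard PyTorch APIs:

# Prints the name of GPU 0 if CUDA is available
python -c 'import torch; print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "no CUDA device")'
# Prints the CUDA version this PyTorch build targets (None for CPU-only builds)
python -c 'import torch; print(torch.version.cuda)'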

Some tips for installing NVIDIA drivers are given below:

Windows

You can download NVIDIA drivers for your graphics card from the NVIDIA website as a Windows .exe file and install them that way.

WSL

Microsoft provides GPU support via WSL starting with WSL 2.

You can find more details on WSL in our separate WSL section.

Linux

The exact instructions for driver installation differ between Linux distributions. Generally, you should first run the nvidia-smi tool to check whether you have NVIDIA drivers installed. This tool is installed together with the NVIDIA drivers, so if your system cannot find it, the drivers are likely not installed. A sample output of the nvidia-smi tool may look like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000027F:00:00.0 Off |                    0 |
| N/A   50C    P0    60W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

In this case we can see that the system has access to a Tesla K80 GPU and is running driver version 450.51.06.

If the driver is not available, you can try the following to install it:

Ubuntu

  1. Run ubuntu-drivers devices to see what drivers are available (you may need to install the tool via sudo apt-get install ubuntu-drivers-common and update the package database via sudo apt update). You should see an output like this:

    ...
    vendor   : NVIDIA Corporation
    model    : GK210GL [Tesla K80]
    driver   : nvidia-driver-450-server - distro non-free recommended
    driver   : nvidia-driver-418-server - distro non-free
    driver   : nvidia-driver-440-server - distro non-free
    driver   : nvidia-driver-435 - distro non-free
    driver   : nvidia-driver-450 - distro non-free
    driver   : nvidia-driver-390 - distro non-free
    driver   : xserver-xorg-video-nouveau - distro free builtin
    
  2. Run sudo apt install nvidia-driver-450-server (or whichever driver is recommended in your case)

  3. Reboot your system

At this point you should be able to run the nvidia-smi tool, and PyTorch should be able to communicate with the GPU.

CentOS/RHEL

  1. Add the NVIDIA repository to your config manager: sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo (this is for RHEL 8; otherwise you can get the URL for your repo on the NVIDIA dev site)
  2. Clean the repository cache via sudo dnf clean all
  3. Install the drivers: sudo dnf -y module install nvidia-driver:latest-dkms
  4. Reboot your system

At this point you should be able to run the nvidia-smi tool, and PyTorch should be able to communicate with the GPU.

You can find instructions for other Linux distributions on the NVIDIA website.