This repo contains scripts to configure Ubuntu 22.04 for running and developing AI applications.
The setup installs Docker, `nvidia-driver`, and the NVIDIA Container Toolkit to run applications in isolated containers. Dependency management is simplified by having each application specify the versions of Python, PyTorch, CUDA, or other system packages it needs.
Run `./setup.sh` on the Ubuntu machine you want configured.
You will need to reboot the machine after setup (because of `nvidia-driver`). If you already had `nvidia-driver` installed when you ran the script, you only need to log out and log back in (so that your user is added to the `docker` group).
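To check whether a re-login has taken effect, you can list your user's groups; `docker` should appear among them:

```shell
# List the groups the current user belongs to.
# After setup.sh and a fresh login, "docker" should be in this list.
id -nG
```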
Alternatively, an Ansible playbook is available; see `ansible/README.md`.
After setup, verify that the target node has been set up correctly by running the `nvidia-ctk-test`.
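As a quick manual check (assuming Docker and the toolkit are installed; the CUDA image tag here is an assumption, any CUDA base image should work), you can confirm that containers see the GPU:

```shell
# Run nvidia-smi inside a throwaway container; it should list the host's GPU(s).
# The image tag is an assumption -- substitute one matching your CUDA version.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```

If this prints the same GPU table as running `nvidia-smi` on the host, containers have GPU access.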
AI applications often have dependencies that conflict with one another.
- For Python dependency isolation, the usual solution is virtual environments (`venv`s).
- For system dependency isolation, the usual solution is containerization (e.g. Docker).
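For the first point, a minimal sketch of per-application Python isolation (the `.venv` directory name is just a convention):

```shell
# Create an isolated Python environment for one application...
python3 -m venv .venv

# ...and activate it, so python/pip resolve inside .venv instead of system-wide.
. .venv/bin/activate
python -m pip --version
```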
This setup uses the NVIDIA Container Toolkit to ensure that our Docker containers can access the GPU. This simplified diagram shows how this works:
| Package | Purpose |
|---|---|
| `cuda-toolkit` | Needed for developing CUDA applications |
| `cuda-runtime` | Needed for running CUDA applications |
| `nvidia-driver` | Needed to allow programs to use the GPU |
| `nvidia-container-toolkit` | Exposes the `nvidia-driver` to containers (like `docker`, `containerd`, etc.) |
In practice, the `apt-cache` dependency graph is more complicated than the diagram above. To inspect the dependency graph, run this command inside a container where CUDA is installed:

```shell
apt-cache depends cuda
```
Users are encouraged to create new Docker images as needed. The OSAI Apps repo has several examples and contains `Dockerfile`s that may be good starting points.
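A new application image might start from an official CUDA base image, as in this minimal sketch (the base image tag and the `main.py` entrypoint are assumptions, not taken from the OSAI Apps repo):

```dockerfile
# Hypothetical starting point: an official NVIDIA CUDA runtime image.
# The tag is an assumption -- pick one matching your application's CUDA version.
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04

# Install the system-level Python this application needs.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-venv && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . /app

# Assumed entrypoint; replace with your application's actual command.
CMD ["python3", "main.py"]
```

Because each application pins its own base image and packages this way, conflicting CUDA or Python versions never collide on the host.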