
Assistance Needed: Getting NVIDIA GPU and Docker Runtime Working on Flatcar #1524

Closed
Keithsc opened this issue Aug 29, 2024 · 2 comments
Labels
kind/feature A feature request

Keithsc commented Aug 29, 2024

Hello Flatcar Team,

I've been using Flatcar Linux for a while now and have recently acquired an NVIDIA Tesla M40 GPU. Having successfully utilized Intel GPUs with Docker on Flatcar, I'm now venturing into the world of NVIDIA for the first time.

I'm running a standalone Flatcar Linux instance without any orchestration. While I managed to install the NVIDIA driver as per the documentation, I am struggling to understand how to set up the NVIDIA Container Toolkit and configure Docker to utilize the GPU for applications.

As a newcomer to NVIDIA GPUs, I'd really appreciate guidance on using the GPU in Docker, particularly for setting up and running an application like open-webui. I believe a comprehensive tutorial or example guide would benefit not just me but also others who are new to Flatcar and NVIDIA.
Current Setup:

Flatcar version: 3975.2.0 (stable channel)
Architecture: amd64
Docker version: 24.0.9, build 293681613
GPU model: NVIDIA Tesla M40
NVIDIA driver version: 535.104.05 (CUDA version: 12.2)
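The setup details listed above can be gathered with a short script. This is a minimal sketch assuming bash; every probe falls back to "unavailable" so it also runs on hosts without Docker or a GPU. The `report`, `os_version`, and `collect_setup` names are hypothetical helpers, not Flatcar tools.

```shell
#!/usr/bin/env bash
# Sketch: gather the "Current Setup" details for a bug report.
# Helper names are hypothetical; every probe falls back to "unavailable".
set -euo pipefail

# Print one aligned "label: value" line.
report() {
  printf '%-22s: %s\n' "$1" "$2"
}

# Read PRETTY_NAME from /etc/os-release in a subshell so nothing leaks out.
os_version() {
  ( . /etc/os-release && echo "${PRETTY_NAME:-unknown}" ) 2>/dev/null || echo "unavailable"
}

collect_setup() {
  report "OS" "$(os_version)"
  report "Architecture" "$(uname -m)"
  report "Docker server version" \
    "$(docker version --format '{{.Server.Version}}' 2>/dev/null || echo unavailable)"
  # name and driver_version are standard nvidia-smi query fields.
  report "GPU / driver" \
    "$(nvidia-smi --query-gpu=name,driver_version --format=csv,noheader 2>/dev/null || echo unavailable)"
}

collect_setup
```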

What I Have Tried:
Followed existing documentation to install the NVIDIA driver.
Attempted to set up the NVIDIA Container Toolkit but faced challenges with runtime configurations.
Tried running a basic NVIDIA CUDA container.

Example Command and Error Encountered:

When I ran the following command:
docker run --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi

I received the following error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0001] error waiting for container: context canceled
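As far as I understand it (this is not stated in the thread), Docker 24.x only registers the GPU device driver behind `--gpus` when the `nvidia-container-runtime-hook` binary from the NVIDIA Container Toolkit is on the daemon's PATH; the empty driver name ("") in the error above is consistent with that binary being absent. A hedged sketch of a quick check follows. `check_cmd` and `gpu_preflight` are hypothetical helper names, and checking your own shell's PATH only approximates what the daemon sees.

```shell
#!/usr/bin/env bash
# Sketch: look for the binaries that `docker run --gpus` appears to rely on.
# Assumption: the daemon's PATH is close enough to this shell's PATH.
set -euo pipefail

# Report whether a command is on PATH; return 0 if found, 1 otherwise.
check_cmd() {
  local path
  if path="$(command -v "$1" 2>/dev/null)"; then
    echo "found:   $1 -> $path"
    return 0
  fi
  echo "missing: $1"
  return 1
}

# Return non-zero if any prerequisite is missing.
gpu_preflight() {
  local status=0
  check_cmd nvidia-smi || status=1                     # driver user-space tools
  check_cmd nvidia-container-runtime-hook || status=1  # NVIDIA Container Toolkit hook
  return "$status"
}

if gpu_preflight; then
  echo "Prerequisites found on this shell's PATH."
else
  echo "Missing pieces: 'docker run --gpus all ...' is expected to fail."
fi
```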

Issues Encountered:
Error messages related to GPU runtime selection when attempting to deploy containers.
Confusion over configuration settings for Docker on Flatcar relating to the NVIDIA runtime.

Request for Help:
Clear guidance or documentation on how to configure Docker to work with NVIDIA GPUs on Flatcar.
A step-by-step example of setting up a containerized application (like open-webui) that can utilize the NVIDIA GPU.
Any additional resources or links that could assist newcomers in setting up NVIDIA GPUs on Flatcar would be greatly appreciated.

Thank you for your help and support!
Keith.

jepio (Member) commented Sep 3, 2024

Download https://github.com/flatcar/sysext-bakery/releases/download/latest/nvidia_runtime-v1.16.1-x86-64.raw
and save it as /etc/extensions/nvidia_runtime.raw on your Flatcar node, then reboot the node. This installs and enables nvidia-container-runtime for both Docker and containerd.
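That step can be scripted roughly as follows. This is a minimal sketch assuming `curl` and `sudo` are available on the node (both ship with Flatcar, as far as I know) and that the v1.16.1 file name from the link above is still published; newer bakery releases may use a different name. `install_nvidia_runtime_sysext` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: install the nvidia_runtime sysext described in the comment.
set -euo pipefail

# URL and destination taken from the comment; the file name may change upstream.
SYSEXT_URL="https://github.com/flatcar/sysext-bakery/releases/download/latest/nvidia_runtime-v1.16.1-x86-64.raw"
SYSEXT_PATH="/etc/extensions/nvidia_runtime.raw"

install_nvidia_runtime_sysext() {
  sudo mkdir -p "$(dirname "$SYSEXT_PATH")"
  # -f: fail on HTTP errors instead of saving an error page
  # -L: follow GitHub's redirect to the release asset
  sudo curl -fL -o "$SYSEXT_PATH" "$SYSEXT_URL"
  echo "Saved $SYSEXT_PATH; reboot so the extension is merged at boot:"
  echo "  sudo systemctl reboot"
}

# Uncomment on the Flatcar node itself:
# install_nvidia_runtime_sysext
```

After the reboot, `systemd-sysext status` should list `nvidia_runtime` among the merged extensions if the merge succeeded.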

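If you are running Kubernetes, you can then install the NVIDIA GPU Operator with its driver and toolkit components disabled, since the host already provides both: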

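helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
# driver.enabled=false: the NVIDIA driver is already installed on the Flatcar host
# toolkit.enabled=false: the nvidia_runtime sysext already provides the container toolkit
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set driver.enabled=false \
    --set toolkit.enabled=false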

It should pass validation.
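To confirm that validation actually passed, something like the following can be run against the cluster. It is a sketch assuming `kubectl` is configured for that cluster and that the chart went into the `gpu-operator` namespace as in the helm command; `check_gpu_operator` is a hypothetical helper, and validator pod names vary between operator versions.

```shell
#!/usr/bin/env bash
# Sketch: inspect a GPU Operator install done with the helm command in this thread.
set -euo pipefail

NAMESPACE="gpu-operator"   # matches -n gpu-operator in the helm command

check_gpu_operator() {
  # Validator pods should show Running or Completed once their checks succeed.
  kubectl get pods -n "$NAMESPACE"
  # GPUs are advertised to the scheduler as the nvidia.com/gpu resource.
  kubectl describe nodes | grep -i 'nvidia.com/gpu' \
    || echo "no nvidia.com/gpu resource advertised yet"
}

# Run where kubectl can reach the cluster:
# check_gpu_operator
```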

This will be added to Flatcar docs very soon.

Keithsc (Author) commented Sep 4, 2024

I've followed the instructions you provided, and it looks like everything is working as expected now. The sysext installation was straightforward, and after restarting my node, I'm able to run containers with GPU access using "docker run --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi". Thanks for all your help with this.
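For the open-webui part of the original request, which the thread does not cover, here is a hedged sketch. The `ghcr.io/open-webui/open-webui:cuda` image tag, the 3000:8080 port mapping, and the `/app/backend/data` volume path come from my reading of the open-webui README, not from this thread, so verify them upstream; `run_open_webui` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: run open-webui with GPU access once `--gpus all` works.
# Image tag, ports, and data path are assumptions taken from open-webui's docs.
set -euo pipefail

IMAGE="ghcr.io/open-webui/open-webui:cuda"

run_open_webui() {
  # Same smoke test that succeeded in this thread; set -e aborts if it fails.
  docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi
  docker run -d \
    --name open-webui \
    --gpus all \
    -p 3000:8080 \
    -v open-webui:/app/backend/data \
    --restart always \
    "$IMAGE"
  echo "open-webui should come up on http://localhost:3000"
}

# On the Flatcar node:
# run_open_webui
```

Whether the libraries bundled in that image still support the Tesla M40 (a Maxwell-generation card) is not something this thread establishes.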
