Nvidia runtime default #153

Merged on Oct 2, 2023 (6 commits)

jocado commented Sep 6, 2023

Fix bug in runtime config

Should be nvidia-container-runtime binary, not nvidia-ctk

NVIDIA GPU support works without the full nvidia-container-runtime,
but in some cases switching to nvidia-container-runtime entirely
is beneficial [ it allows scheduling multiple simultaneous GPU containers ].

Example usage:

docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all {image-name} {command}
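
For context, a runtime selected with --runtime=nvidia has to be registered with the Docker daemon under that name. Below is a minimal sketch of such a registration in daemon.json, assuming the daemon's standard `runtimes` setting is used. The path is a placeholder, not the snap's actual location, and the files this PR touches may set it up differently:

```json
{
  "runtimes": {
    "nvidia": {
      "path": "/path/to/nvidia-container-runtime"
    }
  }
}
```

The entry needs to point at the nvidia-container-runtime binary. Pointing it at nvidia-ctk is the bug described above.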


jocado commented Sep 6, 2023

Hi @lucaskanashiro

Would it be possible to merge this change [ which essentially fixes a bug ] so it could be bumped to stable before the changes in #152?

Thanks!

jocado commented Sep 22, 2023

Hi @lucaskanashiro - just wondering if you saw this request and had any feedback?

Thank you.

Review threads: snap/snapcraft.yaml (outdated, resolved); snap/hooks/post-refresh (resolved)

lucaskanashiro commented

OK. The changes now look good to me. Let's wait for the CI results to approve this.

jocado commented Sep 29, 2023

Great - thanks for the review 👍

Do you have any rough idea about timescales for promotion through channels?

lucaskanashiro commented

It will depend on the internal testing, which might take some weeks (based on the previous revision). BTW, the last PR I merged from you is already in the candidate channel, feel free to test it out.

jocado commented Sep 29, 2023

Sure - I already tested it - thank you 😄

I will keep an eye on the revisions over the next few weeks.

lucaskanashiro merged commit f2901f8 into canonical:main on Oct 2, 2023
1 check passed

lucaskanashiro commented

I just merged it but now I noticed we should have squashed some commits to keep the history clean. Let's try to do it next time.

jocado commented Oct 2, 2023

I will try and remember that for next time - thanks for your review and help 👍

YamiYukiSenpai commented Oct 29, 2023

$ sudo docker run -d --name jellyfin --net=host --volume /home/.jellyfin/docker/config:/config --volume /home/.jellyfin/docker/cache:/cache --mount type=bind,source=/media,destination=/media,ro=false --user 1001:1001 --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-modeset:/dev/nvidia-modeset --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools --runtime=nvidia --gpus all jellyfin/jellyfin
docker: Error response from daemon: unknown or invalid runtime name: nvidia.

Still no dice for me

channels:
  latest/stable:    20.10.24 2023-05-25 (2893) 135MB -
  latest/candidate: 20.10.24 2023-09-29 (2904) 135MB -
  latest/beta:      20.10.24 2023-10-02 (2910) 135MB -
  latest/edge:      24.0.5   2023-10-07 (2915) 136MB -
  core18/stable:    20.10.17 2023-03-13 (2746) 146MB -
  core18/candidate: ↑                                
  core18/beta:      ↑                                
  core18/edge:      ↑                                
installed:          24.0.5              (2915) 136MB -

Also tried removing --runtime=nvidia, and got this:

$ sudo docker run -d --name jellyfin --net=host --volume /home/.jellyfin/docker/config:/config --volume /home/.jellyfin/docker/cache:/cache --mount type=bind,source=/media,destination=/media,ro=false --user 1001:1001 --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-modeset:/dev/nvidia-modeset --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools --gpus all jellyfin/jellyfin
a1931bf82e62bc391ca595b42227c314aeba9d04e713a7b87f0558d70733e208
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

jocado commented Oct 30, 2023

Hi @YamiYukiSenpai

Couple of things:
