[Error] --nvidia leaves incomplete vulkan installation #848

Closed
VortexAcherontic opened this issue Jul 9, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@VortexAcherontic

Describe the bug
Hello there! Not sure if I missed it, but I couldn't find an issue regarding this:
I noticed that a container created with the --nvidia flag does not seem to mount the /etc/vulkan/icd.d/nvidia_icd.json file into the container, which prevents Vulkan applications from working in a distrobox container.

Manually copying nvidia_icd.json from my host system into the container resolved the issue.

However, running vulkaninfo with my modifications still threw some errors, which makes me believe that more Vulkan-related driver files are not being properly mounted into the container.
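To see which ICD manifests a container actually exposes, it can help to list the JSON files in the directories the Vulkan loader typically searches on Linux, such as /etc/vulkan/icd.d and /usr/share/vulkan/icd.d. The loader also honors XDG paths and the VK_ICD_FILENAMES override, which this sketch ignores. The helper name `list_vulkan_icds` and its optional root-prefix argument are hypothetical, not part of distrobox:

```shell
#!/bin/sh
# Hypothetical helper: print every Vulkan ICD manifest found under the
# common system search directories, relative to an optional root prefix
# (e.g. "/run/host" to inspect the host from inside a container).
# Only these fixed paths are checked; XDG_* dirs and VK_ICD_FILENAMES are not.
list_vulkan_icds() {
	root="${1:-}"
	for dir in \
		"${root}/etc/vulkan/icd.d" \
		"${root}/usr/local/share/vulkan/icd.d" \
		"${root}/usr/share/vulkan/icd.d"; do
		[ -d "${dir}" ] || continue
		for manifest in "${dir}"/*.json; do
			# An unmatched glob stays literal, so check the file exists.
			[ -e "${manifest}" ] && printf '%s\n' "${manifest}"
		done
	done
	return 0
}
```

Running `list_vulkan_icds` inside the container and `list_vulkan_icds /run/host` (assuming the host filesystem is visible under /run/host, which distrobox-init relies on) makes it easy to spot a manifest such as nvidia_icd.json that exists only on the host.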

To Reproduce

  • create a new container with the --nvidia flag
  • try to run any Vulkan application, such as vkcube or vulkaninfo

Expected behavior

  • Vulkan applications work

Logs
vkcube

vortexacherontic@tumbleweed:~> vkcube
vkEnumerateInstanceExtensionProperties failed to find the VK_KHR_surface extension.

Do you have a compatible Vulkan installable client driver (ICD) installed?
Please look at the Getting Started guide for additional information.

vulkaninfo:

vortexacherontic@tumbleweed:/etc/vulkan/icd.d> vulkaninfo 
ERROR at /home/abuild/rpmbuild/BUILD/Vulkan-Tools-sdk-1.3.250.0/vulkaninfo/vulkaninfo.h:1480:vkGetPhysicalDeviceSurfaceSupportKHR failed with ERROR_INITIALIZATION_FAILED

Desktop (please complete the following information):

  • podman: 4.5.1
  • distrobox: 1.5.0 (Maybe it is fixed in a later release? In that case, I need to contact my distribution maintainers.)
@VortexAcherontic VortexAcherontic added the bug Something isn't working label Jul 9, 2023
@VortexAcherontic
Author

By digging a bit deeper, I managed to get all Vulkan-related stuff to work properly inside the distrobox.

It seems that, when the container is created, the following files are not mounted into it:

/usr/lib64/libnvidia-vulkan-producer.so
/etc/vulkan/icd.d/nvidia_icd.json
/etc/vulkan/implicit_layer.d/nvidia_layers.json

After manually obtaining those files from my host system and copying them into the container, both vkcube and vulkaninfo worked properly.

For /etc/vulkan, I added a volume to my distrobox.ini file:
volume=/etc/vulkan/:/etc/vulkan/
The *.so file I simply copied into the container manually.
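If the files are copied by hand, it is worth keeping symlinks as symlinks, since the unversioned libnvidia-vulkan-producer.so name is typically just a link to the versioned library. The following is a minimal sketch of that manual workaround using the three paths from the list above; the function name and its host/guest root-prefix arguments are hypothetical:

```shell
#!/bin/sh
# Hypothetical sketch of the manual workaround: copy the NVIDIA Vulkan files
# that were missing from the container, from a host root (e.g. /run/host)
# into a guest root (e.g. "" for the container's own /), keeping symlinks.
copy_nvidia_vulkan_files() {
	host_root="$1"
	guest_root="$2"
	for src in \
		"${host_root}"/usr/lib64/libnvidia-vulkan-producer.so* \
		"${host_root}/etc/vulkan/icd.d/nvidia_icd.json" \
		"${host_root}/etc/vulkan/implicit_layer.d/nvidia_layers.json"; do
		# Skip unmatched globs and files this host does not have;
		# the -L test also accepts a symlink whose target is missing.
		[ -e "${src}" ] || [ -L "${src}" ] || continue
		dest="${guest_root}${src#"${host_root}"}"
		mkdir -p "$(dirname "${dest}")"
		# -P copies a symlink as a symlink instead of the file it points to.
		cp -P "${src}" "${dest}" || return 1
	done
	return 0
}
```

Called as `copy_nvidia_vulkan_files /run/host ""` with sufficient privileges inside the container, it would place the files under the container's own /usr/lib64 and /etc/vulkan. Unlike a bind mount, copied files will not follow later driver updates on the host.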

@VortexAcherontic VortexAcherontic changed the title [Error] --nvidia seems not mount all driver components required for vulkan into the container [Error] --nvidia incomplete vulkan installation Jul 10, 2023
@VortexAcherontic VortexAcherontic changed the title [Error] --nvidia incomplete vulkan installation [Error] --nvidia leaves incomplete vulkan installation Jul 10, 2023
@89luca89
Owner

89luca89 commented Aug 4, 2023

This seems strange:
in the integration step, the first thing it does is search the whole /run/host/usr/lib*, so /usr/lib64/libnvidia-vulkan-producer.so should be found.

About /etc/, I've just added it.
From my tests (on mock files, that is), all files in /usr/lib* are correctly detected, so for now I'll close this.

If you find that files are still not picked up, let's reopen this, and someone with an NVIDIA machine will need to help me :)

@VortexAcherontic
Author

VortexAcherontic commented Aug 4, 2023

Does the script also copy symlinks?

Working host libraries:

/usr/lib64> ll | grep -i vulkan
lrwxrwxrwx. 1 root root        28  2. Aug 19:15 libgstvulkan-1.0.so.0 -> libgstvulkan-1.0.so.0.2205.0
-rwxr-xr-x. 1 root root    271352  2. Aug 19:15 libgstvulkan-1.0.so.0.2205.0
lrwxrwxrwx. 1 root root        38 19. Jul 15:24 libnvidia-vulkan-producer.so -> libnvidia-vulkan-producer.so.535.86.05
lrwxrwxrwx. 1 root root        38 19. Jul 15:24 libnvidia-vulkan-producer.so.535 -> libnvidia-vulkan-producer.so.535.86.05
-rwxr-xr-x. 1 root root     22856 19. Jul 15:24 libnvidia-vulkan-producer.so.535.86.05
lrwxrwxrwx. 1 root root        20  3. Jun 01:02 libvulkan.so.1 -> libvulkan.so.1.3.250
-rwxr-xr-x. 1 root root    456800  3. Jun 01:02 libvulkan.so.1.3.250

Broken guest libraries:

/usr/lib64> ll | grep -i vulkan
lrwxrwxrwx. 1 root  root         38 Aug  4 22:58 libnvidia-vulkan-producer.so.535 -> libnvidia-vulkan-producer.so.535.86.05
-rwxr-xr-x. 1 65534 65534     22856 Jul 19 15:24 libnvidia-vulkan-producer.so.535.86.05
lrwxrwxrwx. 1 root  root         20 Jun  3 01:02 libvulkan.so.1 -> libvulkan.so.1.3.250
-rwxr-xr-x. 1 root  root     456800 Jun  3 01:02 libvulkan.so.1.3.250
-rwxr-xr-x. 1 root  root   14760856 Jun 24 21:01 libvulkan_intel.so
-rwxr-xr-x. 1 root  root   13716376 Jun 24 21:01 libvulkan_intel_hasvk.so
-rwxr-xr-x. 1 root  root    9954696 Jun 24 21:01 libvulkan_radeon.so

As you can see, you're right: all existing libraries get copied into the container.
But libnvidia-vulkan-producer.so is a symlink to the actual driver *.so.major.minor.patch file, e.g. libnvidia-vulkan-producer.so.535.86.05.

As you can see in the example above, the host has this symlink while the (fresh) guest does not.

This is true for the openSUSE Tumbleweed, Fedora and Arch Linux images (those are the ones I have currently installed).
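The listings fit a search that only matches regular files. With find, `-type f` selects regular files and `-type l` selects symbolic links (find does not follow links unless -L is given), so a filter like the one in the distrobox-init snippet quoted below would report libnvidia-vulkan-producer.so.535.86.05 but not the links pointing at it. This is a small, hypothetical demonstration on a mock directory, not distrobox code:

```shell
#!/bin/sh
# Hypothetical demonstration: "-type f" only matches regular files, so the
# unversioned symlinks next to an NVIDIA library are never printed.
find_regular_only() {
	find "$1" -type f -iname "*nvidia*" -print | sort
}

# Accepting symlinks as well ("-type f -o -type l") returns the links too.
find_files_and_links() {
	find "$1" \( -type f -o -type l \) -iname "*nvidia*" -print | sort
}
```

Against a mock directory holding libnvidia-vulkan-producer.so.535.86.05 plus the .so and .so.535 links, the first function prints only the versioned file and the second prints all three entries.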

Meanwhile, the directory containing the ICD file is missing entirely in the container systems:

/etc of host:

...
drwxr-xr-x. 1 root root       10 27. Jul 17:58 uefi
drwxr-xr-x. 1 root root       22  6. Jul 18:55 UPower
-rw-r--r--. 1 root root      115 22. Mär 13:08 vconsole.conf
drwxr-xr-x. 1 root root       42 19. Jul 15:24 vulkan <-------------
-rw-r--r--. 1 root root     5029 14. Jun 22:54 wgetrc
drwxr-xr-x. 1 root root       38  6. Jun 16:57 wpa_supplicant
drwxr-xr-x. 1 root root       72 22. Mär 12:48 X11
-rw-r--r--. 1 root root      681 14. Jun 17:12 xattr.conf
drwxr-xr-x. 1 root root      142 22. Mär 12:47 xdg
drwxr-xr-x. 1 root root       90 22. Mär 12:46 xml
drwxr-xr-x. 1 root root       72 22. Mär 12:48 YaST2
drwxr-xr-x. 1 root root      282 29. Jul 20:59 zypp

/etc of guest:

drwxr-x---.  1 root  root       14 Aug  4 22:58 sudoers.d
drwxr-xr-x.  1 root  root      114 Jul 16 21:55 sysconfig
drwxr-xr-x.  1 root  root        0 Feb 14 16:50 sysctl.d
drwxr-xr-x.  1 root  root      156 Jul 14 16:38 systemd
lrwxrwxrwx.  1 root  root       25 Jul 12 18:11 termcap -> ../usr/share/misc/termcap
drwxr-xr-x.  1 root  root        0 Jul 12 18:10 terminfo
drwxr-xr-x.  1 root  root        0 Feb 14 16:50 tmpfiles.d
//vulkan should be here
-rw-r--r--.  1 root  root     5029 Jun 14 22:54 wgetrc
-rw-r--r--.  1 root  root      681 Jun 14 17:12 xattr.conf
drwxr-xr-x.  1 root  root       32 Jul 14 16:38 xdg
drwxr-xr-x.  1 root  root      260 Jul 14 16:38 zypp

It is true that the distrobox-init script mentions ICD files:

	# First we find all non-lib files we need, this includes
	#	- binaries
	#	- confs
	#	- icd files
	#	- egl files
	NVIDIA_FILES="$(find /run/host/usr/ \
		-path "/run/host/usr/share/doc*" -prune -o \
		-path "/run/host/usr/src*" -prune -o \
		-path "/run/host/usr/lib*/modules*" -prune -o \
		-path "/run/host/usr/share/man*" -prune -o \
		-path "/run/host/usr/lib*" -prune -o \
		-type f -iname "*nvidia*" -print || :)"
	for nvidia_file in ${NVIDIA_FILES}; do
		dest_file="$(printf "%s" "${nvidia_file}" | sed 's|/run/host||g')"
		mount_bind "${nvidia_file}" "${dest_file}" ro
	done

However, it does not search /etc/vulkan, which is where the NVIDIA driver puts those files.
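For illustration only, here is a hypothetical variant of the loop above that also walks the host's /etc/vulkan and accepts symlinks. It prints source and destination pairs instead of calling distrobox's internal mount_bind helper, and the root-prefix argument stands in for /run/host so the logic can be tried on a mock tree. It is not necessarily how distrobox handles /etc:

```shell
#!/bin/sh
# Hypothetical variant of the quoted loop, extended to the host's
# /etc/vulkan. Prints "source -> destination" instead of bind-mounting.
# Like the original loop, it assumes paths contain no whitespace.
list_nvidia_etc_files() {
	host_root="${1:-/run/host}"
	[ -d "${host_root}/etc/vulkan" ] || return 0
	# Match regular files and symlinks whose name contains "nvidia".
	files="$(find "${host_root}/etc/vulkan" \
		\( -type f -o -type l \) -iname "*nvidia*" -print | sort)"
	for nvidia_file in ${files}; do
		# Strip only the leading host prefix to get the in-container path.
		dest_file="${nvidia_file#"${host_root}"}"
		printf '%s -> %s\n' "${nvidia_file}" "${dest_file}"
	done
}
```

The quoted loop strips the prefix with `sed 's|/run/host||g'`, which removes every occurrence; the `${var#prefix}` expansion used here removes only a leading one, which seems closer to the intent.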

I am no expert here; I just took a closer look since I happen to have an NVIDIA system, and I hope I could gather enough information to give at least a hint.

@89luca89 89luca89 reopened this Aug 4, 2023
@89luca89
Owner

89luca89 commented Aug 4, 2023

Interesting, thanks for this report!

About /etc/: it has just been pushed, so you need to use the git version for it to work.

It seems that symlinks are being skipped; I'll try to mock something up to also mount the symlinks 👍
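One possible way to carry symlinks into the container is to recreate them with the same target rather than bind-mounting them, since mounting a symlink path resolves it to the file it points to. This is a hypothetical sketch, not necessarily how the eventual fix works; the function name and directory arguments are assumptions:

```shell
#!/bin/sh
# Hypothetical sketch: recreate the host's NVIDIA library symlinks inside
# the guest instead of bind-mounting them (a bind mount resolves the link).
# The new link keeps the host's target string, so a relative target such as
# "libnvidia-vulkan-producer.so.535.86.05" resolves next to the new link.
recreate_nvidia_symlinks() {
	host_lib_dir="$1"   # e.g. /run/host/usr/lib64
	guest_lib_dir="$2"  # e.g. /usr/lib64
	for link in "${host_lib_dir}"/*nvidia*; do
		# Only symlinks are handled; regular files are left to other logic.
		[ -L "${link}" ] || continue
		target="$(readlink "${link}")"
		# -f replaces an existing file or link with the same name.
		ln -sf "${target}" "${guest_lib_dir}/$(basename "${link}")" || return 1
	done
	return 0
}
```

Because the recreated link keeps a relative target, it only resolves if the versioned library is also present in the guest directory, which the broken guest listing shows is already the case for libnvidia-vulkan-producer.so.535.86.05.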

@VortexAcherontic
Author

Oh yes, I missed the /etc/ part, which is just a few lines above the ones I copy-pasted 😅 My bad.

And many thanks for looking into this!

@89luca89
Owner

89luca89 commented Aug 5, 2023

I've pushed a fix now that correctly preserves the symlinks 😄

Thanks for the help!

@VortexAcherontic
Author

Many thanks! ❤️
