WL Vulkan apps are broken with PRIME #72
Comments
Thanks for the report. I suspect this has the same root-cause as the issue you reported earlier #69, namely that we're passing a buffer to the AMD GPU which isn't aligned to 256 bytes. The relevant code-path in the driver is used by both EGL and Vulkan applications on Wayland, so the same bug would be present in both cases. |
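For readers unfamiliar with the alignment requirement mentioned above, here is a minimal sketch of what "aligned to 256 bytes" means in practice. The helper names are hypothetical, and which quantity must be aligned (row pitch, offset, or base address) is an assumption here; the comment above only says the buffer must be aligned to 256 bytes.

```python
# Hypothetical helpers; the 256-byte figure comes from the comment above.
REQUIRED_ALIGNMENT = 256


def is_aligned(value, alignment=REQUIRED_ALIGNMENT):
    """Return True if value is a multiple of alignment."""
    return value % alignment == 0


def align_up(value, alignment=REQUIRED_ALIGNMENT):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


# Example: a 1366-pixel-wide row at 4 bytes per pixel is 5464 bytes,
# which is not a multiple of 256 and would need padding.
row_bytes = 1366 * 4
padded = align_up(row_bytes)
```

A 1920-pixel-wide row at 4 bytes per pixel (7680 bytes) already satisfies the requirement, which may be one reason a bug like this only shows up at certain window sizes.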
@erik-kz The issue in this case is that NVIDIA-specific format modifiers are being used on PRIME setups (the same issue as #41), so it's different from #69. That causes the Mesa driver to fail because it doesn't understand the NVIDIA-specific ones (so the linear modifier is the only reliable option). You've already fixed the OpenGL path with 866a801#diff-8965d13061a6bcaea4358bcc9c757a91fbd9b3cc16fcb3bf1dd579c667fc5528R1269 but Vulkan is somehow different 🤔 Compare these two WAYLAND_DEBUG lines (one is OpenGL on my PRIME setup and the other is Vulkan on the same setup):
Notice the non-zero values for layout modifiers in the Vulkan section? |
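For anyone trying to read those modifier values: the Wayland linux-dmabuf protocol passes the 64-bit DRM format modifier as two 32-bit halves (modifier_hi, modifier_lo), and drm_fourcc.h puts the vendor ID in the top 8 bits. The sketch below decodes that. The function names are hypothetical, and the NVIDIA example value is made up for illustration; it is not taken from the logs in this thread.

```python
# Vendor IDs as defined in the kernel's drm_fourcc.h.
DRM_FORMAT_MOD_LINEAR = 0
VENDORS = {0x00: "NONE", 0x01: "INTEL", 0x02: "AMD", 0x03: "NVIDIA"}


def combine_modifier(hi, lo):
    """Join the two 32-bit halves from the protocol into one 64-bit value."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)


def modifier_vendor(modifier):
    """Return the vendor name encoded in the top 8 bits of the modifier."""
    return VENDORS.get(modifier >> 56, "UNKNOWN")


def is_linear(modifier):
    """A zero modifier means a plain linear layout any GPU can read."""
    return modifier == DRM_FORMAT_MOD_LINEAR


# Both halves zero: linear, safe to share across GPUs.
opengl_case = combine_modifier(0, 0)
# Illustrative (made-up) non-zero value with the NVIDIA vendor byte set.
vulkan_case = combine_modifier(0x03000000, 0x00606014)
```

So a non-zero modifier_hi starting with 0x03 points at a vendor-specific NVIDIA layout, which is what Mesa on the integrated GPU cannot import.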
Ah I see, thanks for the clarification. Yes, this does indeed appear to be a different issue. We use a kind of back-door interface into egl-wayland for Vulkan which has the effect of bypassing the linear buffer allocation. Like the other bug, though, the fix will need to be on the driver side. |
Still present in 525.60.11 :( |
I have the same issue with an Intel + Nvidia setup. Vulkan apps with native wayland support crash at startup when using the dGPU. Running vkcube-wayland with my dGPU gives me:
Retroarch and Ryujinx also gave me similar results, crashing at startup if I try to run them with prime-run. Running both apps through Xwayland works like a charm, though, as did using the opengl API instead of Vulkan. My system info: Distro: Arch Linux |
Wait, are OpenGL applications on Wayland with PRIME offload supposed to work? The windows for the applications never appear on my end, even if they are seemingly running. I'm on 2060 Mobile with nvidia 525.60.11, egl-wayland 1.1.11 and mesa 22.2.4 right now. Running them without PRIME offload works obviously, but it runs on the integrated Intel GPU. |
OpenGL applications should work with PRIME offload, yes. Are you using an Intel or AMD integrated GPU? Note that the 525 driver has a bug which prevents it working with AMD (see #69). This will be fixed in 530. |
Thanks for the fast reply! Running results in the application running without any window appearing (it does seem to 'exist' to Gnome, though. Appears in the list of applications when Alt+Tab'ing), and by looking at nvidia-smi's output, it indeed uses an nvidia GPU:
Here's the console output with |
Are you setting __NV_PRIME_RENDER_OFFLOAD=1? It shouldn't be necessary to set __EGL_VENDOR_LIBRARY_FILENAMES. |
This appears to be a bug in eglgears_wayland. It calls poll() on the Wayland display fd without first calling wl_display_prepare_read https://gitlab.freedesktop.org/mesa/demos/-/blob/main/src/egl/eglut/eglut_wayland.c#L279 Have you tried any other OpenGL applications? |
I tried PCSX2. It just hangs on OpenGL with UPDATE: just tried PPSSPP and it seems to work. Not sure what was the problem with PCSX2 |
Sorry. Turns out I had GRUB configured incorrectly, so DRM mode setting wasn't enabled. Once I got proper |
Oh, cool. We do plan to make modeset=1 the default in the near future. It's just that right now it can cause problems for some workstation SLI configurations. |
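Since a missing modeset=1 caused the confusion above, here is a small sketch for checking whether DRM mode setting is actually enabled on a running system. It assumes the nvidia_drm module exposes its parameter at /sys/module/nvidia_drm/parameters/modeset and reports it as Y or N; the function name is hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the nvidia-drm "modeset" module parameter.
MODESET_PARAM = Path("/sys/module/nvidia_drm/parameters/modeset")


def modeset_enabled(param_path=MODESET_PARAM):
    """Return True if the parameter file exists and reads 'Y'."""
    try:
        return Path(param_path).read_text().strip() == "Y"
    except OSError:
        # Module not loaded or file unreadable: treat as not enabled.
        return False
```

If this returns False, check that nvidia-drm.modeset=1 really made it onto the kernel command line (for example, that the GRUB configuration was regenerated).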
Possibly related issue described in NVIDIA/open-gpu-kernel-modules#317 (comment). I wonder why the offload env variable changes the behaviour, so that the program hangs instead of crashing? |
Our GPUs render using a hardware-specific pixel layout which Intel and AMD GPUs don't understand. When __NV_PRIME_RENDER_OFFLOAD=1 is set, after rendering each frame we will convert it to a linear layout so that the integrated GPU can display it. The code to do that is wired up for OpenGL and Vulkan X11 applications, and OpenGL Wayland applications, but not for Vulkan Wayland applications. For Vulkan applications, __NV_PRIME_RENDER_OFFLOAD=1 will also enable the NV_optimus layer as you mentioned, which changes the order that GPUs are enumerated so that the NVIDIA GPU will appear first. |
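As a rough illustration of the setup described above, a launcher script might build the environment like this. The helper name is hypothetical; the only variable it sets is __NV_PRIME_RENDER_OFFLOAD=1, and it drops __EGL_VENDOR_LIBRARY_FILENAMES on the basis of the earlier comment that setting it shouldn't be necessary.

```python
import os


def prime_offload_env(base=None):
    """Return a copy of the environment with PRIME render offload enabled."""
    env = dict(os.environ if base is None else base)
    env["__NV_PRIME_RENDER_OFFLOAD"] = "1"
    # Not needed according to the discussion above, so do not carry it over.
    env.pop("__EGL_VENDOR_LIBRARY_FILENAMES", None)
    return env


# Usage (not executed here):
#   import subprocess
#   subprocess.run(["vkcube-wayland"], env=prime_offload_env())
```

The function copies the environment instead of mutating os.environ, so the offload setting only applies to the launched application.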
Is this related to: vkvia, vulkaninfo from LunarG's SDK are unable to detect the discrete NVIDIA GPU on Ubuntu 22.04/Wayland? Only Intel GPU0, and llvmpipe GPU1 are detected. |
Thank you for this ELI5-level answer. Are you permitted to let us know if this is in the works? Or if it is tagged as WONTFIX internally? @erik-kz |
Still present in 530.30.02 :( |
This is not a WONTFIX, we do intend to get it working. And while I can't provide an ETA right now, it will definitely be before we drop Pascal (10-series) support. That won't happen for quite a while, I mean we haven't even dropped Maxwell (9-series) support yet. |
I have the exact same issue on Void Linux with driver |
This is becoming quite a serious issue for many people. Can this please be made a priority? |
@TheComputerGuy96 Does this work in the latest beta driver? |
This feature has been implemented by @dkorkmazturk. It will be available in the next major driver version, 545 (not the recently released 535 beta). |
After 1.1.12 release it's happening again for some OpenGL apps as well (e.g. mpv: mpv-player/mpv#11774) |
Is there a known timeline for this version? |
@Dirleye thanks for the information! If you could please upload the file generated by our nvidia-bug-report.sh script that would be helpful. We're still trying to figure out why only certain systems seem to be experiencing this bug, so the more data we have, the better. |
@erik-kz of course, no problem. I occasionally checked nvidia-smi to see the clock speeds and power state which were all glued to their lowest throughout. |
Here's mine: nvidia-bug-report.log.gz |
Omg, I finally managed to reproduce the vkcube-wayland hang with a different GPU (Quadro P620). Not exactly sure what the cause is yet, but at least now it's possible to debug. What does seem immediately clear is that it's not a power management issue; it actually looks like it's related to a new synchronization mechanism that was introduced in 545. I shall update with further progress. Thanks so much to everyone who provided logs, etc... that definitely helped narrow down the problem. |
Unfortunately still broken for me. |
A quick update - we have figured out what is causing the issue. It did turn out to be a driver bug affecting pre-Turing GPUs. The fix is targeted for the next driver release, 550, early next year. |
@erik-kz does it mean that post Turing GPUs are fixed now in 545.29.06 and don't need a future 550 driver? |
Vulkan Wayland applications should be working correctly with 545.29.06 on Turing-or-later GPUs, including with PRIME render-offload. The issue I was referring to in my previous comment was the extremely low framerates (0.2FPS) that several users had reported. All of those users had Pascal GPUs. |
Can you share some technical details about what exactly the issue was and
any workaround (other than running a background app) for the time being?
|
The 545 driver was the first version to include support for sync_files, https://www.kernel.org/doc/Documentation/sync_file.txt, a new synchronization mechanism. The bug was in our implementation of that feature. 545 also included a fairly extensive re-write of the Vulkan Wayland WSI code, and part of that made use of the new sync_file functionality. That's why Vulkan Wayland apps were affected by the bug. A possible work-around would be to extract the driver installer and edit the file nvidia-drm-drv.c. In the
This will disable sync_file support |
I can confirm that the workaround works, but why delete the whole block? It seems that deleting the code inside the macro is enough. Here is a patch for NixOS users:
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.stable.overrideAttrs (old: {
postPatch = ''
substituteInPlace ./kernel/nvidia-drm/nvidia-drm-drv.c --replace \
'#if defined(NV_SYNC_FILE_GET_FENCE_PRESENT)' \
'#if 0'
'';
}); |
Yeah, that's true. Also, I must ask that anyone who uses this work-around please promise to revert it once 550 is released. In the future more things will depend on sync_file support and so having it disabled will almost certainly cause problems. |
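For people not on NixOS, the same substitution can be sketched as a small script run against the extracted driver sources. This is only an illustration of the work-around discussed above, not an official procedure: the function name is hypothetical, and the file path in the usage comment is taken from the NixOS snippet. As requested above, revert it once 550 is installed.

```python
from pathlib import Path

# The preprocessor guard named in the NixOS patch above, and its replacement.
GUARD = "#if defined(NV_SYNC_FILE_GET_FENCE_PRESENT)"
DISABLED = "#if 0"


def disable_sync_file(source_path):
    """Replace the sync_file guard with '#if 0'; return how many were replaced."""
    path = Path(source_path)
    text = path.read_text()
    count = text.count(GUARD)
    if count:
        path.write_text(text.replace(GUARD, DISABLED))
    return count


# Usage (not executed here), from the root of the extracted installer:
#   disable_sync_file("kernel/nvidia-drm/nvidia-drm-drv.c")
```

A return value of 0 means the guard was not found, for example because the file was already patched or the driver version lays the code out differently.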
It should be removed for driver v550 and later! For more details, see NVIDIA/egl-wayland#72 (comment) Signed-off-by: Pavel Artsishevsky <polter.rnd@gmail.com>
Tested that on nvidia beta drivers 550.40.07 vkcube-wayland now works correctly without any patches, I think this issue can now be closed. |
Thanks for confirming. Closing the issue. |
I don't know if the issue should be re-opened, but starting with the 550 beta driver, and even the 550 release driver, WL Vulkan apps crash again (vkcube-wayland segfaults and emulators crash too). I reverted to 545 and it works fine. My laptop has a MUX switch, so I could theoretically disable the Intel GPU and run only the NVIDIA one, but if I do that I can't switch my screen to 240Hz: I get a black screen with rectangular lines that render a small part of the desktop, and it glitches out until I revert to 60Hz. Both issues occur only with the 550 series (beta and release); 545 is fine. Specs: Intel Core i7-13620H 10C/16T 4.9GHz. EDIT: After installing egl-wayland and enabling nvidia-drm.modeset=1 it works now... I thought nvidia-drm.modeset=1 was enabled by default now... |
Not yet. |
Hello,
This is sort of a continuation of #41 but for Vulkan apps/games
So Vulkan apps (like PPSSPP or vkcube) fail to work with Wayland on my PRIME setup:
As you can see it's identical to the OpenGL error (but the OpenGL one has already been fixed) but I also checked the Wayland logs and the (probably) NVIDIA modifier is present (so the linear modifier needs to be used somehow)
Running both PPSSPP and vkcube with XWayland removes the problem (by using the SDL_VIDEODRIVER=x11 variable or the X11 vkcube executable)
And now time for the all important system info 🐸 (although it's kinda redundant here):
Distro: Arch Linux
egl-wayland version: 1.1.11 (Git version also fails)
Mesa version: 22.2.1
Driver version: 515.76
Kernel version: 6.0.6
Compositor: mutter 43.0 (through an unofficial repo)
CPU: Ryzen 5 4600H
GPU: Renoir iGPU + GTX 1650 Ti Mobile (as I said a PRIME setup)