
Explore Venus + MoltenVK for GPU acceleration #4551

Open
osy opened this issue Oct 23, 2022 · 43 comments
Labels
enhancement New feature or request
Milestone

Comments

@osy
Contributor

osy commented Oct 23, 2022

Currently we use VirGL + ANGLE to translate GL (guest) to Metal (host). This works decently (on Linux), but it is buggy (crashes), and more modern Linux applications and games are moving to Vulkan anyway.

Venus translates guest Vulkan calls to host Vulkan calls.

MoltenVK translates host Vulkan calls to Metal calls.

It is worth exploring this pairing to see if it’s a) more stable and b) more performant.

Note that neither solution currently has Windows guest support so that will have to be developed separately.

@osy osy added the enhancement New feature or request label Oct 23, 2022
@osy osy added this to the Future milestone Oct 23, 2022
@tifasoftware

Could DXVK be used to also translate DirectX to Vulkan?

@osy
Contributor Author

osy commented Oct 24, 2022

Yes, but that requires significantly more work on the Windows side.

@IComplainInComments

Could DXVK be used to also translate DirectX to Vulkan?

It's more beneficial to just use DXVK on a Linux VM using Steam's Proton, as it would already have everything needed.

@tifasoftware

Could DXVK be used to also translate DirectX to Vulkan?

It's more beneficial to just use DXVK on a Linux VM using Steam's Proton, as it would already have everything needed.

Yeah, that's one way to go with it. However, I think there should be something that benefits programs that only work in Windows (and not in Wine/Proton), as well as emulating Aero in Vista/7.

@osy
Contributor Author

osy commented Jan 6, 2023

Attempted this in https://github.com/utmapp/UTM/tree/feature/venus-support and hit a blocker. I managed to build everything, but there's missing support on the macOS/HVF side.

From the Venus docs in Mesa:

The Venus renderer makes assumptions about VkDeviceMemory that has
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT. The assumptions are illegal and rely
on the current behaviors of the host drivers. It should be possible to remove
some of the assumptions and incrementally improve compatibilities with more
host drivers by imposing platform-specific requirements. But the long-term
plan is to create a new Vulkan extension for the host drivers to address this
specific use case.

The Venus renderer assumes a device memory that has
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT can be exported as a mmapable dma-buf
(in the future, the plan is to export the device memory as an opaque fd). It
chains VkExportMemoryAllocateInfo to VkMemoryAllocateInfo without
checking if the host driver can export the device memory.

The dma-buf is mapped (in the future, the plan is to import the opaque fd and
call vkMapMemory) but the mapping is not accessed. Instead, the mapping
is passed to KVM_SET_USER_MEMORY_REGION. The hypervisor, host KVM, and
the guest kernel work together to set up a write-back or write-combined guest
mapping (see virtio_gpu_vram_mmap of the virtio-gpu kernel driver). CPU
accesses to the device memory are via the guest mapping, and are assumed to be
coherent when the device memory also has
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT.

While the Venus renderer can force a VkDeviceMemory external, it does not
force a VkImage or a VkBuffer external. As a result, it can bind an
external device memory to a non-external resource.

What this means is that it requires a Linux kernel feature (udmabuf) which allows QEMU to DMA-map memory in a way that GBM/minigbm can access. This way Vulkan can render directly to host device memory.

There's missing support for this across the board, from macOS to MoltenVK. So significant effort would have to be put in to either 1) change the render target to a Metal surface and do some weird guest->host passthrough, or 2) port minigbm to use the Metal APIs. There may be other ways, but I'm not experienced in the Linux graphics stack.

I think the more promising approach is to use the Google Android Emulator's gfxstream technology, which allows Vulkan commands to be serialized and streamed directly from guest to host. Since it already has M1 support, it could be easier. However, the challenge is to get it working 1) on QEMU and 2) on vanilla Linux (there are a lot of Android ifdefs in the code).

@DUOLabs333

DUOLabs333 commented Apr 5, 2023

@osy I tried building from your fork of virglrenderer, but I couldn't get Venus to compile: gbm.h is missing. Or is this what you meant by a "lack of support"?

@zaptrem

zaptrem commented Jul 3, 2023

@osy Could Apple's new D3DMetal make graphics acceleration support any easier?

@tifasoftware

As long as Apple's license permits it.

@osy
Contributor Author

osy commented Jul 3, 2023

@zaptrem It doesn't change anything for our purposes. However, in theory it may open up a path for ParavirtualizedGraphics (used in macOS guests for GPU virtualization through Metal) to reach Linux/Windows via D3DMetal. My hunch, though, is that it would be much, much harder to do that than Venus + MoltenVK or gfxstream + MoltenVK (the current plan of action).

@DUOLabs333

DUOLabs333 commented Jul 3, 2023

Hey @osy, I've been following the work on gfxstream (I've been trying independently to add Vulkan by patching virglrenderer). For vkcube to work, Mesa's Venus driver needs some extensions that MoltenVK can't implement. How are you planning to get around that? (I'm seeing some references to opengl-goldfish; is that the replacement for Mesa?)

@osy
Contributor Author

osy commented Jul 3, 2023

@DUOLabs333 No; that's why I said "gfxstream + MoltenVK (the current plan of action)".

@DUOLabs333

I've been working on this for a while now, and I got far enough that I can see the draw commands being executed in the log (nothing on screen though, only a black window). However, when I updated Mesa from 23.0 to 23.1, everything broke and I had to start all over. I was able to fix some of the problems, but then I hit an assertion crash: the line with assert(isv) in target/arm/hvf/hvf.c, which occurs after the guest requests a blob to be mapped. I determined that the mapping operation itself is not the problem; it's something on the guest side. Do you know of any situations where such a crash would occur?

@osy
Contributor Author

osy commented Aug 3, 2023

@DUOLabs333

Ah, I see (I've been following the issue, but only skimming it). This is well outside my area of expertise, but from my understanding, what seems to be happening is this: when virgl_renderer_map_blob is called and the shmem is mapped, the physical address corresponding to the blob on the host isn't exposed as mapped to the guest. So, when an instruction tries to operate on the address, it faults. QEMU catches the fault and figures out how to apply the instruction to the corresponding host memory address.

The problem is that QEMU isn't doing that last part and is just erroring out. Did I get that right?

In any case, I wonder what changed between the two versions to trigger this.

@osy
Contributor Author

osy commented Aug 3, 2023

The problem is that when memory is mapped as MMIO, it will always trap, and the hypervisor will fail to decode what to do (ISV=0) if it's an uncommon instruction like an atomic store, an LDP, a cache-line copy, or something similar. Therefore the memory needs to be mapped as direct memory, which should not trap at all.

@DUOLabs333

How would I do this on macOS? I looked at it briefly when first starting (I got very confused and just used shmem instead). There doesn't seem to be anything analogous to Linux's dma-buf, and I can't find a way to get a memory address I can use memcpy and friends on directly (I'm guessing it has something to do with IODMACommand, but I have no idea what to do with it).

@osy
Contributor Author

osy commented Aug 4, 2023

It would be a lot of work. I'm afraid I am no help there. I also took a look and gave up due to the amount of work that would be required.

@DUOLabs333

Ok, here's what I've got:

  1. Create an IODMACommand instance.
  2. Initialize the instance.
  3. Call getMemoryDescriptor on the instance.
  4. Call getPhysicalAddress on the descriptor.

I'm not sure how to convert this address into a file, though, so virglrenderer can mmap it seamlessly.

@DUOLabs333

I think I got it:
I can use funopen to make a pseudo-file, which can implement the descriptor's operations transparently when being mmapped.

@DUOLabs333

This is weird, though: the code path that leads to the error (which, notably, I never reached before, explaining why I'd never gotten this error) specifically wants shmem. If this were a problem with QEMU, why hasn't it been caught before?

@DUOLabs333

The problem exists even if you use a temporary file (tmpfile) instead of shmem.

@DUOLabs333

Ok, I made a first version that uses DMA instead of shmem, but I'm stuck on including the path from Kernel.framework.

If I include <Kernel/IOKit/IODMACommand.h>, compilation fails, because that file includes <IOKit/IOCommand.h>. The problem is that macOS looks for IOCommand under IOKit, where it doesn't exist; it does exist under Kernel/IOKit/IOCommand.h.

@DUOLabs333

Apparently, I had to clean out my virglrenderer build folder before the -I option took hold. However, I immediately ran into another blocker: IODMACommand is C++-only, but QEMU is written in C. We would have to compile a wrapper.

@DUOLabs333

I've written the wrapper, but I've gotten some errors around APPLE_KEXT_OVERRIDE. This might mean we would have to make a kext for UTM/QEMU, which might not be desirable.

@DUOLabs333

Ok, I rewrote it to use DriverKit, but now IOBufferMemoryDescriptor::Create fails with kIOReturnNotReady, which is a weird code to get (I thought I would have gotten something about permissions). I added the com.apple.developer.driverkit entitlement, just to be safe.

@DUOLabs333

Ok, since DriverKit seems to REQUIRE a driver to use any of its functions (or at least some special setup), I rewrote the DMA code once again to use IOSurface.

However, I now realize that fileno can't produce file descriptors for funopen-created file pointers. So, is there a way to get a file descriptor for either void pointers or IOSurfaces?

@DUOLabs333

DUOLabs333 commented Aug 12, 2023

I got Mesa working with IOSurface, but I still get the assertion error. Is there something else I'm not doing?

@upintheairsheep

A full implementation of DirectX 9 through 12 is available via the Apple Game Porting Toolkit:

https://www.reddit.com/r/macgaming/comments/142tomx/apples_game_porting_toolkit_seems_to_have_a/

@tifasoftware

tifasoftware commented Aug 22, 2023 via email

@baryluk

baryluk commented Nov 17, 2023

I am new to the Mac (I was a Linux user for over 20 years), but I got Debian Linux running in UTM, and it works nicely.

I am also interested in Venus.

Another benefit of Venus over virgl would be better handling of multiple OpenGL apps in the guest (when running Zink on top of Venus to provide GL). With virgl, they are all funneled into a single host-side OpenGL context. This has issues with buffer flips/sync and, due to OpenGL's highly synchronous nature (at least in the virgl world), causes stutters when two OpenGL apps are open (e.g. glxgears plus a benchmark); I have seen this with a Linux guest on a Linux host. With Venus, each device instance opened in the guest maps to its own open on the host, all contexts are as separate as they would be natively, and there is no more stutter.

Also, Zink implements OpenGL 4.6 on any suitably modern Vulkan driver. (I do not know whether Zink works on MoltenVK at the moment; there was a bit of work on this in the past, but it got blocked mostly because Mesa requires some features that simply are not on macOS. If Zink runs in the guest and MoltenVK on the host, though, this should not be a problem.)

Of course DXVK and others should work too (with suitable work on the guest side, for things like page-size differences).
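If that stack materializes, pointing a guest application at Zink is just an environment switch; Mesa's loader honors MESA_LOADER_DRIVER_OVERRIDE (a real Mesa variable, though whether the Zink-on-Venus-on-MoltenVK chain works end to end is exactly what this issue is exploring):

```shell
# Inside the Linux guest: force Mesa's GL-on-Vulkan (Zink) driver for
# one app.  Requires a Mesa build with Zink and a working Vulkan device.
MESA_LOADER_DRIVER_OVERRIDE=zink glxinfo -B | grep "OpenGL renderer"
```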

@DUOLabs333

QEMU 8.2.0 was just released, with the Android Emulator's rutabaga merged. rutabaga supports Vulkan; however, from what I can tell, macOS support hasn't been fully finalized (it's likely that it will come eventually).

I am working on another approach that doesn't require changes to QEMU; the tradeoff is that it is slower (how much slower remains to be seen).

@DragonSWDev

gfxstream was merged into Mesa for Vulkan virtualization. Does this improve the UTM situation in any way?

@DUOLabs333

@DragonSWDev gfxstream has already been merged into QEMU for a while. However, gfxstream does not currently support macOS.

@DragonSWDev

@DUOLabs333 Isn't that used in Android Emulator that supports macOS?

@DUOLabs333

DUOLabs333 commented Sep 20, 2024

@DragonSWDev It is, but I'm not sure whether that is a separate build or part of their open-source release. I've asked a question to this effect on the MR; hopefully I'll get a response. If it is officially supported, I guess I can drop my project and see if I can build gfxstream (however, it'll still take a while for distributions to ship the new release, which has the new Mesa driver required to use gfxstream in the guest).

@DUOLabs333

Looking through the source code, gfxstream definitely has macOS support, but I'm not sure the gfxstream integration in QEMU does.

@DUOLabs333

I've successfully been able to compile QEMU with rutabaga_gfx enabled. However, actually starting QEMU fails; I'll have to look into this.

@oliverbestmann

Did you get anywhere? @DUOLabs333

@DUOLabs333

@oliverbestmann No, I haven't had the time. However, I will note that building all of the necessary components from source takes quite a lot of space (~3-4 GB), so I'm limited to Nix releases, which likely lack improvements/bugfixes that may have landed on the master branch since the last release. Also, while developing my own driver, I've noticed issues with MoltenVK that I've had to work around, and it's possible that gfxstream does not do the same. Therefore, it's possible that there's no clear benefit to be gained by relying on gfxstream.

@DUOLabs333

DUOLabs333 commented Dec 11, 2024

It seems that QEMU recently merged Venus support (so no more building custom kernels or QEMU forks). It's available in 9.2.0.
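For anyone who wants to experiment on a Linux host, QEMU's virtio-gpu documentation describes enabling Venus roughly as follows. Treat this as a sketch based on the upstream docs, not a tested UTM configuration: a shareable memory backend is required for blob resources, and the guest needs a Mesa build with the Venus Vulkan driver.

```shell
# Sketch: enabling Venus on QEMU 9.2+ (flags per upstream virtio-gpu docs).
qemu-system-x86_64 \
  -machine q35,memory-backend=mem0 \
  -object memory-backend-memfd,id=mem0,size=4G \
  -device virtio-gpu-gl,hostmem=4G,blob=true,venus=true \
  -display sdl,gl=on
```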

@dboyan

dboyan commented Jan 15, 2025

Just want to share some info for reference in case someone wants to work, or is working, on this. It would be exciting to see Vulkan acceleration work on a macOS host, but I can see a few challenges and caveats along the way.

I guess the greatest challenge is that Venus requires the ability to share graphics memory between QEMU and the renderer process. On Linux, this is done via dma-buf file descriptors; apparently there is no such thing on macOS. It is not impossible, but it probably needs considerable work in virglrenderer (and possibly QEMU). One possibility is to share host memory across processes the macOS way and use VK_EXT_external_memory_host to properly wire things up. If this is addressed, the rest is probably easier.

Meanwhile, the robustness of MoltenVK at its current stage is another question. I tried to hook up virgl+ANGLE with the latter's Vulkan backend, and I discovered right away that the desktop could not render correctly due to a bug in MoltenVK. I managed to fix that bug, and now the desktop renders and basic apps work. But certainly there are quite a few other issues around. Still, I guess it's probably okay to provide best-effort Vulkan support at the first stage (if someone takes this up) and to keep relying on a proper GL implementation (like ANGLE) to render the VM desktop. If MoltenVK (or any Vulkan implementation on macOS) grows to production quality, using Zink for GL in the VM will probably be in sight.

@osy
Contributor Author

osy commented Jan 15, 2025

@dboyan do you have a repository with your progress?

@dboyan

dboyan commented Jan 16, 2025

@dboyan do you have a repository with your progress?

Not with Venus yet. Currently I'm toying around with virgl+ANGLE+MoltenVK, trying to get a bit more understanding of where things stand. That itself only requires a few lines of change in QEMU. I don't currently have plans to work on Venus, due to limited spare time.
