
Explore Venus + MoltenVK for GPU acceleration #4551

Open
osy opened this issue Oct 23, 2022 · 43 comments
Labels
enhancement New feature or request
Milestone

Comments

@osy
Contributor

osy commented Oct 23, 2022

Currently we use VirGL + ANGLE to translate GL (guest) to Metal (host). This works decently (on Linux), but it is buggy (crashes), and more modern Linux applications and games are moving to Vulkan anyway.

Venus translates guest Vulkan calls to host Vulkan calls.

MoltenVK translates host Vulkan calls to Metal calls.

It is worth exploring this pairing to see if it’s a) more stable and b) more performant.

Note that neither solution currently has Windows guest support so that will have to be developed separately.

@osy osy added the enhancement New feature or request label Oct 23, 2022
@osy osy added this to the Future milestone Oct 23, 2022
@tifasoftware

Could DXVK be used to also translate DirectX to Vulkan?

@osy
Contributor Author

osy commented Oct 24, 2022

Yes, but that requires significantly more work on the Windows side.

@IComplainInComments

Could DXVK be used to also translate DirectX to Vulkan?

It's more beneficial to just use DXVK on a Linux VM using Steam's Proton, as it would already have everything needed.

@tifasoftware

Could DXVK be used to also translate DirectX to Vulkan?

It's more beneficial to just use DXVK on a Linux VM using Steam's Proton, as it would already have everything needed.

Yeah, that's one way to go with it. However, I think there should be something that benefits programs that only work in Windows (and not in Wine/Proton), as well as emulating Aero in Vista/7.

@osy
Contributor Author

osy commented Jan 6, 2023

Attempted this in https://github.com/utmapp/UTM/tree/feature/venus-support and hit a blocker. I managed to build everything, but there's missing support on the macOS/HVF side.

From the Venus docs in Mesa:

The Venus renderer makes assumptions about VkDeviceMemory that has
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT. The assumptions are illegal and rely
on the current behaviors of the host drivers. It should be possible to remove
some of the assumptions and incrementally improve compatibilities with more
host drivers by imposing platform-specific requirements. But the long-term
plan is to create a new Vulkan extension for the host drivers to address this
specific use case.

The Venus renderer assumes a device memory that has
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT can be exported as a mmapable dma-buf
(in the future, the plan is to export the device memory as an opaque fd). It
chains VkExportMemoryAllocateInfo to VkMemoryAllocateInfo without
checking if the host driver can export the device memory.

The dma-buf is mapped (in the future, the plan is to import the opaque fd and
call vkMapMemory) but the mapping is not accessed. Instead, the mapping
is passed to KVM_SET_USER_MEMORY_REGION. The hypervisor, host KVM, and
the guest kernel work together to set up a write-back or write-combined guest
mapping (see virtio_gpu_vram_mmap of the virtio-gpu kernel driver). CPU
accesses to the device memory are via the guest mapping, and are assumed to be
coherent when the device memory also has
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT.

While the Venus renderer can force a VkDeviceMemory external, it does not
force a VkImage or a VkBuffer external. As a result, it can bind an
external device memory to a non-external resource.

What this means is that it requires a Linux kernel feature (udmabuf) which allows QEMU to DMA-map memory in a way that GBM/minigbm can access. This way Vulkan can render directly to host device memory.

There's missing support for this across the board, from macOS to MoltenVK. So significant effort would have to be put in to either 1) change the render target to a Metal surface and do some weird guest->host passthrough, or 2) port minigbm to use the Metal APIs. There may be other ways, but I'm not experienced in the Linux graphics stack.

I think the more promising approach is to use the Google Android Emulator's gfxstream technology, which allows Vulkan commands to be serialized and streamed directly from guest to host. Since it already has M1 support, it could be easier. However, the challenge is to get it working 1) on QEMU and 2) on vanilla Linux (there are a lot of Android ifdefs in the code).

@DUOLabs333

DUOLabs333 commented Apr 5, 2023

@osy I tried building from your fork of virglrenderer, but I couldn't get Venus to compile: gbm.h is missing. Or is this what you meant by a "lack of support"?

@zaptrem

zaptrem commented Jul 3, 2023

@osy Could Apple's new D3DMetal make graphics acceleration support any easier?

@tifasoftware

As long as Apple's license permits it.

@osy
Contributor Author

osy commented Jul 3, 2023

@zaptrem It doesn't change anything for our purposes. However, in theory it may open up a path for ParavirtualizedGraphics (used in macOS guests for GPU virtualization through Metal) to reach Linux/Windows via D3DMetal. My hunch, though, is that it would be much, much harder to do that than Venus + MoltenVK or gfxstream + MoltenVK (the current plan of action).

@DUOLabs333

DUOLabs333 commented Jul 3, 2023

Hey @osy, I've been following the work on gfxstream (I've been trying independently to add Vulkan by patching virglrenderer). For vkcube to work, Mesa's Venus driver needs some extensions that MoltenVK can't implement. How are you planning to get around that? (I'm seeing some references to opengl-goldfish; is that the replacement for Mesa?)

@osy
Contributor Author

osy commented Jul 3, 2023

@DUOLabs333 No; that's why I said "gfxstream + MoltenVK (the current plan of action)".

@DUOLabs333

I've been working on this for a while now, and I got far enough that I can see the draw commands being executed in the log (nothing on screen though, only a black window). However, when I updated Mesa from 23.0 to 23.1, everything broke and I had to start all over. I was able to fix some of the problems, but then I hit an assertion crash: the line with assert(isv) in target/arm/hvf/hvf.c, which occurs after the guest requests a blob to be mapped. I determined that the mapping operation itself is not the problem; it's something on the guest side. Do you know of any situations where such a crash would occur?

@osy
Contributor Author

osy commented Aug 3, 2023

@DUOLabs333

Ah, I see (I've been following the issue, but only skimming it). This is well outside my area of expertise, but from my understanding, what seems to be happening is this: when virgl_renderer_map_blob is called and the shmem is mapped, the physical address corresponding to the blob on the host isn't exposed as mapped to the guest. So, when an instruction tries to operate on the address, it faults. QEMU catches the fault and figures out how to apply the instruction to the corresponding host memory address.

The problem is that QEMU isn't doing that last part and is just erroring out. Did I get that right?

In any case, I wonder what changed between the two versions to trigger this.

@osy
Contributor Author

osy commented Aug 3, 2023

The problem is that when memory is mapped as MMIO, it will always trap, and the hypervisor will fail to decode what to do (ISV=0) if it's an uncommon instruction like an atomic store, an LDP, a cache-line copy, or something similar. Therefore the memory needs to be mapped as direct memory, which should not trap at all.

@DUOLabs333

How would I do this on macOS? I looked at it briefly when first starting (I got very confused and just used shmem instead). There doesn't seem to be anything analogous to Linux's dma-buf, and I can't find a way to get a memory address I can use memcpy and friends on directly (I'm guessing it has something to do with IODMACommand, but I have no idea what to do with it).

@osy
Contributor Author

osy commented Aug 4, 2023

It would be a lot of work. I'm afraid I am no help there. I also took a look and gave up due to the amount of work that would be required.

@DUOLabs333

Ok, here's what I've got:

  1. Create an IODMACommand instance.
  2. Initialize the instance.
  3. Call getMemoryDescriptor on the instance.
  4. Call getPhysicalAddress on the descriptor.

I'm not sure how to convert this address into a file, though, so virglrenderer can mmap it seamlessly.

@DUOLabs333

I think I got it:
I can use funopen to make a pseudo-file, which can implement the descriptor's operations transparently when being mmapped.

@DUOLabs333

This is weird, though: the code path that leads to the error (which, notably, I never reached before, explaining why I'd never gotten this error) specifically wants shmem. If this were a problem with QEMU, why hasn't it been caught before?

@DUOLabs333

The problem exists even if you use a temporary file (tmpfile) instead of shmem.

@DUOLabs333

Ok, I made a first version that uses DMA instead of shmem, but I'm stuck on including the path from Kernel.framework.

If I include <Kernel/IOKit/IODMACommand.h>, compilation fails, because that file includes <IOKit/IOCommand.h>. The problem is that macOS looks for IOCommand under IOKit, where it doesn't exist; it does exist under Kernel/IOKit/IOCommand.h.

@DUOLabs333

Apparently, I had to clean out my virglrenderer build folder before the -I option took hold. However, I immediately ran into another blocker: IODMACommand is C++-only, but QEMU is written in C. We would have to compile a wrapper.

@DUOLabs333

I've written the wrapper, but I've gotten some errors around APPLE_KEXT_OVERRIDE. This might mean we would have to make a kext for UTM/QEMU, which might not be desirable.

@DUOLabs333

Ok, I rewrote it to use DriverKit, but now IOBufferMemoryDescriptor::Create fails with kIOReturnNotReady, which is a weird code to get (I thought I would have gotten something about permissions). I added the com.apple.developer.driverkit entitlement, just to be safe.

@DUOLabs333

Ok, since DriverKit seems to REQUIRE a driver to use any of its functions (or at least some special setup), I rewrote the DMA code once again to use IOSurface.

However, I now realize that fileno can't produce file descriptors for funopen-created file pointers. So, is there a way to get a file descriptor for either void pointers or IOSurfaces?

@DUOLabs333

DUOLabs333 commented Aug 12, 2023

I got Mesa working with IOSurface, but I still get the assertion error. Is there something else I'm not doing?

@upintheairsheep

A full implementation of DirectX 9 through 12 is available via the Apple Game Porting Toolkit:

https://www.reddit.com/r/macgaming/comments/142tomx/apples_game_porting_toolkit_seems_to_have_a/

@tifasoftware

tifasoftware commented Aug 22, 2023 via email

@baryluk

baryluk commented Nov 17, 2023

I am new to the Mac (I was a Linux user for over 20 years), but I got Debian Linux running in UTM, and it works nicely.

I am also interested in Venus.

Another benefit of Venus over virgl would be better handling of multiple OpenGL apps in the guest (when running Zink on top of Venus to provide GL). With virgl, they are all funneled into a single host-side OpenGL context. This has issues with buffer flips/sync and, due to OpenGL's highly synchronous nature (at least in the virgl world), causes stutters when two OpenGL apps are open (e.g. glxgears plus a benchmark); I have seen this with a Linux guest on a Linux host. With Venus, each device instance opened in the guest maps to its own open on the host, all contexts are as separate as they would be natively, and there is no more stutter.

Also, Zink implements OpenGL 4.6 on any suitably modern Vulkan driver. (I do not know whether Zink works on MoltenVK at the moment; there was a bit of work on this in the past, but it got blocked mostly because Mesa requires some features that simply are not on macOS. If Zink runs in the guest and MoltenVK on the host, though, this should not be a problem.)

Of course DXVK and others should work too (with suitable work on the guest side, for things like page-size differences).
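If that stack materializes, pointing a guest application at Zink is just an environment switch; Mesa's loader honors MESA_LOADER_DRIVER_OVERRIDE (a real Mesa variable, though whether the Zink-on-Venus-on-MoltenVK chain works end to end is exactly what this issue is exploring):

```shell
# Inside the Linux guest: force Mesa's GL-on-Vulkan (Zink) driver for
# one app.  Requires a Mesa build with Zink and a working Vulkan device.
MESA_LOADER_DRIVER_OVERRIDE=zink glxinfo -B | grep "OpenGL renderer"
```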

@DUOLabs333

QEMU 8.2.0 was just released, with the Android Emulator's rutabaga merged. rutabaga supports Vulkan; however, from what I can tell, macOS support hasn't been fully finalized (it's likely that it will come eventually).

I am working on another approach that doesn't require changes to QEMU; the tradeoff is that it is slower (how much slower remains to be seen).

@DragonSWDev

gfxstream was merged into Mesa for Vulkan virtualization. Does this improve the UTM situation in any way?

@DUOLabs333

@DragonSWDev gfxstream has already been merged into QEMU for a while. However, gfxstream does not currently support macOS.

@DragonSWDev

@DUOLabs333 Isn't that used in Android Emulator that supports macOS?

@DUOLabs333

DUOLabs333 commented Sep 20, 2024

@DragonSWDev It is, but I'm not sure whether that is a separate build or part of their open-source release. I've asked a question to this effect on the MR; hopefully I'll get a response. If it is officially supported, I guess I can drop my project and see if I can build gfxstream (however, it'll still take a while for distributions to ship the new release, which has the new Mesa driver required to use gfxstream in the guest).

@DUOLabs333

Looking through the source code, gfxstream definitely has macOS support, but I'm not sure the gfxstream integration in QEMU does.

@DUOLabs333

I've successfully been able to compile QEMU with rutabaga_gfx enabled. However, actually starting QEMU fails; I'll have to look into this.

@oliverbestmann

Did you get anywhere? @DUOLabs333

@DUOLabs333

@oliverbestmann No, I haven't had the time. However, I will note that building all of the necessary components from source takes quite a lot of space (~3-4 GB), so I'm limited to Nix releases, which likely lack improvements/bugfixes that may have landed on the master branch since the last release. Also, while developing my own driver, I've noticed issues with MoltenVK that I've had to work around, and it's possible that gfxstream does not do the same. Therefore, it's possible that there's no clear benefit to be gained by relying on gfxstream.

@DUOLabs333

DUOLabs333 commented Dec 11, 2024

It seems that QEMU recently merged Venus support (so no more building custom kernels or QEMU forks). It's available in 9.2.0.
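For anyone who wants to experiment on a Linux host, QEMU's virtio-gpu documentation describes enabling Venus roughly as follows. Treat this as a sketch based on the upstream docs, not a tested UTM configuration: a shareable memory backend is required for blob resources, and the guest needs a Mesa build with the Venus Vulkan driver.

```shell
# Sketch: enabling Venus on QEMU 9.2+ (flags per upstream virtio-gpu docs).
qemu-system-x86_64 \
  -machine q35,memory-backend=mem0 \
  -object memory-backend-memfd,id=mem0,size=4G \
  -device virtio-gpu-gl,hostmem=4G,blob=true,venus=true \
  -display sdl,gl=on
```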

@dboyan

dboyan commented Jan 15, 2025

Just want to share some info for reference in case someone wants to work, or is working, on this. It would be exciting to see Vulkan acceleration work on a macOS host, but I can see a few challenges and caveats along the way.

I guess the greatest challenge is that Venus requires the ability to share graphics memory between QEMU and the renderer process. On Linux, this is done via dma-buf file descriptors; apparently there is no such thing on macOS. It is not impossible, but it probably needs considerable work in virglrenderer (and possibly QEMU). One possibility is to share host memory across processes the macOS way and use VK_EXT_external_memory_host to properly wire things up. If this is addressed, the rest is probably easier.

Meanwhile, the robustness of MoltenVK at its current stage is another question. I tried to hook up virgl+ANGLE with the latter's Vulkan backend, and I discovered right away that the desktop could not render correctly due to a bug in MoltenVK. I managed to fix that bug, and now the desktop renders and basic apps work. But certainly there are quite a few other issues around. Still, I guess it's probably okay to provide best-effort Vulkan support at the first stage (if someone takes this up) and to keep relying on a proper GL implementation (like ANGLE) to render the VM desktop. If MoltenVK (or any Vulkan implementation on macOS) grows to production quality, using Zink for GL in the VM will probably be in sight.

@osy
Contributor Author

osy commented Jan 15, 2025

@dboyan do you have a repository with your progress?

@dboyan

dboyan commented Jan 16, 2025

@dboyan do you have a repository with your progress?

Not with Venus yet. Currently I'm toying around with virgl+ANGLE+MoltenVK, trying to get a bit more understanding of where things stand. That itself only requires a few lines of change in QEMU. I don't currently have plans to work on Venus, due to limited spare time.
