Rendering artifact when using ANGLE's vulkan backend with MoltenVK #2418

Closed
dboyan opened this issue Jan 8, 2025 · 1 comment · Fixed by #2419

Comments

@dboyan
Contributor

dboyan commented Jan 8, 2025

I'm using qemu+virgl for a 3D-accelerated Linux VM with akihikodaki's patches (details available here). The host-side virgl renderer is backed by ANGLE, which itself can use different backends. I recently experimented with different ANGLE backends and found that the Vulkan backend produces heavy rendering artifacts with MoltenVK, while rendering is fine with ANGLE's default OpenGL backend. Here are some more details.

Environment

Issue description

First, some background: the patched QEMU on macOS provides 3D acceleration to the guest as follows:

Guest GL --(virgl)-> Host GLES on EGL --(ANGLE)-> Backends (OpenGL / Metal / Vulkan)

When I set ANGLE's backend to Vulkan (via the configuration passed when initializing EGL), the rendering contains heavy artifacts once the VM boots into the operating system. For example, here is a screenshot of the gdm login screen:

gdm-artifact

while a correct rendering should look like the following:

gdm-correct

Preliminary analysis

I'm aware that the rendering process involves a multi-step chain, and a mistake in any step can make the result incorrect. Specifically, there are the following possibilities:

  1. virgl generates incorrect GLES commands on the host (unlikely, as discussed below)
  2. ANGLE's Vulkan backend translates the GLES input incorrectly
  3. MoltenVK (mvk) translates Vulkan to Metal incorrectly

I'm not 100% sure yet, but I'm sharing my data to provide some clarity; it may serve as a pointer for further diagnosis.

I used two capture tools to record the command stream at different stages of the chain. First, I recorded the OpenGL ES command stream at the input of ANGLE (with a heavily adapted version of apitrace). At the same time, I used gfxreconstruct to record the Vulkan command stream generated by ANGLE. I attached both files, recorded simultaneously from a single VM launch:

The first trace renders correctly when replayed directly on Linux; in fact, the second screenshot above was taken from that replay. This means step 1 (virgl) is most likely correct. However, the Vulkan trace is less helpful, because it apparently isn't portable across platforms (and the desktop doesn't even render when the trace is replayed on macOS; I'm not sure whether the problem lies in the recording or in the replay). So for now I cannot tell whether this is ANGLE's or MoltenVK's fault.

As a next step, I can try building ANGLE with Vulkan on Linux to see the result of native Vulkan rendering from the same GLES input, but I can't promise to get it done very soon. In the meantime, please let me know if I can provide any clarification or other necessary information.

@dboyan
Contributor Author

dboyan commented Jan 11, 2025

I've found some details that might shed some light on the issue. I was able to narrow it down to a small set of rendering commands that reproduce it. Here is a trimmed apitrace file that reproduces the issue; the relevant rendering commands are in the second frame when viewed in qapitrace.

The incorrect rendering happens when compositing a series of textures onto a larger texture, which I guess is the basis of 2D acceleration. Each iteration of rendering calls in OpenGL ES looks like the following:
render-calls

In each iteration, only the texture (used by the sampler) and the starting vertex index (the `first` parameter of glDrawArraysInstanced) change; both are circled in the image above. The coordinates of the textures being composited are passed to the vertex shader through vertex attribute arrays.
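To make the per-iteration pattern concrete, here is a minimal Python model (not real GL code) of how the `first` argument of glDrawArraysInstanced selects which rectangle's coordinates the vertex shader sees. It assumes, hypothetically, that each rectangle is stored as 4 consecutive per-vertex positions (attribute divisor 0) in one shared array and that each draw uses a single instance; `Rect`, `compose`, and the texture names are illustrative only. The final case shows what a draw would cover if its texture binding were right but its `first` value were stale. This is one possible form of the suspected mismatch, not a confirmed diagnosis.

```python
from dataclasses import dataclass

# Hypothetical layout: each rectangle is 4 consecutive vertices
# (triangle-strip order) in one shared per-vertex position array.
VERTS_PER_QUAD = 4


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float


def build_vertex_array(rects):
    """Flatten rectangles into (x, y) positions, 4 vertices per rectangle."""
    verts = []
    for r in rects:
        verts += [(r.x0, r.y0), (r.x1, r.y0), (r.x0, r.y1), (r.x1, r.y1)]
    return verts


def draw_arrays_instanced(verts, first, count, instancecount):
    """Vertex fetch of glDrawArraysInstanced for per-vertex attributes
    (divisor 0): every instance reads verts[first : first + count]."""
    return [verts[first:first + count] for _ in range(instancecount)]


def bounds(quad):
    """Bounding box of the vertices one draw actually fetched."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return Rect(min(xs), min(ys), max(xs), max(ys))


def compose(rects, textures, firsts):
    """One draw per texture; only the bound texture and `first` change.
    Returns (texture, area covered) for each draw."""
    verts = build_vertex_array(rects)
    covered = []
    for tex, first in zip(textures, firsts):
        (quad,) = draw_arrays_instanced(verts, first, VERTS_PER_QUAD, 1)
        covered.append((tex, bounds(quad)))
    return covered


# Illustrative data: a full-area rectangle followed by two tiles.
rects = [Rect(0, 0, 128, 128), Rect(0, 0, 64, 64), Rect(64, 0, 128, 64)]
textures = ["tex_a", "tex_b", "tex_c"]

# Correct: draw i starts at vertex 4 * i, so each texture lands on its own tile.
correct = compose(rects, textures, [i * VERTS_PER_QUAD for i in range(3)])

# Hypothetical failure: the texture binding is right, but every draw uses a
# stale first = 0, so tex_b and tex_c are stretched over the full area.
stale = compose(rects, textures, [0, 0, 0])

for (tex, good), (_, bad) in zip(correct, stale):
    print(tex, good, bad)
```

In this model, a correct texture paired with the wrong vertex range looks exactly like "a texture occupying the place intended for another", which is why comparing per-draw `first` values on both backends may be a useful check.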

It turns out that on (ANGLE over) MoltenVK, some textures in a series of draw calls are composited at incorrect locations, seemingly occupying the places intended for other textures. I captured the intermediate states of one compositing pass by using apitrace to read back the offscreen texture right after each relevant API call. For each step, I list the composited image rendered by:

  • ANGLE over Vulkan (NVIDIA GPU on Linux); I get an identical result when replaying the trace on native GLES on Linux
  • ANGLE over MoltenVK

For easier viewing, I applied a black background to each resulting texture and flipped it vertically:
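The post-processing above can be sketched as follows. This is a hypothetical helper, not the script actually used: GL readbacks such as glReadPixels return the bottom row first, so reversing the row order gives an upright image, and each RGBA pixel is then composited over opaque black. Whether the captured textures use premultiplied or straight alpha is an assumption here, so both cases are handled.

```python
def flip_vertically(rows):
    """GL readbacks (e.g. glReadPixels) return the bottom row first;
    reversing the row order gives an upright image."""
    return rows[::-1]


def over_black(pixel, premultiplied):
    """Composite one RGBA pixel (floats in [0, 1]) over opaque black."""
    r, g, b, a = pixel
    if premultiplied:
        # Color is already scaled by alpha, and black adds nothing.
        return (r, g, b, 1.0)
    # Straight alpha: scale color by coverage before making it opaque.
    return (r * a, g * a, b * a, 1.0)


def prepare_for_viewing(rows, premultiplied=True):
    """Flip a row-major RGBA image and put it on a black background."""
    return [[over_black(p, premultiplied) for p in row]
            for row in flip_vertically(rows)]


# Illustrative 1x2 image, bottom row first as GL would return it.
image = [[(1.0, 0.0, 0.0, 0.5)],
         [(0.0, 1.0, 0.0, 1.0)]]
print(prepare_for_viewing(image, premultiplied=False))
# [[(0.0, 1.0, 0.0, 1.0)], [(0.5, 0.0, 0.0, 1.0)]]
```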

Step 1 (call 30306):
color-30306-correct
color-30306-mvk

Step 2 (call 30328): a "black" texture is rendered in this step; its effect is invisible
color-30328-correct
color-30328-mvk

Step 3 (call 30350): The results start to diverge; in MoltenVK, the "black" texture seems to be misplaced so that it covers the whole area
color-30350-correct
color-30350-mvk

Step 4 (call 30372):
color-30372-correct
color-30372-mvk

Step 5 (call 30394):
color-30394-correct
color-30394-mvk

Step 6 (call 30416):
color-30416-correct
color-30416-mvk

My guess is that the vertex indices and textures of multiple draw calls somehow become mismatched when translated into Metal. I might be able to create a minimal GLES or Vulkan sample to reproduce the problem. Meanwhile, please let me know if there is more information I can share.
