[browser][MT] WebGL with threads - requestAnimationFrame #101421

Open
pavelsavara opened this issue Apr 23, 2024 · 28 comments

@pavelsavara
Member

pavelsavara commented Apr 23, 2024

How could we make WebGL (and other native scenarios) possible with MT?

I believe that the lack of access to the "true" UI thread would completely break WebGL interop.
WebGL doesn't have a dedicated present/swapBuffers call and instead just presents render results once the user code exits the current event callback. If JS API calls are getting enqueued from a worker thread, that would mean that webgl content will be "presented" on every single call, thus making any kind of rendering impossible.

Another problem would be WebGL usage with SkiaSharp: Skia uses WebGL internally and expects those calls to happen on the "true" UI thread rather than on a web worker.

From #85592 (comment)

@pavelsavara pavelsavara added this to the 9.0.0 milestone Apr 23, 2024
@pavelsavara pavelsavara self-assigned this Apr 23, 2024
@pavelsavara
Member Author

A few ideas to start with:

  • The JSImport calls to JS could be synchronous, but not re-entrant.
    • Do you need synchronous callbacks? When, why, and how?
  • Note that the UI thread is, and will stay, a native/emscripten thread.
    • Some syscalls and VFS POSIX calls are synchronously proxied to the UI thread.
    • Perhaps we could proxy the WebGL native calls as well?
    • Search for proxiedFunctionTable in the emscripten codebase.

@pavelsavara
Member Author

cc @kekekeks @maxkatz6

@kekekeks

kekekeks commented Apr 23, 2024

In general WebGL calls should happen inside of the callback from window.requestAnimationFrame on the browser UI thread. That particular callback timing is required to match the screen update rate.
All WebGL calls related to a single frame should happen inside of a single browser event loop iteration (i.e. inside of said window.requestAnimationFrame callback).

So the typical scenario would be:

  1. JS code from window.requestAnimationFrame sets up rendering
  2. JS code synchronously calls .NET code
  3. .NET code synchronously calls WebGL functions either via JSImport or emscripten P/Invokes or Skia code that uses emscripten APIs internally. All of those should happen inside of the same window.requestAnimationFrame callback
  4. frame rendering is finished
  5. .NET code exits and returns control back to JS
  6. JS returns from window.requestAnimationFrame callback
  7. Browser assumes that we are done rendering and queues render results for presentation

So WebGL rendering requires synchronous JS->.NET->JS calls directly on the main browser thread.

Note that it's not possible to prepare everything as a "call list", since some WebGL calls can read data back from GPU in synchronous manner and we can't interrupt the call sequence by exiting the event callback since the browser would assume that we are done rendering the frame.
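A minimal sketch of the frame loop described above, assuming a hypothetical `renderFrame` function that stands in for the synchronous call into .NET (for example a JSExport). The `raf` parameter is only there so the loop can be exercised outside a browser; neither name comes from the runtime.

```javascript
// Hypothetical names: `renderFrame` stands in for a synchronous call into
// .NET; `raf` defaults to the browser's requestAnimationFrame.
function startRenderLoop(renderFrame, raf = globalThis.requestAnimationFrame) {
  let running = true;

  function onFrame(timestampMs) {
    if (!running) return;
    // Steps 1-5: all work for this frame happens synchronously in here,
    // including any WebGL calls that managed code makes back into JS.
    renderFrame(timestampMs);
    // Step 6: returning from this callback lets the browser present the
    // result (step 7), so the next frame is only scheduled, not rendered.
    raf(onFrame);
  }

  raf(onFrame);
  // The returned function stops the loop at the next frame boundary.
  return () => { running = false; };
}
```

Anything that `renderFrame` defers to another thread or to a later event loop turn falls outside the frame, which is why the roundtrip has to be synchronous.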

@kekekeks

kekekeks commented Apr 23, 2024

Note that we don't really need lock()/.Wait on the browser thread (there are currently some short-lived locks, but those can be replaced with lock-free code), just a way to call into .NET code and call JS back.

@kekekeks

Actually, I hadn't considered the locks inside of SkiaSharp. It uses locks extensively for its object handle tracking. Those are rather short-lived (and won't even be contended in most apps), so spinlocks should be fine.

@pavelsavara
Member Author

WebGL calls should happen inside of the callback from window.requestAnimationFrame on the browser UI thread.
3. .NET code synchronously calls WebGL functions

This is a nasty nested synchronous callback.
The UI thread is spin-waiting for the semaphore from the first call to resolve/return at the moment when you want to deliver the message about the nested call.

The syscalls in the proxiedFunctionTable are exceptions to this rule and get executed inside of the UI spin-lock.
Such a call lands in the middle of unrelated business logic that is already on the stack, and it could arrive from any other thread.
It's processed kind of "out of order" with respect to the current synchronous call.
This is how the emscripten VFS works: not pretty, but it "works".

Note that we don't really need lock()/.Wait on the browser thread.

You can try it with

        dotnet.withConfig({
            jsThreadBlockingMode: "ThrowWhenBlockingWait",
        });

or with "DangerousAllowBlockingWait"

which will allow you to make (un-nested) synchronous JSExport calls.

My lesson learned from working on this for the last 12 months is that you never know.

  • you could be lucky and win the race for the lock
  • C# could spin-wait for a while and then win the race for the lock
  • or it could actually engage in a real wait
    • we throw PNSE when ThrowWhenBlockingWait is set, but coverage is far from 100%
    • this is to let you learn that you are blocking the managed thread and the UI thread as well
    • the chance that you see this in a test is low
      • it shows up in maybe one of many CI runs of the unit tests in the runtime repo, and there are many tests

When the UI thread is blocked, the event loop is blocked, the UI doesn't render, postMessage doesn't work, the debugger sucks, and new WebWorkers are impossible to spawn.
When the deputy thread is blocked, it can't receive emscripten messages from other threads.

If there is a wait/promise chain, this could be a deadlock.
How do you know that you are not blocking on an HTTP stream in 3rd party code you don't own?
That HTTP promise would never resolve.
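The effect can be reproduced with plain JavaScript, no runtime involved. This is only an illustration of the event loop behaviour, with a hypothetical `spinWaitFor` helper standing in for a blocking wait: a promise continuation cannot run while the same thread is spinning.

```javascript
// Spin on the current thread for up to `ms` milliseconds and report
// whether `isDone()` ever became true while spinning.
function spinWaitFor(isDone, ms) {
  const deadline = Date.now() + ms;
  while (Date.now() < deadline) {
    if (isDone()) return true;
  }
  return false;
}

let resolved = false;
// An already-resolved promise: its continuation is queued, but it can only
// run after the current synchronous code returns to the event loop.
Promise.resolve().then(() => { resolved = true; });

const sawResolution = spinWaitFor(() => resolved, 20);
console.log(sawResolution); // false: the continuation never ran during the spin
```

With a bounded spin the continuation runs right afterwards. With an unbounded wait on something that depends on that continuation, the thread never gets back to the event loop.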

All that said, this is not to say "I give up" 😉

Does whatever is talking to WebGL from the requestAnimationFrame callback have to be managed code?
Could everything that's needed be pre-computed on a background thread and applied in JS or C?

Blazor people have renderBatch implemented in just JavaScript. The current problem with that is that they are reading the "diff" directly from managed memory. They also have server side diff message that is applied to DOM. They will have to adopt "diff message" rather than "C# memory scan" to make Blazor MT compatible. More about it here

@pavelsavara
Member Author

pavelsavara commented Apr 23, 2024

Also, would this help?
https://github.com/kripken/webgl-worker

Or this
https://developer.mozilla.org/en-US/docs/Web/API/HTMLCanvasElement/transferControlToOffscreen

We have a draft of the JSWebWorker API, which lets your C# execute managed code on a new WebWorker and do JS interop with the state of that WebWorker. Again, this is not a public API yet, and it also has the problem with blocking .Wait vs. the event loop and resolving promises.

@kekekeks

kekekeks commented Apr 23, 2024

The UI thread is spin-waiting for the semaphore from the first call to resolve/return, when you want to deliver the message about the nested call.

Why does the runtime need to marshal the call to a web worker in the first place? Can't it just execute the code directly on the UI thread? The requirement for WebGL rendering is specifically to run everything rendering-related on the main thread inside of the requestAnimationFrame callback and to take the minimum possible time, since the code has to run 240 times per second on 240Hz monitors and stay synchronized to VSync.

When the UI thread is blocked, the event loop is blocked, the UI doesn't render, postMessage doesn't work, the debugger sucks, and new WebWorkers are impossible to spawn.

We only need the UI thread to run our rendering code. It doesn't do any IO nor wait on any long-running semaphores (i. e. spin-waits are fine).

The need to run all of the rendering code directly inside of requestAnimationFrame is a requirement from browsers, not from our architecture.

Could everything that's needed be pre-computed on a background thread and applied in JS or C?

Unfortunately, not really: some WebGL calls are not fire-and-forget but require actual handling of returned values, or are supposed to block on reading back from the GPU (e.g. if we want to copy some computed data from GPU to CPU mid-frame). Also, we aren't making most of the calls ourselves. The usual chain is SkiaSharp -> Skia (native) -> emscripten OpenGL layer -> JS, so even if we don't need readback, we can't replay the commands on the UI thread without executing managed code there.

https://github.com/kripken/webgl-worker

It doesn't support a large part of the OpenGL calls; e.g. it can't read back from GPU memory.

https://developer.mozilla.org/en-US/docs/Web/API/HTMLCanvasElement/transferControlToOffscreen

IIRC OffscreenCanvas doesn't support WebGL contexts on some browsers (last time we checked it didn't work with Safari), only 2d drawing.

If it does work now, the requirements would be mostly the same: we'll need to call into .NET from JS running in a web worker and have callbacks executed in the same worker and without costly cross-worker marshalling, since there are potentially thousands of calls per-frame.

Again, this is not a public API yet, and it also has the problem with blocking .Wait vs. the event loop and resolving promises.

The worker's event loop shouldn't be occupied with anything but requestAnimationFrame and gl context loss/restore callbacks, so it should be fine to have blocking waits there as long as those don't consume CPU time.

@pavelsavara
Member Author

pavelsavara commented Apr 23, 2024

Why does the runtime need to marshal the call to a web-worker in the first place?

Because we are not willing to support those deadlock scenarios.
More explanations in the design doc, the problems section.

During development we made many prototypes trying to make it possible. It randomly deadlocked something like 30% of our unit tests.

If it does work now, the requirements would be mostly the same: we'll need to call into .NET from JS running in a web worker and have callbacks executed in the same worker

That's what the JSWebWorker does: there is no messaging there, and the same thread executes both JS and managed code. It can also do a real blocking wait without spinning.

@pavelsavara
Member Author

IIRC OffscreenCanvas doesn't support WebGL contexts on some browsers (last time we checked it didn't work with Safari), only 2d drawing.

I don't have any experience with it, but they say "Safari 16.4 (Released 2023-03-27)"
https://caniuse.com/?search=transferControlToOffscreen

@pavelsavara pavelsavara changed the title [browser][MT] WebGL and other native scenarios [browser][MT] WebGL with threads Apr 23, 2024
@kekekeks

Seems to work with macOS 14 and iOS 17.4.

https://bugs.webkit.org/show_bug.cgi?id=183720#c18 states that WebGL is supported for OffscreenCanvas on Sonoma+ (released Sep 2023) and iOS 17+ and isn't supported with Ventura regardless of Safari version.

By the time .NET 9 is out, those should be widespread enough to justify them as a requirement for MT support, I guess.

@pavelsavara
Member Author

I suggest you start with a "demo" project rather than the actual codebase.
Just an MT dotnet and WebGL demo. I'm willing to look over your shoulder and assist with such a demo.

Perhaps we should resurrect the ST or MT raytracer demo and bring it to Net9.

@kekekeks

kekekeks commented Apr 23, 2024

Is there some doc on using JSWebWorker API from an app when building with 9.0.100-preview.3.24204.13 SDK?

@kekekeks

Ah, the source suggests to just use reflection, gotcha.

@pavelsavara
Member Author

You can also use a nightly build. There have been good changes since the last preview.

@kekekeks

We'll need to transfer the OffscreenCanvas object created on the UI thread to the worker via postMessage.
Is there a way to get the Worker JS object for a freshly started JSWebWorker instance? Or some API to do such transfer using managed APIs? Run methods just return a Task.

@kekekeks

It seems that people are using PThread.pthreads[id].worker from emscripten, but I'm not sure if that's the correct way of accessing the worker object.

@pavelsavara
Member Author

It seems that people are using PThread.pthreads[id].worker from emscripten, but I'm not sure if that's the correct way of accessing the worker object.

Good enough for the demo, but not long term. This is quite low level (not an API from our perspective).

We also have the emscripten C message queue, and we have postMessage channels between the UI and the worker in dotnet.runtime.js. We consider those internal implementation details.

I prefer that the handshake is initiated from the worker side and that users don't touch emscripten PThread.pthreads.

You can await JSHost.ImportAsync() to load a JS script into the worker inside the JSWebWorker C# callback.
self.getDotnetRuntime(0) is how you get the JS API in there. That will allow you to bind JSExports and JSImports in the worker.

That will allow you to self.postMessage to the UI thread.
But the handler there belongs to emscripten/the runtime, and we do not expose a way to add a handler on the UI side via our JS API.

I'm thinking about how to design it in a clean way.

@kekekeks

I prefer that the handshake is initiated from the worker side and that users don't touch emscripten PThread.pthreads

Yes, but we need to transfer a transferable JS object from UI thread to the worker thread and that requires it to be passed in the second array argument of postMessage. That particular postMessage call has to happen on the Worker instance on the UI thread.
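A sketch of that UI-thread side, using only standard browser APIs. The `worker` argument is assumed to be the Worker object that backs the render thread (how to obtain it cleanly is the open question here), and the `"render-canvas"` message type is a made-up name.

```javascript
// `worker` is assumed to be the Worker backing the render thread; the
// "render-canvas" message type is hypothetical.
function sendCanvasToWorker(canvas, worker) {
  // From here on the <canvas> is driven through the OffscreenCanvas.
  const offscreen = canvas.transferControlToOffscreen();
  // OffscreenCanvas is transferable but not clonable, so it has to be
  // listed in the transfer array (second argument) as well.
  worker.postMessage({ type: "render-canvas", canvas: offscreen }, [offscreen]);
  return offscreen;
}
```

Without the second argument, postMessage would try to structured-clone the canvas and throw a DataCloneError.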

@pavelsavara
Member Author

Or a dedicated channel which could be somehow located in the JS of the UI thread.

@kekekeks

[MONO] JSWebWorker was disposed while running, ManagedThreadId: 3.
 at System.Environment.get_StackTrace() at System.Runtime.InteropServices.JavaScript.JSWebWorker.JSWebWorkerInstance`1[[System.Int32, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Dispose(Boolean disposing) 
at System.Runtime.InteropServices.JavaScript.JSWebWorker.JSWebWorkerInstance`1[[System.Int32, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Finalize()

@kekekeks

I had to use this because of that error:

    using System;
    using System.Diagnostics.CodeAnalysis;
    using System.Reflection;
    using System.Threading;
    using System.Threading.Tasks;

    // Workaround: starts a thread with JS interop installed by calling
    // private runtime methods through reflection.
    public static class JSWebWorkerClone
    {
        private static readonly MethodInfo _setExtLoop;
        private static readonly MethodInfo _installInterop;

        // Keep the reflected runtime types alive when the app is trimmed.
        [DynamicDependency(DynamicallyAccessedMemberTypes.All, "System.Runtime.InteropServices.JavaScript.JSSynchronizationContext", 
            "System.Runtime.InteropServices.JavaScript")]
        [DynamicDependency(DynamicallyAccessedMemberTypes.All, "System.Runtime.InteropServices.JavaScript.JSHostImplementation", 
            "System.Runtime.InteropServices.JavaScript")]
        [UnconditionalSuppressMessage("Trimming", 
            "IL2026:Members annotated with 'RequiresUnreferencedCodeAttribute' require dynamic access otherwise can break functionality when trimming application code",
            Justification = "Private runtime API")]
        static JSWebWorkerClone()
        {
#pragma warning disable IL2075
            var syncContext = typeof(System.Runtime.InteropServices.JavaScript.JSHost)
                .Assembly!.GetType("System.Runtime.InteropServices.JavaScript.JSSynchronizationContext")!;
            var hostImpl = typeof(System.Runtime.InteropServices.JavaScript.JSHost)
                .Assembly!.GetType("System.Runtime.InteropServices.JavaScript.JSHostImplementation")!;
            
            _setExtLoop = hostImpl.GetMethod("SetHasExternalEventLoop")!;
            _installInterop = syncContext.GetMethod("InstallWebWorkerInterop")!;
#pragma warning restore IL2075
        }

        public static Task RunAsync(Func<Task> run)
        {
            var tcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
            var th = new Thread(_ =>
            {
                try
                {
                    // Inside the try block so that a failure here also completes the task.
                    _installInterop.Invoke(null, [false, CancellationToken.None]);
                    // Forward the outcome of the user callback to the returned task.
                    run().ContinueWith(t =>
                    {
                        if (t.IsFaulted)
                            tcs.TrySetException(t.Exception!.InnerExceptions);
                        else if (t.IsCanceled)
                            tcs.TrySetCanceled();
                        else
                            tcs.TrySetResult();
                    });
                }
                catch(Exception e)
                {
                    tcs.TrySetException(e);
                }
            })
            {
                Name = "Manual JS worker"
            };
            // Mark the thread as having an external (JS) event loop before it starts.
            _setExtLoop.Invoke(null, [th]);
            th.Start();
            return tcs.Task;
        }
        
    }

I also had to replace your onmessage handler in the worker, since it was freaking out on an unknown command (emscripten doesn't do that, BTW), and used the pthread_self + PThread.pthreads[] combo.

A PoC seems to work.

@kekekeks

Missing APIs:

  1. JSWorker.postMessage with transfer array (need to transfer OffscreenCanvas, it's not clonable), that is somehow exposed directly to JS.
  2. OnMessage method on the worker that doesn't conflict with runtime-installed onmessage

@pavelsavara
Member Author

This demo is now MT on Net9 preview 3 https://pavelsavara.github.io/dotnet-wasm-raytracer/
It doesn't really do any WebGL tho

@pavelsavara pavelsavara modified the milestones: 9.0.0, 10.0.0 Jun 28, 2024
@pavelsavara
Member Author

note to self: sending message to managed code and spin-waiting inside requestAnimationFrame would be OK, but we also need to dispatch synchronous calls back from managed to UI thread, while UI is still waiting for "done" message.

I don't really understand if emscripten's GL emulator could run on a worker and if it would proxy syscalls back to UI thread.

Perhaps some tweaks to emscripten's message queue would do it?
There is already a queue which is able to run callbacks inside of other unrelated waits.

cc @jeromelaban

@pavelsavara
Member Author

pavelsavara commented Sep 30, 2024

The other idea I have is to move all of emscripten's OS/syscall emulation to a dedicated web worker. That means moving the VFS (or using WASMFS), the thread spawning code, and the other syscalls.

This could possibly free-up the UI thread to be able to run managed code (again). It would block render and spin-wait on anything cross-thread or on GC.

See
https://github.com/emscripten-core/emscripten/blob/0f13010ecf790c3d08c833167c863731ddb42ed6/system/lib/pthread/emscripten_yield.c#L44-L46

https://github.com/emscripten-core/emscripten/blob/0f13010ecf790c3d08c833167c863731ddb42ed6/system/lib/pthread/proxying.c#L38-L40

@kekekeks

I don't really understand if emscripten's GL emulator could run on a worker and if it would proxy syscalls back to UI thread

The problem is not emscripten, but the way WebGL works in general:

  1. once you return from a callback, the browser assumes that you have finished rendering the frame and presents it, so you have to make all of your calls inside of a single callback from the main browser event loop
  2. some OpenGL calls are blocking and actually return values, so you can't just record OpenGL commands for the entire frame and execute those as a batch

So any complex OpenGL code that wants to use a UI-thread-bound WebGL context has to run on the UI thread too.
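To illustrate point 2, here is a toy command recorder. `GlCommandRecorder` is a hypothetical class for this example only, not something Skia or emscripten provides. Fire-and-forget calls can be queued, but the first call whose return value is needed forces everything queued so far to execute synchronously on the thread that owns the context.

```javascript
// Hypothetical recorder: queues GL calls instead of executing them.
class GlCommandRecorder {
  constructor() {
    this.commands = [];
  }

  // Fire-and-forget calls can be deferred: nothing depends on a result.
  record(name, ...args) {
    this.commands.push({ name, args });
  }

  // A call whose result is needed right now cannot be deferred. All earlier
  // commands must run first, then the call itself, on the context's thread.
  call(gl, name, ...args) {
    this.flush(gl);
    return gl[name](...args);
  }

  // Replay the queued commands in order and clear the queue.
  flush(gl) {
    for (const { name, args } of this.commands) {
      gl[name](...args);
    }
    this.commands.length = 0;
  }
}
```

If the caller lives on another thread, every such `call` becomes a blocking cross-thread roundtrip in the middle of the frame, which is the situation described above.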

In Avalonia we currently spawn a dedicated web-worker using a hacked JSWebWorker version, transfer a canvas to said webworker and use a worker-bound WebGL context. requestAnimationFrame is also called in this dedicated worker.
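Roughly, the worker side of such a setup could look like the sketch below. This is not Avalonia's actual code: `installRenderWorker`, `drawFrame` (a stand-in for the managed render entry point) and the `"render-canvas"` message type are all assumptions made for the example.

```javascript
// Worker-side sketch. `scope` is the worker global (`self`); `drawFrame`
// is a hypothetical stand-in for the managed render entry point.
function installRenderWorker(scope, drawFrame) {
  scope.addEventListener("message", (event) => {
    const data = event.data;
    // Ignore messages that belong to someone else (e.g. the runtime).
    if (!data || data.type !== "render-canvas") return;

    // The context is bound to this worker, so all GL calls stay here.
    const gl = data.canvas.getContext("webgl2");
    if (!gl) throw new Error("WebGL2 is not available on this OffscreenCanvas");

    const onFrame = (timestampMs) => {
      drawFrame(gl, timestampMs);
      scope.requestAnimationFrame(onFrame);
    };
    scope.requestAnimationFrame(onFrame);
  });
}
```

In a real worker this would be called as `installRenderWorker(self, drawFrame)`. As noted earlier in the thread, the runtime's own onmessage handler did not tolerate unknown commands, so an extra listener like this was not enough on its own at the time.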

That approach prevents us from using several advanced features like using <video> element as a texture, but works fine otherwise. I also believe that a dedicated render worker is the better approach anyway since it allows us to push frames when the UI thread is otherwise blocked by something.

BTW, Avalonia is currently broken with the latest .NET 9 RC because you've bumped emscripten and SkiaSharp wasn't yet updated to adjust for that change, so you can't play with it, sorry.

Note that aside from OpenGL there is a similar problem with event handlers. Some browser events kinda expect you to answer synchronously (e.g. if you want to run some logic in a keyDown event and decide if it should be marked as handled or not), so enforced async isn't always feasible in general. Some JS->managed callbacks just need to be synchronous and need to support a JS->managed->JS roundtrip; one just needs to be really careful to avoid those whenever possible.
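The event handler case can be shown with the standard `EventTarget` API alone. In this sketch (the handler and helper names are made up) the `await` stands in for an asynchronous roundtrip to managed code on another thread:

```javascript
// Synchronous handler: the decision is made before dispatch returns.
function handleSync(event) {
  if (event.type === "keydown") event.preventDefault();
}

// Asynchronous handler: dispatch has already finished by the time the
// awaited work completes, so preventDefault() comes too late to matter.
async function handleAsync(event) {
  await Promise.resolve(); // stands in for a roundtrip to another thread
  event.preventDefault();
}

// Dispatch a cancelable "keydown" to a fresh target with one listener.
function dispatchWith(handler) {
  const target = new EventTarget();
  target.addEventListener("keydown", handler);
  const event = new Event("keydown", { cancelable: true });
  // dispatchEvent returns false only if a listener cancelled the event
  // while it was being dispatched.
  return target.dispatchEvent(event);
}
```

`dispatchWith(handleSync)` returns `false` (handled), while `dispatchWith(handleAsync)` returns `true`, because the default action has already been decided when the awaited code resumes.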

@maxkatz6
Contributor

maxkatz6 commented Sep 30, 2024

Some browser events kinda expect you to answer synchronously (e.g. if you want to run some logic in a keyDown event and decide if it should be marked as handled or not), so enforced async isn't always feasible in general.

I would expect this issue to affect Blazor WASM applications too.

SkiaSharp wasn't yet updated to adjust for that change

A new SkiaSharp nightly was released with the new emscripten builds. We need to adjust NativeFileReference to look for a newer version, though.

@pavelsavara pavelsavara changed the title [browser][MT] WebGL with threads [browser][MT] WebGL with threads - requestAnimationFrame Nov 18, 2024