Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asyncify: Improve developer experience #251

Closed
adamziel opened this issue May 4, 2023 · 9 comments
Closed

Asyncify: Improve developer experience #251

adamziel opened this issue May 4, 2023 · 9 comments

Comments

@adamziel
Copy link
Collaborator

adamziel commented May 4, 2023

The problem

PHP.wasm crashes if the ASYNCIFY_ONLY functions list in packages/php-wasm/compile/Dockerfile is not exhaustive enough:

236297634-dfde050a-32a3-41e8-9792-71c96af1c2cc

Whenever it crashes, the developer must analyze the stacktrace, update the list, and rebuild PHP. That DX is rough and many developers won't be willing to approach Asyncify issues.

Tl;dr solution idea

Autogenerate the list of ASYNCIFY_ONLY functions with an automated test suite that would exercise all possible code paths that may trigger a network call in PHP.


The full story:

Technical details

Asyncify lets synchronous C or C++ code interact with asynchronous JavaScript. Technically, it saves the entire C call stack before yielding control back to JavaScript, and then restores it when the asynchronous call is finished. This is called stack switching.

Stack switching requires wrapping all C functions that may be found at a call stack at a time of making an asynchronous call. Blanket-wrapping of every single C function adds a significant overhead, which is why we maintain a list of specific function names:

export ASYNCIFY_ONLY=$'"ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER",\
"ZEND_DO_FCALL_BY_NAME_SPEC_OBSERVER_HANDLER",\
"ZEND_DO_FCALL_BY_NAME_SPEC_RETVAL_UNUSED_HANDLER",\
"ZEND_DO_FCALL_BY_NAME_SPEC_RETVAL_USED_HANDLER",\
"ZEND_DO_FCALL_SPEC_CONST_HANDLER",\
"ZEND_DO_FCALL_SPEC_HANDLER",\
"ZEND_DO_FCALL_SPEC_OBSERVER_HANDLER",\
"ZEND_DO_FCALL_SPEC_RETVAL_UNUSED_HANDLER",\
"ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER",\

Unfortunately, missing even a single item from that list results in a WebAssembly crash whenever that function is a part of the call stack when an asynchronous call is made.

@danielbachhuber asked:

It seems like we need to find a more permanent solution to these function crashes. I don't think we'll be able to tag a "stable" release with this behavior.

Is it possible to reliably list all the relevant functions by hand?

I think so.

The current list is a result of a long trial-and-error process, but there is a away to systematize it, or perhaps even automate it.

Source of the crash

The only async operations are network requests – we can work backwards from here to find all the relevant Zend engine functions.

The only way to trigger an async request is via Emscripten's createPeer function which is triggered by connect(2), listen(2) and sendmsg(2).

The internal network functions are called by the following top-of-the-stack code paths:

  • PHP stream handlers like getimagesize("https://...")
  • PHP network-related functions like fsockopen or mysql_connect.

Top-of-the-stack functions can be called by the following bottom-of-the-stack code paths:

  • A function call (myfunc(), $obj->method(), $reflection->callMethod(), new MyClas(), etc.)
  • Resource cleanup (destructors, request shutdown handlers, etc.)

The big idea

  • Find all top-of-the-stack functions that trigger network calls
  • Find all bottom-of-the-stack functions that trigger the above
  • Write a test suite for a cartesian product of the two
  • Auto-append any missing items to ASYNCIFY_ONLY
  • Rebuild, rinse and repeat

This way, we'd get a more reliable and systematic list of ASYNCIFY_ONLY functions.

There's still a risk of missing some code paths, but:

  • It should happen much less frequently
  • The fix would only involve creating a unit test that reproduces the crash
  • We could involve the output of ASYNCIFY_ADVISE in the process to minimize the risks

Furthermore, perhaps the error message could include a link to a new issue form with the stack trace and other relevant details already prepopulated.

Why other ideas won't work so well and what we can do about it

Auto-listing all the functions

Asyncify can auto-list all the required C functions when built without ASYNCIFY_ONLY.

Auto-detection is overeager and ends up listing about 70,000 C functions which increases the startup time to 4.5s. This can be seen by adding -sASYNCIFY_ADVISE to the final emcc in the Dockerfile. Edit: I just tried this again and seems to be significantly less now. There's still a lot of unrelated functions there, though. Dynamic calls issued by internal functions like zend_call_method don't play well with static analysis.

Eventually, V8 will likely handle stack switching for us and remove this problem entirely. Unfortunately, the timeline for landing this in Node.js is unclear. @ThomasTheDane – do you have any insights about that, by any chance?

Recovering from a crash

We cannot recover. A crash leaves PHP in an undefined state. Were we in a middle of a request? In a request shutdown handler? Do we need to clean up any dangling resources? There's no telling! Once a crash happened, we must shutdown the PHP runtime.

We can, however, automatically open a new runtime. It will crash again eventually, but at least the developer won't have to manually restart the app.

Automating recompilation

@danielbachhuber asked:

Could we abstract the ASYNCIFY functions to some configuration file, and then have this error message code do a smarter job identifying the missing function?

The answer is yes, but only for developers working with node.js packages inside WordPress Playground repo. No bueno for npm package consumers and on the web. This could still be useful.

cc @sejas @dmsnell @jsnajdr

@adamziel
Copy link
Collaborator Author

adamziel commented May 5, 2023

Another idea: don’t do stack switching.

Right now we can call WASM function b while the call stack from function a waits to be resumed. PHP relies heavily on a global state stored im a heap so we will never exercise this possibility.

What if, instead of saving and restoring the call stack, we would always resume the last execution? Would it remove the need for the allowlist?

The answer should be somewhere here:

Edit:

It seems like we can't skip stack switching. Asyncify seems to be "yielding back to JS" by transforming the compiled code in this fashion:

function asyncFunction() {
  let x;
  if (rewinding) { /* .. restore x .. */ }
  if (normalExecution) {
    x = anotherAsyncFunction();
  }
  if (waitingForJs) {
    // .. save call index and x ..
    return;
  }
  // .. further processing ..
  return x;
}

The execution flow resembles this:

asyncFunction();
if ( waitingForJs ) {
    // Everything after the async call short-circuited
    // Let's await the async task now
    await jsTask;

    // Let's call the same function again, but set the
    // global context in such a way to resume after the
    // asynchronous part:
    waitingForJs = false;
    rewinding = true;
    asyncFunction();
}

Asyncify must be aware of all the functions in the stack to add these conditions. Otherwise it couldn't short-circuit when one of the callees issues an async call. The ability to call another WASM function while waiting for async information is just a nice side-effect of how this process works.

@ThomasTheDane
Copy link

ThomasTheDane commented May 5, 2023 via email

@adamziel
Copy link
Collaborator Author

adamziel commented May 5, 2023

Thank you so much @ThomasTheDane! Email pings unfortunately didn't transfer to GitHub so I looked up the usernames of all three folks you've mentioned – I hope I got it right! CC @fgmccabe @ajklein @dtig

@adamziel
Copy link
Collaborator Author

adamziel commented May 5, 2023

Attempt at autogenerating the ASYNCIFY_ONLY list based on a unit test suite: #253

@fgmccabe
Copy link

fgmccabe commented May 5, 2023

At the risk of not fully understanding what you are trying to do ...

JSPI represents a different approach to ASYNCIFY, even though we use the emscripten flag ASYNCIFY=2 to enable it.

In essence, I think it is likely that JSPI addresses your concerns more directly.

With a JSPI'd (is that a verb?) WebAssembly module, when you call a marked export, that call is executed on a separate stack to the original. If the the call needs to be suspended, then the stack is simply put aside, and when the fetch request (say) is resolved, the 'put away' stack is reentered.

It is obviously more complex than this picture, but the essence is that the execution stack is not unwound/rewound during suspensions and resumptions. It also means that the only things that the developer has to pay attention to are the exports and imports involved.

@adamziel
Copy link
Collaborator Author

adamziel commented May 5, 2023

@fgmccabe That's helpful, thank you! And it covers the essence of what I'm trying to do here, which is to make async calls without risking exceptions that can only be fixed by rebuilding the WebAssembly module.

I'd love to switch to JSPI entirely, however I understand it's only available in Chrome as of today. I understand the timeline for adoption in other browsers may be hard to reason about, but are there plans to bring JSPI to Node.js in the nearest future? It would solve this entire issue since only the Node.js version of WebAssembly PHP makes async calls at the moment.

@fgmccabe
Copy link

fgmccabe commented May 5, 2023

The current implementation in V8 is essentially 'experimental status'. We have arm64 and x64 implementations.
The next steps are to implement on 32 bit arm/intel. That requires us to solve some issues that we did not have to solve so far.
As for node.js, my guess is that it is already in node, behind a flag.
To remove the flag requirement involves getting other implementations. The best estimate for that is towards the end of this year; but it obviously depends on resources and funding.
In addition, it would need further progress in the standardization effort; but, given that it is a 'small' spec, that should not be a long term burden.
Hope that this helps you understand the roadmap :)

@adamziel
Copy link
Collaborator Author

adamziel commented May 8, 2023

This is very helpful @fgmccabe, thank you so much! I explored using JSPI in the Node.js version of WebAssembly PHP but got stuck at one point. I think I could push through by spending more time on a minimal reproduction of the problem, but since JSPI is still experimental and only available on arm64 and x64 I'll put it on a backburner for now. I'll keep watching the JSPI project and revisit adopting it here in a few months.

@adamziel
Copy link
Collaborator Author

adamziel commented May 9, 2023

I just merged #253 which solves the core of this issue.

#253 avoids Node.js crashes, turns failures into catchable errors, ensures stack trace continuity, and makes the trace more useful by disabling minification of the JS module, introduces a unit test suite, and even automates updating the ASYNCIFY_ONLY list.

The next step would be expanding the list of Asyncify test cases. Let's close this issue and track that separately in #273.

@adamziel adamziel closed this as completed May 9, 2023
@adamziel adamziel mentioned this issue Apr 28, 2024
ironnysh added a commit that referenced this issue May 1, 2024
## What is this PR doing?

1. Reorganizing the **Contributing** chapter of the docs, and updating
the content. See [this
issue](adamziel/playground-docs-workflow#31 (comment))

2. Adding references to [the Developer
docs](https://developer.wordpress.org/) in relevant Blueprints steps.
See #1291.

~### To-Do:~

- [x] Update [wp-now](https://www.npmjs.com/package/@wp-now/wp-now)
README.

**Done** in [Update wp-now documentation
#251](WordPress/playground-tools#251).

## Testing Instructions
1. Build the site (standard `npm run build` to update the steps in
`packages/playground/blueprints/src/lib/steps/`)
2. Run `npm run dev:docs`.
3. Review the changes.

@adamziel
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants