Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graceful error handling for host function calls from guest modules #89

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

clarkmcc
Copy link
Contributor

@clarkmcc clarkmcc commented Dec 3, 2024

This PR is intended to allow Go host modules to return errors to guest module function calls. In order for this to work properly, each of the PDKs also needs to be updated to expose an extism:host/env function called host_func_get_error which returns an offset to an error message, or 0 if no error was set. Host functions that return an error will save the error message within internal extism host state which is accessible by host_func_get_error.

See extism/rust-pdk#73

@clarkmcc
Copy link
Contributor Author

clarkmcc commented Dec 4, 2024

After discussing with @zshipko I've changed the approach to use the existing error_set and error_get functions. Previously, the PDKs would use error_set to pass errors from the guest to the host. This changes allows the host to call error_set in order to pass host errors to the guest.

So long as guests always check for host errors after host function calls, and hosts always check for guest errors after guest function calls (which should be happening already as I understand it), then we would never see a conflict.

My assumption is that outdated PDKs that do not support this functionality will result in this PR having no effect since they would simply not know to call error_get.

Copy link
Collaborator

@mhmd-azeez mhmd-azeez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it! I will wait for @zshipko to give it another look, thanks for the PR!

host.go Outdated Show resolved Hide resolved
extism.go Outdated Show resolved Hide resolved
host.go Outdated Show resolved Hide resolved
@clarkmcc
Copy link
Contributor Author

clarkmcc commented Dec 4, 2024

@mhmd-azeez thanks! Addressed all feedback.

@clarkmcc
Copy link
Contributor Author

clarkmcc commented Dec 4, 2024

This commit is a rough example of a possible solution to the problem discussed in office hour today -- specifically that if a guest PDK does not know to check and reset an error set by a host function, then the host will see its own error when it checks here

errMsg := p.GetErrorWithContext(ctx)

@zshipko
Copy link
Contributor

zshipko commented Dec 9, 2024

Hey thanks for digging into this!

After discussing this with the team we're a little hesitant about the potentially large blast radius of this change and unsure about using special characters in error messages to signal intent. I don't think there is a good way to handle backwards compatibility if we're changing the semantics around error messages. Something closer to the posix errno API might be a better fit for Extism.

Adding error_get to PDKs and error_set to SDKs would allow you to implement this as something more specific to your application, without pulling it into Extism. We also considered that maybe exposing vars to host functions could allow for errors or other out of band data to be sent back to the plugin, which would just be a matter of adding var_set bindings to SDKs.

Additionally, if you're open to using an XTP schema to define your plugin interfaces you could use it to allow your plugin authors to stay up to date by leveraging the code generation. This would mean that changing the plugin interface to allow error messages as part of the call output could be added to your existing API without too much disruption!

@clarkmcc
Copy link
Contributor Author

@zshipko ultimately, if this is something that extism does not want to support, regardless of the implementation, then I understand.

Current Implementation
On the off-chance that there's just a misunderstanding of the implementation, let me clarify why I believe this is a backwards-compatible and safe change.

SDK PDK Result
Older Older Safe. Current behavior.
Newer Older Safe. Older PDKs don't know how to run "error_get".
Older Newer Safe. Older SDKs don't know how to run "error_set".
Newer Newer Safe. Both SDKs and PDKs know how to negotiate and check the error state.

It's certainly possible that I don't understand all the implications of this change, and if you'd like to provide specific examples of how this implementation is not backwards compatible, I can think through those. From my initial work on this, my understanding was that this approach would be backwards compatible.

The sentinel values used for denoting the error's source are configurable via environment variables, so a PDK does not have to commit now and forever to a specific sentinel value. In other words the SDK and PDK can dynamically negotiate the sentinel prefix. An errno would be better than nothing, but due to the fact that (1) it's impossible to enumerate all possible errors a PDK might want to through, and (2) err numbers don't communicate anything about the conditions leading up to those error, this solution would not work for our use-case. We have error messages like this

Get "http://10.31.165.168": dial tcp 10.31.165.168:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time

While the errno could communicate the timeout, it could not communicate the HTTP path we were attempting to GET, or the IP address of the target we were requesting from, etc.

Vars Approach
This was actually the very first approach I took, even before opening a PR. There weren't any specific issues with this, other than it felt like a round-about way to solve a problem that, in my (admittedly biased) opinion, should be supported out-of-the-box by Extism on the basis that not every WASM use-case wants to abort execution if a host function fails.

Next Steps
As far as next steps, we're already running these PRs in production at our customers and it seems to be working well (though that cannot speak to the backwards compatibility question). If extism chooses not to solve this problem out-of-the-box, we can of course continue to maintain our forks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants