RFC: Recycle value stacks to avoid allocation costs #184
Conversation
It looks like @adam-rhebo hasn't signed our Contributor License Agreement yet.
You can read and sign our full Contributor License Agreement at the following URL: https://cla.parity.io Once you've signed, please reply to this thread with [clabot:check]. Many thanks, Parity Technologies CLA Bot
/// Same as [`invoke`].
///
/// [`invoke`]: #method.invoke
pub fn invoke_with_stack<E: Externals>(
I considered folding this into `invoke` by adding `stack_recycler: Option<&mut StackRecycler>` as a parameter, but did not do so to avoid breaking the API.
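To illustrate the trade-off discussed above, here is a minimal self-contained sketch (not wasmi's actual implementation; the `Recycler` struct and the buffer contents are hypothetical stand-ins) of keeping a separate `invoke_with_stack`-style entry point while the existing `invoke` delegates to it, so the public signature of `invoke` never changes:

```rust
use std::mem;

// Hypothetical stand-in for wasmi's StackRecycler: owns a reusable buffer.
#[derive(Default)]
pub struct Recycler {
    buffer: Vec<u64>,
}

// Folding the recycler into the main entry point would change its signature
// (an API break); delegating from the old entry point preserves it instead.
pub fn invoke(depth: usize) -> u64 {
    invoke_with_stack(depth, &mut Recycler::default())
}

pub fn invoke_with_stack(depth: usize, recycler: &mut Recycler) -> u64 {
    // Take the buffer out of the recycler, reusing its allocation.
    let mut stack = mem::take(&mut recycler.buffer);
    stack.clear();
    stack.extend(0..depth as u64);
    let result = stack.iter().sum();
    // Hand the allocation back so the next call can reuse it.
    recycler.buffer = stack;
    result
}

fn main() {
    let mut recycler = Recycler::default();
    let a = invoke_with_stack(1000, &mut recycler);
    let b = invoke_with_stack(1000, &mut recycler); // reuses the allocation
    assert_eq!(a, b);
    println!("{a}"); // sum of 0..1000 = 499500
}
```

Callers that don't care about recycling keep using `invoke` unchanged; callers on a hot path hold a recycler across calls.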
let limit = this
    .as_ref()
    .map_or(DEFAULT_VALUE_STACK_LIMIT, |this| this.value_stack_limit)
    / ::core::mem::size_of::<RuntimeValueInternal>();
This used to be `size_of::<RuntimeValue>`, but I suspect this was a typo: `RuntimeValue` includes the tag in addition to the 64 bits of payload, whereas the slice is made of `RuntimeValueInternal`.
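The size difference can be checked directly. The definitions below are simplified stand-ins, not wasmi's actual types, and the 1 MiB default limit is an assumption based on the discussion later in this thread; the point is that on a 64-bit target a tagged value enum occupies 16 bytes while the raw 64-bit payload occupies 8, so the divisor chosen here halves or doubles the effective stack depth:

```rust
use std::mem::size_of;

// Simplified stand-in: a tagged value carries a discriminant
// in addition to its 64 bits of payload.
#[allow(dead_code)]
enum RuntimeValue {
    I32(i32),
    I64(i64),
    F32(f32),
    F64(f64),
}

// Simplified stand-in: the internal representation is just the raw payload.
#[repr(transparent)]
struct RuntimeValueInternal(u64);

fn main() {
    // On a 64-bit target the tag pads the enum to 16 bytes.
    assert_eq!(size_of::<RuntimeValue>(), 16);
    assert_eq!(size_of::<RuntimeValueInternal>(), 8);

    // Assumed default byte budget for the value stack (1 MiB).
    const DEFAULT_VALUE_STACK_LIMIT: usize = 1024 * 1024;

    // Dividing the byte limit by the element size yields the element
    // count, so switching the divisor doubles the effective depth.
    assert_eq!(DEFAULT_VALUE_STACK_LIMIT / size_of::<RuntimeValue>(), 65_536);
    assert_eq!(DEFAULT_VALUE_STACK_LIMIT / size_of::<RuntimeValueInternal>(), 131_072);
}
```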
As an additional data point, these are the results of the included benchmarks on my machine before and after these changes:
Ah, it seems this changed the effective default size of the value stack. Should I adjust the default size back to 512 kB or leave it as is?
[clabot:check] |
It looks like @adam-rhebo signed our Contributor License Agreement. 👍 Many thanks, Parity Technologies CLA Bot
Hello, @adam-rhebo! First, thank you for the PR. The code looks good; however, my local benchmarks report massive slowdowns on the latest commit:
Looking at these results, it is clear that the shorter the running time, the greater the slowdown. Can we fix it?
Hm, it was 1 MB before, wasn't it? My logic is as follows:
I think the slowdown is due to doubling the effective default stack size; at least that is what my local measurements suggest. The stack is made up of values of type `RuntimeValueInternal`.
I think this is due to the stack initialization dominating the execution time for these small examples, and hence doubling the stack size almost doubles the execution time.
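A toy illustration of this effect (function names are illustrative, not wasmi's API): allocating a fresh stack per call pays for a new backing buffer every time, whereas clearing and refilling a retained buffer keeps the existing allocation, which can be observed because the buffer's pointer does not change:

```rust
// Fresh allocation per call: every call requests a new backing buffer
// and bulk-initializes it, which dominates short-running calls.
fn fresh_stack(len: usize) -> Vec<u64> {
    vec![0u64; len]
}

// Recycled: `clear` keeps the capacity, so resizing back to the same
// length writes into already-allocated (and already-resident) memory
// instead of going through the allocator again.
fn reuse_stack(stack: &mut Vec<u64>, len: usize) {
    stack.clear();
    stack.resize(len, 0);
}

fn main() {
    let len = 128 * 1024; // 1 MiB worth of u64 slots
    let mut stack = fresh_stack(len);
    let ptr_before = stack.as_ptr();
    reuse_stack(&mut stack, len);
    // The same backing buffer is reused: no reallocation occurred.
    assert_eq!(ptr_before, stack.as_ptr());
    assert!(stack.capacity() >= len);
}
```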
Ah, yeah, you are right! I am ok with leaving the stack size as is now.
We hope to use wasmi to implement a policy engine for low-volume packet inspection in an embedded setting. Initial CPU profiles using toy policies showed almost 40% of CPU time spent in `Vec::extend_with` to allocate and initialize a new value stack for each function call. This MR tries to address this by adding an API to allow applications to reuse stack allocations (for both value and call stacks).

As a side effect, it also provides an API for applications to configure the stack size limits (which is rather useful for us, as we only need a few kB of value stack in our particular application).
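As a rough sketch of what an application-configurable limit could look like (the constructor and method names below are assumptions for illustration, not wasmi's actual API), a recycler can carry its byte budget and convert it into a number of 64-bit stack slots:

```rust
use std::mem::size_of;

// Hypothetical recycler carrying a configurable value-stack limit in bytes.
pub struct StackRecycler {
    value_stack_limit: usize, // bytes
}

impl StackRecycler {
    // Embedded callers can shrink the budget from the default
    // down to a few kB.
    pub fn with_value_stack_limit(bytes: usize) -> Self {
        StackRecycler { value_stack_limit: bytes }
    }

    // Convert the byte budget into a number of 64-bit stack slots.
    pub fn value_stack_slots(&self) -> usize {
        self.value_stack_limit / size_of::<u64>()
    }
}

fn main() {
    let recycler = StackRecycler::with_value_stack_limit(16 * 1024); // 16 kB
    assert_eq!(recycler.value_stack_slots(), 2048);
}
```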
A micro benchmark executing a minimal policy for a single packet, optimized for size, showed the following changes:
At least the following questions remain open from my point of view: