-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve signature lookup happening during instantiation #2818
Conversation
This commit is intended to be a perf improvement for instantiation of modules with lots of functions. Previously the `lookup_shared_signature` callback was showing up quite high in profiles as part of instantiation. As some background, this callback is used to translate from a module's `SignatureIndex` to a `VMSharedSignatureIndex` which the instance stores. This callback is called for two reasons, one is to translate all of the module's own types into `VMSharedSignatureIndex` for the purposes of `call_indirect` (the translation of that loads from this table to compare indices). The second reason is that a `VMCallerCheckedAnyfunc` is prepared for all functions and this embeds a `VMSharedSignatureIndex` inside of it. The slow part today is that the lookup callback was called once-per-function and each lookup involved hashing a full `WasmFuncType`. Albeit our hash algorithm is still Rust's default SipHash algorithm which is quite slow, but we also shouldn't need to re-hash each signature if we see it multiple times anyway. The fix applied in this commit is to change this lookup callback to an `enum` where one variant is that there's a table to lookup from. This table is a `PrimaryMap` which means that lookup is quite fast. The only thing we need to do is to prepare the table ahead of time. Currently this happens on the instantiation path because in my measurments the creation of the table is quite fast compared to the rest of instantiation. If this becomes an issue, though, we can look into creating the table as part of `SigRegistry::register_module` and caching it somewhere (I'm not entirely sure where but I'm sure we can figure it out). There's in generally not a ton of efficiency around the `SigRegistry` type. I'm hoping though that this fixes the next-lowest-hanging-fruit in terms of performance without complicating the implementation too much. I tried a few variants and this change seemed like the best balance between simplicity and still a nice performance gain. Locally I measured an improvement in instantiation time for a large-ish module by reducing the time from ~3ms to ~2.6ms per instance.
2b776f7
to
7c7ab75
Compare
// Check if we've already registered this module. | ||
if let Some(existing_module) = inner.ranges.get(&max) { | ||
assert_eq!(existing_module.range, module_stack_maps.range); | ||
debug_assert_eq!( | ||
existing_module.pc_to_stack_map, | ||
module_stack_maps.pc_to_stack_map, | ||
); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this showing up in profiles? It seems unrelated to the rest of the changes in this PR and the early return seems like it is necessary to avoid tripping the assert!(old.is_none())
assertion at the end of this function.
if self.contains_pc(start) { | ||
return; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly confused with what's going on over here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was ensuring we don't register the same module more than once in the store, however, this could probably just be if self.module(start).is_some()
so it doesn't bother looking up the pc into the already registered module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, I just caught up to the changes below. Yeah, this shouldn't be necessary any more.
/// Builds a lookup table for a module from the possible module's signature | ||
/// indices to the shared signature index within this registry. | ||
pub fn lookup_table( | ||
&self, | ||
module: &Module, | ||
) -> PrimaryMap<SignatureIndex, VMSharedSignatureIndex> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than returning the PrimaryMap
and giving up ownership, could we keep the map on self
, clear it and rebuild it on each call, and reutrn a &PrimaryMap
? This way we could reuse the underlying allocation.
I suppose -- at that point, though -- we might as well move this out and cache it ✨ somewhere ✨ like you mention in the OP. I guess finding that somewhere is a bit hard because SignatureRegistry
s are Store
-specific, so we can't hang this map off the Module
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm good point! Given what we were discussing on chat though where this may soon move into a Engine
I think I might leave this as-is for now and we can revisit this later if this gets costly in the future (so far this isn't showing up when benchmarking instantiation for me).
.borrow_mut() | ||
.insert(ArcModuleCode(module.compiled_module().code().clone())); | ||
if !first { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay this is where all that stuff about whether we are registering for the first time or not is handled. (FWIW, this would have been a little easier on me, as a reviewer, if it were in a separate commit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops my bad, sorry about that! I forgot to call this out in the description as well...
Subscribe to Label Actioncc @peterhuene
This issue or pull request has been labeled: "wasmtime:api"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
This commit is intended to be a perf improvement for instantiation of
modules with lots of functions. Previously the
lookup_shared_signature
callback was showing up quite high in profiles as part of instantiation.
As some background, this callback is used to translate from a module's
SignatureIndex
to aVMSharedSignatureIndex
which the instancestores. This callback is called for two reasons, one is to translate all
of the module's own types into
VMSharedSignatureIndex
for the purposesof
call_indirect
(the translation of that loads from this table tocompare indices). The second reason is that a
VMCallerCheckedAnyfunc
is prepared for all functions and this embeds a
VMSharedSignatureIndex
inside of it.
The slow part today is that the lookup callback was called
once-per-function and each lookup involved hashing a full
WasmFuncType
. Albeit our hash algorithm is still Rust's defaultSipHash algorithm which is quite slow, but we also shouldn't need to
re-hash each signature if we see it multiple times anyway.
The fix applied in this commit is to change this lookup callback to an
enum
where one variant is that there's a table to lookup from. Thistable is a
PrimaryMap
which means that lookup is quite fast. The onlything we need to do is to prepare the table ahead of time. Currently
this happens on the instantiation path because in my measurments the
creation of the table is quite fast compared to the rest of
instantiation. If this becomes an issue, though, we can look into
creating the table as part of
SigRegistry::register_module
and cachingit somewhere (I'm not entirely sure where but I'm sure we can figure it
out).
There's in generally not a ton of efficiency around the
SigRegistry
type. I'm hoping though that this fixes the next-lowest-hanging-fruit in
terms of performance without complicating the implementation too much. I
tried a few variants and this change seemed like the best balance
between simplicity and still a nice performance gain.
Locally I measured an improvement in instantiation time for a large-ish
module by reducing the time from ~3ms to ~2.6ms per instance.