-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazily load types into Func
#3727
Lazily load types into Func
#3727
Conversation
This commit changes the construction of a `Func` to lazily load the type information if required instead of always loading the type information at `Func`-construction time. The main purpose of this change is to accelerate instantiation of instances which have many imports. Currently in the fast way of doing this the instantiation loop looks like: let mut store = Store::new(&engine, ...); let instance = instance_pre.instantiate(&mut store); In this situation the `instance_pre` will typically load host-defined functions (defined via `Linker` APIs) into the `Store` as individual `Func` items and then perform the instantiation process. The operation of loading a `HostFunc` into a `Store` however currently involves two expensive operations: * First a read-only lock is taken on the `RwLock` around engine signatures. * Next a clone of the wasm type is made to pull it out of the engine signature registry. Neither of these is actually necessary for imported functions. The `FuncType` for imported functions is never used since all comparisons happen with the intern'd indices instead. The only time a `FuncType` is used typically is for exported functions when using `Func::typed` or similar APIs which need type information. This commit makes this path faster by storing `Option<FuncType>` instead of `FuncType` within a `Func`. This means that it starts out as `None` and is only filled in on-demand as necessary. This means that when instantiating a module with many imports no clones/locks are done. On a simple microbenchmark where a module with 100 imports is instantiated this PR improves instantiation time by ~35%. Due to the rwlock used here and the general inefficiency of pthreads rwlocks the effect is even more profound when many threads are performing the same instantiation process. On x86_64 with 8 threads performing instantiation this PR improves instantiation time by 80% and on arm64 it improves by 97% (wow read-contended glibc rwlocks on arm64 are slow). Note that much of the improvement here is also from memory allocatoins/deallocations no longer being performed because dropping functions within a store no longer requires deallocating the `FuncType` if it's not present. A downside of this PR is that `Func::ty` is now unconditionally taking an rwlock if the type hasn't already been filled in. (it uses the engine). If this is an issue in the future though we can investigate at that time using somthing like a `Once` to lazily fill in even when mutable access to the store isn't available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that's a fantastic speedup!
} | ||
|
||
/// Gets a reference to the `FuncType` for this function. | ||
fn ty_ref<'a>(&self, store: &'a mut StoreOpaque) -> (&'a FuncType, &'a StoreOpaque) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's probably a mundane borrow-checker-subtlety reason for returning the downgraded immutable-borrow of StoreOpaque
here, but could we add a comment describing why?
Subscribe to Label Actioncc @peterhuene
This issue or pull request has been labeled: "wasmtime:api"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
* Lazily load types into `Func` This commit changes the construction of a `Func` to lazily load the type information if required instead of always loading the type information at `Func`-construction time. The main purpose of this change is to accelerate instantiation of instances which have many imports. Currently in the fast way of doing this the instantiation loop looks like: let mut store = Store::new(&engine, ...); let instance = instance_pre.instantiate(&mut store); In this situation the `instance_pre` will typically load host-defined functions (defined via `Linker` APIs) into the `Store` as individual `Func` items and then perform the instantiation process. The operation of loading a `HostFunc` into a `Store` however currently involves two expensive operations: * First a read-only lock is taken on the `RwLock` around engine signatures. * Next a clone of the wasm type is made to pull it out of the engine signature registry. Neither of these is actually necessary for imported functions. The `FuncType` for imported functions is never used since all comparisons happen with the intern'd indices instead. The only time a `FuncType` is used typically is for exported functions when using `Func::typed` or similar APIs which need type information. This commit makes this path faster by storing `Option<FuncType>` instead of `FuncType` within a `Func`. This means that it starts out as `None` and is only filled in on-demand as necessary. This means that when instantiating a module with many imports no clones/locks are done. On a simple microbenchmark where a module with 100 imports is instantiated this PR improves instantiation time by ~35%. Due to the rwlock used here and the general inefficiency of pthreads rwlocks the effect is even more profound when many threads are performing the same instantiation process. On x86_64 with 8 threads performing instantiation this PR improves instantiation time by 80% and on arm64 it improves by 97% (wow read-contended glibc rwlocks on arm64 are slow). Note that much of the improvement here is also from memory allocatoins/deallocations no longer being performed because dropping functions within a store no longer requires deallocating the `FuncType` if it's not present. A downside of this PR is that `Func::ty` is now unconditionally taking an rwlock if the type hasn't already been filled in. (it uses the engine). If this is an issue in the future though we can investigate at that time using somthing like a `Once` to lazily fill in even when mutable access to the store isn't available. * Review comments
This commit changes the construction of a
Func
to lazily load the typeinformation if required instead of always loading the type information
at
Func
-construction time. The main purpose of this change is toaccelerate instantiation of instances which have many imports. Currently
in the fast way of doing this the instantiation loop looks like:
In this situation the
instance_pre
will typically load host-definedfunctions (defined via
Linker
APIs) into theStore
as individualFunc
items and then perform the instantiation process. The operationof loading a
HostFunc
into aStore
however currently involves twoexpensive operations:
RwLock
around enginesignatures.
signature registry.
Neither of these is actually necessary for imported functions. The
FuncType
for imported functions is never used since all comparisonshappen with the intern'd indices instead. The only time a
FuncType
isused typically is for exported functions when using
Func::typed
orsimilar APIs which need type information.
This commit makes this path faster by storing
Option<FuncType>
insteadof
FuncType
within aFunc
. This means that it starts out asNone
and is only filled in on-demand as necessary. This means that when
instantiating a module with many imports no clones/locks are done.
On a simple microbenchmark where a module with 100 imports is
instantiated this PR improves instantiation time by ~35%. Due to the
rwlock used here and the general inefficiency of pthreads rwlocks the
effect is even more profound when many threads are performing the same
instantiation process. On x86_64 with 8 threads performing instantiation
this PR improves instantiation time by 80% and on arm64 it improves by
97% (wow read-contended glibc rwlocks on arm64 are slow).
Note that much of the improvement here is also from memory
allocatoins/deallocations no longer being performed because dropping
functions within a store no longer requires deallocating the
FuncType
if it's not present.
A downside of this PR is that
Func::ty
is now unconditionally takingan rwlock if the type hasn't already been filled in. (it uses the
engine). If this is an issue in the future though we can investigate at
that time using somthing like a
Once
to lazily fill in even whenmutable access to the store isn't available.