-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update memfd image construction to avoid excessively large images #3819
Update memfd image construction to avoid excessively large images #3819
Conversation
Previously memfd-based image construction had a hard limit of a 1GB memory image but this mean that tiny wasm modules could allocate up to 1GB of memory which became a bit excessive especially in terms of memory usage during fuzzing. To fix this the conversion to a static memory image has been updated to first do a conversion to paged memory initialization, which is sparse, followed by a second conversion to static memory initialization. The sparse construction for the paged step should make it such that the upper/lower bounds of the initialization image are easily computed, and then afterwards this limit can be checked against some heuristics to determine if we're willing to commit to building up a whole static image for that module. The heuristics have been tweaked from "must be less than 1GB" to one of two conditions must be true: * Either the total memory image size is at most twice the size of the original paged data itself. * Otherwise the memory image size must be smaller than a reasonable threshold, currently 1MB. We'll likely need to tweak this over time and it's still possible to cause a lot of extra memory consumption, but for now this should be enough to appease the fuzzers. Closes bytecodealliance#3815
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks so much for taking care of this! Just a few nits below.
@alexcrichton I've been playing a bit with different modules to see if the above heuristics work, and I'm becoming increasingly worried. For example I have a wizened SpiderMonkey wasm with My fundamental worry is, still, the performance cliff: if the user gets unlucky, they fall into a much lower-performance mode. It feels like we're optimizing for the wrong thing here: an abstract notion of sparse memory usage and leanness, but at the cost of real-world scenarios. To give another example, if I write a program with a native toolchain and embed data in it, and that data happens to have lots of zeroes, I think we should do the same thing: fundamentally, we're compiling to a representation that is "close to the metal", and part of that closeness is that it has a one-to-one image of what will go into memory. Making that a behavior that one has to fit the right heuristics to get just feels wrong somehow; like JavaScript engines all over again. So, I'd like to argue that we revert this change, and go back to a static limit of some sort, as I had proposed above (EDIT: actually in the discussion in #3815). We could perhaps make it user-configurable (I'd happily agree that baked-in arbitrary limits are bad), but I want to make sure the cliff is harder to hit than "oops, did too much wizening and heap is a bit sparse" :-) Thoughts? |
Ah, and one more advantage of a static limit-based approach is that we can take advantage of the "image and leftovers" aspect of our initialization data structure: we can build a dense image, ready to mmap, for all memory up to the last data segment that is below our bound; and then only do eager initialization for data beyond that. This means that e.g. if we have relatively dense heap down in the 0..N MB range, and then a random byte up at 1GiB, we don't reject the whole thing and do eager init of N megabytes; instead we continue to do memfd as normal and then just do eager init of the one random high byte. I'll go ahead and create an issue for this rather than braindumping on a closed PR, sorry :-) |
…tecodealliance#3819) * Update memfd image construction to avoid excessively large images Previously memfd-based image construction had a hard limit of a 1GB memory image but this mean that tiny wasm modules could allocate up to 1GB of memory which became a bit excessive especially in terms of memory usage during fuzzing. To fix this the conversion to a static memory image has been updated to first do a conversion to paged memory initialization, which is sparse, followed by a second conversion to static memory initialization. The sparse construction for the paged step should make it such that the upper/lower bounds of the initialization image are easily computed, and then afterwards this limit can be checked against some heuristics to determine if we're willing to commit to building up a whole static image for that module. The heuristics have been tweaked from "must be less than 1GB" to one of two conditions must be true: * Either the total memory image size is at most twice the size of the original paged data itself. * Otherwise the memory image size must be smaller than a reasonable threshold, currently 1MB. We'll likely need to tweak this over time and it's still possible to cause a lot of extra memory consumption, but for now this should be enough to appease the fuzzers. Closes bytecodealliance#3815 * Review comments
Previously memfd-based image construction had a hard limit of a 1GB
memory image but this mean that tiny wasm modules could allocate up to
1GB of memory which became a bit excessive especially in terms of memory
usage during fuzzing. To fix this the conversion to a static memory
image has been updated to first do a conversion to paged memory
initialization, which is sparse, followed by a second conversion to
static memory initialization.
The sparse construction for the paged step should make it such that the
upper/lower bounds of the initialization image are easily computed, and
then afterwards this limit can be checked against some heuristics to
determine if we're willing to commit to building up a whole static image
for that module. The heuristics have been tweaked from "must be less
than 1GB" to one of two conditions must be true:
Either the total memory image size is at most twice the size of the
original paged data itself.
Otherwise the memory image size must be smaller than a reasonable
threshold, currently 1MB.
We'll likely need to tweak this over time and it's still possible to
cause a lot of extra memory consumption, but for now this should be
enough to appease the fuzzers.
Closes #3815