-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for TextEncoder#encodeInto
#1279
Conversation
@fitzgen I'm curious on your thoughts about the file size increase here. One possible option I can think of is that we could detect when |
I think the ideal place we would like to end up at is something like
how does that sound? In terms for this PR, as opposed to an ideal future end state, I think just choosing a default and providing a flag to explicitly set it on or off is good enough to land this. |
crates/cli-support/src/js/mod.rs
Outdated
self.global(&format!( | ||
" | ||
function passStringToWasm(arg) {{ | ||
{} | ||
if (typeof cachedTextEncoder.encodeInto === 'function') | ||
return passStringToWasmFast(arg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of checking every time we pass a string, we should do this:
(function () {
return typeof TextEncoder.prototype.encodeInto === "function"
? passStringToWasmFast
: passStringToWasm;
function passStringToWasmFast(arg) { ... }
function passStringToWasm(arg) { ... }
}())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use_encode
body is still checking for encodeInto
, which I think is an oversight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops yes
Yeah I think ideally we'd have all the various knobs, but I'm also hoping that we can in the meantime pick reasonable defaults that avoid the need for knobs for as long as possible. Do you think it's worth doing any detection right now? (I'm sort of on the fence about this but would lean towards "no") If we need to do detection, though, do you think we need to probe much beyond the compile settings for the crate? |
Detection of whether |
Ok just to make sure I understand, you're ok merging this (with the change to only do the feature detect once) as-is? In the long-term we'd keep the option open though to have configuration for tweaking this output |
I think we should at least have a CLI flag like |
4e8371e
to
eadba8e
Compare
Updated! |
crates/cli-support/src/js/mod.rs
Outdated
self.global(&format!( | ||
" | ||
function passStringToWasm(arg) {{ | ||
{} | ||
if (typeof cachedTextEncoder.encodeInto === 'function') | ||
return passStringToWasmFast(arg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use_encode
body is still checking for encodeInto
, which I think is an oversight
This commit adds support for the recently implemented standard of [`TextEncoder#encodeInto`][standard]. This new function is a "bring your own buffer" style function where we can avoid an intermediate allocation and copy by encoding strings directly into wasm's memory. Currently we feature-detect whether `encodeInto` exists as it is only implemented in recent browsers and not in all browsers. Additionally this commit emits the binding using `encodeInto` by default, but this requires `realloc` functionality to be exposed by the wasm module. Measured locally an empty binary which takes `&str` previously took 7.6k, but after this commit takes 8.7k due to the extra code needed for `realloc`. [standard]: https://encoding.spec.whatwg.org/#dom-textencoder-encodeinto Closes rustwasm#1172
eadba8e
to
745b16e
Compare
break; | ||
}} | ||
ptr = wasm.__wbindgen_realloc(ptr, size, size * 2); | ||
size *= 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the allocator isn't a power-of-two allocator, this can overallocate. It is known that the max number of bytes that encodeInto
can require is JavaScript string length times 3. (It follows that if the Wasm program knows the string to be short-lived on the Wasm side, it makes sense to just allocate for the ma length case up front and not reallocate.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm good point! Do you think it'd be worthwhile do do something like malloc(string.length)
, and if that fails realloc(string.length * 3)
followed by realloc(the_actual_length)
and that'd be more appropriate than this loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the allocations can be assumed the be short-lived, I suggest going with malloc(string.length * 3)
without a loop.
If they are assumed to be long-lived, I suggest coming up with a magic number magic
that characterizes how much smaller the initial allocation needs to than the worst case for it to be worthwhile not to allocate for the worst case right away.
Then:
let min = roundUpToBucketSize(string.length);
let max = string.length * 3;
if (min + magic > max) {
// min is close enough to max anyway
let m = malloc(max);
let (read, written) = encodeInto(...);
assert(read == string.length);
} else {
let m = malloc(min);
let (read, written) = encodeInto(...);
if (read < string.length) {
let inputLeft = string.length - read;
let outputNeeded = inputLeft * 3;
let m = realloc(m, written + outputNeeded);
// ... encodeInto with input starting from read and output starting from written
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks for the info! I've opened up #1313 to further track this.
This commit adds support for the recently implemented standard of
TextEncoder#encodeInto
. This new function is a "bring yourown buffer" style function where we can avoid an intermediate allocation
and copy by encoding strings directly into wasm's memory.
Currently we feature-detect whether
encodeInto
exists as it is onlyimplemented in recent browsers and not in all browsers. Additionally
this commit emits the binding using
encodeInto
by default, but thisrequires
realloc
functionality to be exposed by the wasm module.Measured locally an empty binary which takes
&str
previously took7.6k, but after this commit takes 8.7k due to the extra code needed for
realloc
.Closes #1172