Speed up passing ASCII-only strings to WASM #1470

Merged 4 commits on May 13, 2019

Conversation

RReverser
Member

@RReverser RReverser commented Apr 17, 2019

Some speed up numbers from my string-heavy WASM benchmarks mentioned in the previous PRs:

  • Firefox + encodeInto: +45%
  • Chrome + encodeInto: +80%
  • Firefox + encode: +29%
  • Chrome + encode: +62%

Note that this helps specifically in the case of lots of small ASCII strings; for large strings there is no measurable difference in either direction.
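To illustrate the idea, here is a minimal sketch of an ASCII fast path with a `TextEncoder` fallback. It is not the generated wasm-bindgen glue: the function name `encodeWithAsciiFastPath` is hypothetical, and a plain `Uint8Array` stands in for WASM linear memory and its allocator.

```javascript
// Hypothetical sketch: a plain Uint8Array stands in for WASM linear memory.
const fastPathEncoder = new TextEncoder();

function encodeWithAsciiFastPath(str) {
  // Optimistically reserve one byte per UTF-16 code unit (enough for pure ASCII).
  let buf = new Uint8Array(str.length);
  let offset = 0;

  // Fast path: copy code units directly for as long as they are ASCII.
  for (; offset < str.length; offset++) {
    const code = str.charCodeAt(offset);
    if (code > 0x7f) break;
    buf[offset] = code;
  }

  // Slow path: hand whatever is left to TextEncoder.
  if (offset !== str.length) {
    const rest = str.slice(offset);
    // A UTF-16 code unit needs at most 3 bytes in UTF-8, hence the "* 3".
    const grown = new Uint8Array(offset + rest.length * 3);
    grown.set(buf.subarray(0, offset));
    const { written } = fastPathEncoder.encodeInto(rest, grown.subarray(offset));
    offset += written;
    buf = grown;
  }

  // Only the first `offset` bytes hold encoded data.
  return buf.subarray(0, offset);
}
```

For short ASCII strings the loop never reaches the `TextEncoder` call at all, which is presumably where the gains for small strings come from.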

Related issue: #1313

r? @alexcrichton

@alexcrichton
Contributor

Thanks for this! I was wondering, though, if we could perhaps be more strict about changes here to head off regressions like the encodeInto one that slipped through previously? Would you be able to generate a benchmark that others could run as well? Is it possible to enhance the test suite in this regard and/or write node-facing benchmarks that we could test on CI? It'd be great to have more coverage of this since it does seem like the logic is getting especially tricky.

cc @fitzgen

@RReverser
Member Author

Would you be able to generate a benchmark that others could run as well?

I think this would make sense, yeah, although for now I've been focusing on finishing serde-wasm-bindgen (which has its own benchmarks, presented above, and relies on these optimisations).

So yes, I think it's a great idea given the risks, but I'm not 100% sure when I'll get around to adding all these generic tests and benchmarks.

How about leaving this open for now and maybe someone else would want to add these string tests in the meanwhile?

@RReverser
Member Author

Although wait, in terms of tests - I think you said we have coverage now that newer Firefox runs on Azure?

@alexcrichton
Contributor

We should run tests in both Node.js and Firefox on Azure right now, and I think our test suite on strings is pretty light so it'd be nice to test just a wide variety of strings which mix non-ascii, lengths, etc. That'd be good for building confidence at least!
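As a rough sketch of what such coverage could look like, the snippet below builds strings that mix ASCII and non-ASCII fragments at various lengths and checks that each one survives a UTF-8 round trip. The names `FRAGMENTS`, `makeTestStrings` and `roundTrips` are hypothetical, and the `TextEncoder`/`TextDecoder` pair only stands in for a real JS to WASM to JS trip.

```javascript
// Sample fragments: ASCII, 2-byte, 3-byte and 4-byte (surrogate pair) characters.
const FRAGMENTS = ["a", "Z9", " ", "é", "ж", "名前", "あゆみ", "😀"];

// Build a deterministic set of strings mixing the fragments at various lengths.
function makeTestStrings(maxParts) {
  const out = [""];
  for (let parts = 1; parts <= maxParts; parts++) {
    for (let start = 0; start < FRAGMENTS.length; start++) {
      let s = "";
      for (let i = 0; i < parts; i++) {
        s += FRAGMENTS[(start + i * 3) % FRAGMENTS.length];
      }
      out.push(s);
    }
  }
  return out;
}

// Encode to UTF-8 and decode back; a stand-in for passing the string through WASM.
function roundTrips(str) {
  const bytes = new TextEncoder().encode(str);
  return new TextDecoder().decode(bytes) === str;
}

// Collect every string that does not come back unchanged.
const failures = makeTestStrings(16).filter((s) => !roundTrips(s));
console.log(failures.length === 0 ? "all strings round-trip" : failures);
```

In a real test suite `roundTrips` would call an exported WASM function that takes and returns a string, so that the generated glue code is what gets exercised.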

The numbers here sound great but I'd just want to make sure they were reproducible as well as ideally having helpful benchmarks to run in the future. We probably can't have a super nice framework of benchmarks just yet, but having at least the start I think would be a good way to begin.

@fitzgen
Member

fitzgen commented Apr 19, 2019

Regarding benchmarks: I know you also had a micro benchmark for various into/from wasm ABI stuff, Alex. I think it would be useful to collect these in a single place where anyone can run them and we make sure that they build in our CI.

How about a ./benchmarks directory that is essentially the same as ./examples but for internal-facing [micro-]benchmarks rather than user-facing examples? (Note: I'm not suggesting ./benches since that could potentially confuse cargo). Each ./benchmarks/foo directory could have a little README describing what it is measuring and how to build+run the benchmark.

@alexcrichton
Contributor

Ah that's right, I did! After some digging I also apparently even published the benchmark.

I actually think it'd be pretty cool if we published the benchmark for our own internal usage from CI, and that way we could always redirect folks to take a look and compare numbers themselves. I can try to whip something up and make a PR

@RReverser
Member Author

@alexcrichton Did you have any time / luck with benchmarks so far?

@alexcrichton
Contributor

Sorry no I haven't had enough time to work through this and try to get something set up. In the meantime could you share how you were benchmarking this? Additionally have you had a chance to write some more tests for ascii/unicode/etc?

@RReverser
Member Author

RReverser commented Apr 30, 2019

Sorry no I haven't had enough time to work through this and try to get something set up. In the meantime could you share how you were benchmarking this?

Alright, I've created an isolated benchmark set for passing strings into WASM here, with instructions included in the README: https://github.com/RReverser/wasm-bindgen-string-benches

I've run them myself in the latest Firefox and Chrome to provide some reference numbers.

As expected, because these are isolated string benchmarks and not whole-app improvements like above, the difference is even bigger, especially for small ASCII strings, which are particularly common and where C++ overhead is particularly noticeable.

| String | Chrome before (ops/s) | Firefox before (ops/s) | Chrome after (ops/s) | Firefox after (ops/s) | Chrome speed-up | Firefox speed-up |
|---|---|---|---|---|---|---|
| `ja` | 616,791 | 1,740,501 | 6,940,567 | 10,975,125 | 11.3x | 6.3x |
| `aym0566x` | 641,746 | 1,715,014 | 4,207,093 | 9,756,924 | 6.6x | 5.7x |
| `505874924095815000` | 615,011 | 1,701,137 | 6,589,310 | 8,152,707 | 10.7x | 4.8x |
| `Sun Aug 31 00:29:15 +0000 2014` | 585,353 | 1,678,253 | 5,307,496 | 6,643,487 | 9.1x | 4.0x |
| `https://pbs.twimg.com/profile_images/497760886795153410/LDjAwR_y_normal.jpeg` | 523,348 | 1,492,255 | 2,940,477 | 3,937,804 | 5.6x | 2.6x |
| `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | 533,109 | 1,516,124 | 2,788,777 | 3,762,589 | 5.2x | 2.5x |
| `@aym0566x 名前:前田あゆみ...` | 299,053 | 622,844 | 258,450 | 966,838 | 0.9x | 1.6x |
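For readers who want a feel for how ops/s figures of this kind can be obtained, here is a minimal timing sketch. It is not the harness from the linked repository: `measureOpsPerSecond` is a hypothetical helper, the sample strings are illustrative, and it times plain `TextEncoder.encode` rather than a real call into WASM.

```javascript
// Hypothetical helper: run `fn` repeatedly for roughly `durationMs` and report ops/s.
function measureOpsPerSecond(fn, durationMs = 200) {
  const start = Date.now();
  let ops = 0;
  let elapsed = 0;
  do {
    // Run in batches so the clock is not read on every iteration.
    for (let i = 0; i < 1000; i++) fn();
    ops += 1000;
    elapsed = Date.now() - start;
  } while (elapsed < durationMs);
  return (ops * 1000) / elapsed;
}

// Illustrative samples only; a real benchmark would call the WASM export instead.
const benchEncoder = new TextEncoder();
const samples = ["ja", "aym0566x", "@aym0566x 名前:前田あゆみ"];
for (const s of samples) {
  const ops = measureOpsPerSecond(() => benchEncoder.encode(s));
  console.log(JSON.stringify(s), Math.round(ops).toLocaleString(), "ops/s");
}
```

Numbers from a loop like this vary a lot between machines and engines, so only ratios measured on the same setup are meaningful.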

@RReverser
Member Author

As for more tests - no, I didn't have time to work on that.

Contributor

@alexcrichton alexcrichton left a comment

Thanks for the benchmark! I can indeed reproduce some similar performance gains on Firefox myself.

I think we'll want to have more tests for ascii/unicode/etc before this lands to ensure it doesn't accidentally regress anything and we don't accidentally regress it in the future.

crates/cli-support/src/js/mod.rs (outdated)
ptr = wasm.__wbindgen_realloc(ptr, size, size += arg.length * 3);
const view = getUint8Memory().subarray(ptr + offset, ptr + size);
offset += cachedTextEncoder.encodeInto(arg, view).written;
Contributor

Could a debug assert of some form be included after this to ensure that it wrote everything?

Member Author

We can't exactly use debug_assert in the JS code, but I guess I could add something if debug_assertions is on...

Contributor

There's a --debug flag to wasm-bindgen itself which can control whether the assertion is emitted or not

Member

Member Author

Ah great.

Member Author

Hmm, I'm not sure what to check here. I mean, I'm passing all that's left of arg to encodeInto, so it has to be written fully - what condition would I check to prove it?
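One candidate condition, sketched under assumptions: `encodeInto` returns an object with `read` (UTF-16 code units consumed) and `written` (bytes produced), so a debug-only check could compare `read` against the string length. The wrapper `encodeIntoChecked` and its `debug` parameter are hypothetical and only illustrate the shape of such an assertion.

```javascript
// Hypothetical wrapper: encode `str` into `view` and, in debug mode,
// verify that the whole string was consumed.
function encodeIntoChecked(str, view, debug = true) {
  const ret = new TextEncoder().encodeInto(str, view);
  // `read` counts UTF-16 code units consumed; anything short of
  // `str.length` means the destination view was too small.
  if (debug && ret.read !== str.length) {
    throw new Error("failed to pass whole string");
  }
  return ret.written;
}

// A 6-byte view is enough for two 3-byte characters.
console.log(encodeIntoChecked("名前", new Uint8Array(6)));
```

With a view sized as `arg.length * 3` the check should never fire, which makes it suitable as an assertion rather than as error handling.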

@RReverser
Member Author

Thanks for the benchmark! I can indeed reproduce some similar performance gains on Firefox myself.

Hmm, I just found out that I left a few mistakes in the benchmark code - did you manage to run the same benchmarks when you wrote that? Anyway, they should be fixed now.

@alexcrichton alexcrichton merged commit f977630 into rustwasm:master May 13, 2019
@@ -1499,12 +1499,19 @@ impl<'a> Context<'a> {
arg = arg.slice(offset);
ptr = wasm.__wbindgen_realloc(ptr, size, size = offset + arg.length * 3);
const view = getUint8Memory().subarray(ptr + offset, ptr + size);
const ret = cachedTextEncoder.encodeInto(arg, view);
Member Author

@alexcrichton I see you already merged it, but this change... doesn't seem right. Why is it encoding the same view twice now?

@RReverser
Member Author

Hmm, I'm not sure what to check here. I mean, I'm passing all that's left of arg to encodeInto, so it has to be written fully - what condition would I check to prove it?

I asked that question but didn't get any response. @alexcrichton can we revert this change (given the bug above) and land it with a proper fix and review instead please?

RReverser added a commit to RReverser/wasm-bindgen that referenced this pull request May 14, 2019
This was a regression introduced in the last commit of rustwasm#1470, which might make Unicode strings 2x slower to pass.
@alexcrichton
Contributor

Yes that was a mistake in my merge. I ran the benchmarks locally and had them fixed so they ran correctly and showed improvements. Why do you want to revert this and back it out?

@RReverser
Member Author

@alexcrichton Mainly due to the 2x regression for Unicode strings, but in the meantime I've submitted a PR to fix just that, and I see you already merged it.

I don't want to cause problems, but I think that it would be helpful to ask one more pair of eyes to look at the new changes before merging to make sure there are no similar regressions in the future.

Also hoped to clean up the code a bit further - to avoid duplicate JS in the output - when I have time to get back to this, but I guess I can leave that for a separate PR.

Either way, thanks for merging both PRs and getting back to me!

@alexcrichton
Contributor

Ok just wanted to make sure. It would also be helpful to write requested tests ahead of time for PRs!

This isn't a massive project where some breakage on master destroys everyone's workflow. I've simply been extremely busy yesterday and today, and it took a bit to merge a fix. No sweat if things are broken temporarily.

@RReverser
Member Author

Fair enough, maybe I'm stressing over breakages more than I should.
