
feat: add bufferlike variants #50

Closed · wants to merge 1 commit
Conversation


@ronag ronag commented Aug 28, 2024

Buffer.copy is faster than Uint8Array.set in Node 22.7+

@jungomi
Owner

jungomi commented Aug 29, 2024

Everything in this library needs to be compatible with browsers, so any Node-specific API unfortunately cannot be used.

Besides that, any additional API (adding two methods here) needs to be documented, tested, and backed by benchmarks that support the claim of being faster. I haven't benchmarked it, but it would surprise me a fair bit if the overall effect were anything but marginal, as the copy happens once and isn't in the performance-critical part (it may have a noticeable effect on very large inputs).

I'd welcome any performance improvement, however small, as long as it is compatible with all targeted engines.

@jungomi jungomi closed this Aug 29, 2024
@ronag
Author

ronag commented Aug 29, 2024

The difference is somewhat significant for small strings (e.g. hashing names):

xxhash-wasm#Raw x 23,375,136 ops/sec ±2.76% (91 runs sampled)
xxhash-wasm#Buffer x 29,135,964 ops/sec ±2.09% (93 runs sampled)
Benchmark 1 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 23,693,053 ops/sec ±3.54% (88 runs sampled)
xxhash-wasm#Buffer x 27,326,526 ops/sec ±2.69% (90 runs sampled)
Benchmark 10 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 18,401,779 ops/sec ±4.32% (88 runs sampled)
xxhash-wasm#Buffer x 21,288,005 ops/sec ±2.23% (93 runs sampled)
Benchmark 100 bytes - Fastest is xxhash-wasm#Buffer

https://gist.github.com/ronag/370da9efaf00358805ee6aa020671e3c

@jungomi
Owner

jungomi commented Aug 29, 2024

That is quite surprising. To decouple the copy from the hashing itself, I ran a benchmark with just the copy.

const Benchmark = require("benchmark");

// Note: `randomString(i)` and the Benchmark event `handlers` are defined
// elsewhere in the benchmark script and are assumed here.
async function runBench() {
  // Destination large enough for the biggest input (1e8 bytes).
  const buf = new Uint8Array(1e8);

  for (let i = 1; i <= 1e8; i *= 10) {
    const suite = new Benchmark.Suite(`${i} bytes`, handlers);
    const input = Buffer.from(randomString(i));

    suite
      .add("Buffer.copy()", () => {
        input.copy(buf);
      })
      .add("Array.set()", () => {
        buf.set(input);
      })
      .run();
  }
}

Small inputs are definitely much faster, about 10%, but that advantage is pretty much gone after 1,000 bytes.

Buffer.copy() x 71,016,491 ops/sec ±1.68% (89 runs sampled)
Array.set() x 64,727,575 ops/sec ±1.55% (92 runs sampled)
Benchmark 1 bytes - Fastest is Buffer.copy()
[...]
Buffer.copy() x 10,404,248 ops/sec ±1.50% (88 runs sampled)
Array.set() x 10,476,737 ops/sec ±0.89% (93 runs sampled)
Benchmark 10000 bytes - Fastest is Array.set()
Buffer.copy() x 586,896 ops/sec ±1.65% (91 runs sampled)
Array.set() x 596,555 ops/sec ±1.72% (90 runs sampled)
Benchmark 100000 bytes - Fastest is Array.set(),Buffer.copy()

But as you mentioned, that was on Node 22.7.

For comparison, Node 20.17 is much slower with Buffer.copy():

Buffer.copy() x 47,690,175 ops/sec ±2.64% (87 runs sampled)
Array.set() x 63,061,157 ops/sec ±1.23% (96 runs sampled)
Benchmark 1 bytes - Fastest is Array.set()

As already mentioned, it is not compatible with browsers; furthermore, it gets even more confusing once you need to consider different versions of Node.

@marcusdarmstrong
Contributor

Do we know what the specific node version that introduced the improvement happens to be?

@ronag
Author

ronag commented Aug 29, 2024

Do we know what the specific node version that introduced the improvement happens to be?

Node 22.7.

nodejs/node#54087
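For code that wants to take advantage of this only where it helps, a runtime version check could gate the fast path. The following is a sketch, not part of this PR: the 22.7 cutoff is taken from the discussion above, and the detection heuristic and the `copyInto` name are assumptions for illustration.

```javascript
// Sketch: gate the Buffer.copy() path on the Node version where the
// fast path landed (22.7, per nodejs/node#54087 as discussed above).
// Illustrative heuristic only; not an API of this library.
const [major, minor] = process.versions.node.split(".").map(Number);
const hasFastBufferCopy = major > 22 || (major === 22 && minor >= 7);

function copyInto(dst, src) {
  if (hasFastBufferCopy && Buffer.isBuffer(src)) {
    src.copy(dst); // Node fast path
  } else {
    dst.set(src); // portable TypedArray path (also handles Buffers)
  }
}
```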

@ronag
Author

ronag commented Aug 29, 2024

This is what we are actually doing: we are not passing a Buffer but something we call BufferLike.

https://gist.github.com/ronag/4f40ab7c3b18cb9a5320e47e67a42dd4
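The gist isn't reproduced here, but the general idea of treating arbitrary byte input as "buffer-like" can be sketched as viewing a Uint8Array through the Buffer API without copying. This is a hypothetical illustration; the actual implementation in the linked gist may differ.

```javascript
// Hypothetical sketch: wrap any Uint8Array in a Buffer view over the
// same backing memory (no copy), so Buffer.copy() can be used on it.
// The real BufferLike in the linked gist may be implemented differently.
function toBufferLike(u8) {
  return Buffer.isBuffer(u8)
    ? u8
    : Buffer.from(u8.buffer, u8.byteOffset, u8.byteLength);
}
```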

Then the difference is even more significant:

xxhash-wasm#Raw x 13,784,935 ops/sec ±7.28% (84 runs sampled)
xxhash-wasm#BufferLike x 22,450,161 ops/sec ±15.04% (75 runs sampled)
Benchmark 2 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 13,780,269 ops/sec ±7.03% (85 runs sampled)
xxhash-wasm#BufferLike x 23,538,399 ops/sec ±5.19% (87 runs sampled)
Benchmark 4 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 14,187,597 ops/sec ±6.48% (86 runs sampled)
xxhash-wasm#BufferLike x 24,406,704 ops/sec ±5.60% (87 runs sampled)
Benchmark 8 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 14,724,406 ops/sec ±5.53% (89 runs sampled)
xxhash-wasm#BufferLike x 23,308,986 ops/sec ±5.69% (83 runs sampled)
Benchmark 16 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 14,152,591 ops/sec ±5.87% (84 runs sampled)
xxhash-wasm#BufferLike x 23,954,550 ops/sec ±4.04% (88 runs sampled)
Benchmark 32 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 14,231,788 ops/sec ±7.79% (82 runs sampled)
xxhash-wasm#BufferLike x 23,022,174 ops/sec ±3.79% (86 runs sampled)
Benchmark 64 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 15,020,431 ops/sec ±3.75% (88 runs sampled)
xxhash-wasm#BufferLike x 21,324,857 ops/sec ±3.35% (87 runs sampled)
Benchmark 128 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 15,196,173 ops/sec ±3.90% (90 runs sampled)
xxhash-wasm#BufferLike x 18,559,562 ops/sec ±2.57% (91 runs sampled)
Benchmark 256 bytes - Fastest is xxhash-wasm#Buffer

@jungomi
Owner

jungomi commented Aug 29, 2024

@ronag seeing that you contributed the performance improvements of Buffer.copy() to Node, is there any reason why the fast path is not also applied to TypedArray.set()? Granted, the fast path can only be taken if the arguments are supported/overlap, and possibly only if the source (the array from which to copy the values) is of a specific type.

I'm just wondering because having an overload of TypedArray.set(buf: Buffer) that activates that path would essentially be free in C++ (though I don't know the level of abstraction and all the dispatching that might occur). That's just for the specific case of a library such as this one, which uses browser-compatible APIs, being combined with a Node-specific type.
Ideally, the fast path would be available across TypedArrays in general, as it seems to be such an improvement for small inputs.

@ronag
Author

ronag commented Aug 29, 2024

TypedArray.set is implemented by V8. I don't know its exact internals, and I'm also a little confounded by how we were able to beat it. But I don't think Node should have its own implementation of it; it would probably be non-compliant in some edge-case scenario.

The biggest problem with the browser API is that it's super slow to create sub arrays. Which was the original problem which that PR was trying to resolve. The fact that it was also faster with non-partial copies was an unexpected benefit.

That is, this is not possible to optimize and is super slow:

dst.set(src.subarray(start, end))

Hence this is much faster:

src.copy(dst, 0, start, end)
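Put together, a portable helper for partial copies might look like the following. This is a sketch under the assumption that the source may be either a Node Buffer or a plain Uint8Array; `copyRange` is a hypothetical name, not an API from this library or from Node.

```javascript
// Hypothetical helper: copy src[start, end) into dst at offset 0.
// Prefers Buffer.copy() (no intermediate view allocation) when src is a
// Node Buffer, falling back to the browser-compatible subarray path.
function copyRange(dst, src, start, end) {
  if (typeof src.copy === "function") {
    src.copy(dst, 0, start, end); // Node fast path
  } else {
    dst.set(src.subarray(start, end)); // allocates a temporary view
  }
}
```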

@ronag
Author

ronag commented Aug 29, 2024

Looks like V8 does a lot more checks, related to whether or not the buffer is attached and to shared array backing stores, which Node buffers don't support: https://github.com/v8/v8/blob/main/src/builtins/typed-array-set.tq

@jungomi
Owner

jungomi commented Aug 29, 2024

Okay, interesting. I figured it was V8, but I thought Node had some bridge between them. Anyway, if V8 could have these improvements, everyone would win, but I suspect they have some extra checks for the validity of the memory or something.

The .subarray() case is particularly intriguing, because we have to read/write the state of the WASM memory for the streaming API on every call, and there should be a subarray call (although it's still an unnecessary .slice(), as PR #49 addressing that was never cleaned up to be merged), aside from all the .set() calls in

xxhash-wasm/src/index.js, lines 69 to 99 at 1288bfe:

// We'll hold our hashing state in this closure.
const state = new Uint8Array(size);
memory.set(state);
init(0, seed);
// Each time we interact with wasm, it may have mutated our state so we'll
// need to read it back into our closed copy.
state.set(memory.slice(0, size));
return {
  update(input) {
    memory.set(state);
    let length;
    if (typeof input === "string") {
      growMemory(input.length * 3, size);
      length = encoder.encodeInto(input, memory.subarray(size)).written;
    } else {
      // The only other valid input type is a Uint8Array
      growMemory(input.byteLength, size);
      memory.set(input, size);
      length = input.byteLength;
    }
    update(0, size, length);
    state.set(memory.slice(0, size));
    return this;
  },
  digest() {
    memory.set(state);
    return finalize(digest(0));
  },
};
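As an aside on the .slice() calls above (the subject of the unmerged PR #49): .slice() allocates a fresh copy of the range, while .subarray() only creates a view over the same backing memory, so as the source of a .set() call it saves one intermediate allocation. A minimal sketch of the difference:

```javascript
// .slice() allocates a new Uint8Array copy of the range;
// .subarray() creates a view over the same backing buffer.
// As the argument to .set(), both yield identical bytes in the target,
// but the subarray form avoids one intermediate copy.
const memory = new Uint8Array([1, 2, 3, 4, 5]);
const state = new Uint8Array(3);

state.set(memory.slice(0, 3));    // copy into a temp array, then into state
state.set(memory.subarray(0, 3)); // single copy: set reads through the view
```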

So there would be some potential there for a Node-specific version, but since that is literally all that happens on the JS side, it would essentially mean a completely different module for Node vs. the browser.

Since I'm not planning to actively develop this further, and having different modules seems like a nightmare in general, any changes should remain engine-agnostic.

@ronag
Author

ronag commented Aug 29, 2024

Sure, no problem. We have our own fork for our needs; just thought we'd try to contribute back.

@jungomi
Owner

jungomi commented Aug 29, 2024

Thanks, and I also appreciate the insights into the Node-specific parts.
