feat: add bufferlike variants #50

ronag · 2024-08-28T05:29:29Z

Buffer.copy is faster than Uint8Array.set in Node 22.7+

jungomi · 2024-08-29T10:49:18Z

Everything in this library needs to be compatible with browsers, so any Node specific API can unfortunately not be used.

Besides that, any additional API (adding two methods here) needs to be documented, tested and showing benchmarks to support the claim of being faster. I haven't benchmarked it, but it would surprise me a fair bit that the overall effect is anything but marginal, as the copy happens once and isn't in the performance critical part (may have a noticeable effect on very large inputs).

I'd welcome any performance improvement, however small, as long as they are compatible with all targeted engines.

ronag · 2024-08-29T11:25:21Z

The difference is somewhat significant for small strings (e.g. hashing names):

xxhash-wasm#Raw x 23,375,136 ops/sec ±2.76% (91 runs sampled)
xxhash-wasm#Buffer x 29,135,964 ops/sec ±2.09% (93 runs sampled)
Benchmark 1 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 23,693,053 ops/sec ±3.54% (88 runs sampled)
xxhash-wasm#Buffer x 27,326,526 ops/sec ±2.69% (90 runs sampled)
Benchmark 10 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 18,401,779 ops/sec ±4.32% (88 runs sampled)
xxhash-wasm#Buffer x 21,288,005 ops/sec ±2.23% (93 runs sampled)
Benchmark 100 bytes - Fastest is xxhash-wasm#Buffer

https://gist.github.com/ronag/370da9efaf00358805ee6aa020671e3c

jungomi · 2024-08-29T12:02:11Z

That is quite surprising, to decouple the copy from the hashing itself, I ran a bench with just the copy.

async function runBench() {
  const buf = new Uint8Array(1e8);

  for (let i = 1; i <= 1e8; i *= 10) {
    const suite = new Benchmark.Suite(`${i} bytes`, handlers);
    const input = Buffer.from(randomString(i));

    suite
      .add("Buffer.copy()", () => {
        input.copy(buf);
      })
      .add("Array.set()", () => {
        buf.set(input);
      })
      .run();
  }
}

The small inputs are definitely much faster, about 10%, but that is pretty much gone after 1000 bytes.

Buffer.copy() x 71,016,491 ops/sec ±1.68% (89 runs sampled)
Array.set() x 64,727,575 ops/sec ±1.55% (92 runs sampled)
Benchmark 1 bytes - Fastest is Buffer.copy()
[...]
Buffer.copy() x 10,404,248 ops/sec ±1.50% (88 runs sampled)
Array.set() x 10,476,737 ops/sec ±0.89% (93 runs sampled)
Benchmark 10000 bytes - Fastest is Array.set()
Buffer.copy() x 586,896 ops/sec ±1.65% (91 runs sampled)
Array.set() x 596,555 ops/sec ±1.72% (90 runs sampled)
Benchmark 100000 bytes - Fastest is Array.set(),Buffer.copy()

But as you mentioned, that was on Node 22.7.

For comparison, Node 20.17 is much slower with Buffer.copy()

Buffer.copy() x 47,690,175 ops/sec ±2.64% (87 runs sampled)
Array.set() x 63,061,157 ops/sec ±1.23% (96 runs sampled)
Benchmark 1 bytes - Fastest is Array.set()

As already mentioned, it is not compatible with browsers, furthermore it gets even more confusing when you'd need to consider different versions of Node.

marcusdarmstrong · 2024-08-29T12:07:06Z

Do we know what the specific node version that introduced the improvement happens to be?

ronag · 2024-08-29T12:17:36Z

Do we know what the specific node version that introduced the improvement happens to be?

Node 22.7.

nodejs/node#54087

ronag · 2024-08-29T12:18:21Z

This is what we are actually doing. We are not passing Buffer but something we call BufferLike.

https://gist.github.com/ronag/4f40ab7c3b18cb9a5320e47e67a42dd4

Then the difference is even more significant:

xxhash-wasm#Raw x 13,784,935 ops/sec ±7.28% (84 runs sampled)
xxhash-wasm#BufferLike x 22,450,161 ops/sec ±15.04% (75 runs sampled)
Benchmark 2 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 13,780,269 ops/sec ±7.03% (85 runs sampled)
xxhash-wasm#BufferLike x 23,538,399 ops/sec ±5.19% (87 runs sampled)
Benchmark 4 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 14,187,597 ops/sec ±6.48% (86 runs sampled)
xxhash-wasm#BufferLike x 24,406,704 ops/sec ±5.60% (87 runs sampled)
Benchmark 8 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 14,724,406 ops/sec ±5.53% (89 runs sampled)
xxhash-wasm#BufferLike x 23,308,986 ops/sec ±5.69% (83 runs sampled)
Benchmark 16 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 14,152,591 ops/sec ±5.87% (84 runs sampled)
xxhash-wasm#BufferLike x 23,954,550 ops/sec ±4.04% (88 runs sampled)
Benchmark 32 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 14,231,788 ops/sec ±7.79% (82 runs sampled)
xxhash-wasm#BufferLike x 23,022,174 ops/sec ±3.79% (86 runs sampled)
Benchmark 64 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 15,020,431 ops/sec ±3.75% (88 runs sampled)
xxhash-wasm#BufferLike x 21,324,857 ops/sec ±3.35% (87 runs sampled)
Benchmark 128 bytes - Fastest is xxhash-wasm#Buffer
xxhash-wasm#Raw x 15,196,173 ops/sec ±3.90% (90 runs sampled)
xxhash-wasm#BufferLike x 18,559,562 ops/sec ±2.57% (91 runs sampled)
Benchmark 256 bytes - Fastest is xxhash-wasm#Buffer

jungomi · 2024-08-29T13:19:01Z

@ronag seeing that you contributed the performance improvements of Buffer.copy() to Node, is there any reason why the fast path is not also applied to TypedArray.set()? Granted that the fast path can only be taken if the arguments are supported/overlap and possibly only if the source (array from which to copy the values) is of a specific type.

I'm just wondering because having an overload of TypedArray.set(buf: Buffer) that activates that path is essentially free in C++ (well I don't know the level of abstraction and all the dispatching that might occur). That's just for the specific case of having a library such as this one, that uses browser compatible APIs, being combined with a Node specific type.
Ideally, the fast path would be available across the TypedArray in general, as it seems to be such an improvement for small inputs.

ronag · 2024-08-29T13:28:03Z

TypedArray.set is implemented by V8. I don't know its exact internals and I'm also a little confounded how we were able to beat it. But I don't think Node should have its own implementation of it. It would probably be non-compliant in some edge case scenario.

The biggest problem with the browser API is that it's super slow to create sub arrays. Which was the original problem which that PR was trying to resolve. The fact that it was also faster with non-partial copies was an unexpected benefit.

i.e. this Is not possible to optimize and is super slow:

dst.set(src.subarray(start, end))

Hence this is much faster:

src.copy(dst, 0, start, end)

ronag · 2024-08-29T13:55:33Z

Looks like V8 does a lot of more check related to whether or not the buffer is attached and shared array backings store which node buffers don't support. https://github.com/v8/v8/blob/main/src/builtins/typed-array-set.tq

jungomi · 2024-08-29T13:57:40Z

Okay interesting. I figured it was V8 but I thought Node had some bridge between them. Anyway, if V8 could have these improvements, everyone would win, but I suspect they have some extra checks for the validity of the memory or something.

The .subarray() case is particularly intriguing, because we have to read/write the state of the WASM memory for the streaming API on every call, and there should be a subarray call (although it's still a unnecessary .slice() as the PR #49 addressing that was never cleaned up to be merged) aside from all the .set() calls in

xxhash-wasm/src/index.js

Lines 69 to 99 in 1288bfe

    
           // We'll hold our hashing state in this closure. 
        
           const state = new Uint8Array(size); 
        
           memory.set(state); 
        
           init(0, seed); 
        
           // Each time we interact with wasm, it may have mutated our state so we'll 
        
           // need to read it back into our closed copy. 
        
           state.set(memory.slice(0, size)); 
        
           return { 
        
             update(input) { 
        
               memory.set(state); 
        
               let length; 
        
               if (typeof input === "string") { 
        
                 growMemory(input.length * 3, size); 
        
                 length = encoder.encodeInto(input, memory.subarray(size)).written; 
        
               } else { 
        
                 // The only other valid input type is a Uint8Array 
        
                 growMemory(input.byteLength, size); 
        
                 memory.set(input, size); 
        
                 length = input.byteLength; 
        
               } 
        
               update(0, size, length); 
        
               state.set(memory.slice(0, size)); 
        
               return this; 
        
             }, 
        
             digest() { 
        
               memory.set(state); 
        
               return finalize(digest(0)); 
        
             }, 
        
           };

So there would be some potential there for a Node specific version, but since it's literally all that happens on the JS side, that would essentially mean a completely different module for Node vs. Browser.

Since I'm not planning to actively develop this further, and having different modules seems like a nightmare in general, any changes should remain engine agnostic.

ronag · 2024-08-29T14:00:57Z

Sure no problem. We have our own fork for our needs. Just thought we'd try to contribute back.

jungomi · 2024-08-29T14:02:42Z

Thanks and I also appreciate the insights into the Node specific parts.

feat: add bufferlike variants

4ff6509

jungomi closed this Aug 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add bufferlike variants #50

feat: add bufferlike variants #50

ronag commented Aug 28, 2024

jungomi commented Aug 29, 2024 •

edited

Loading

ronag commented Aug 29, 2024 •

edited

Loading

jungomi commented Aug 29, 2024

marcusdarmstrong commented Aug 29, 2024

ronag commented Aug 29, 2024

ronag commented Aug 29, 2024 •

edited

Loading

jungomi commented Aug 29, 2024

ronag commented Aug 29, 2024 •

edited

Loading

ronag commented Aug 29, 2024

jungomi commented Aug 29, 2024 •

edited

Loading

ronag commented Aug 29, 2024

jungomi commented Aug 29, 2024

feat: add bufferlike variants #50

feat: add bufferlike variants #50

Conversation

ronag commented Aug 28, 2024

jungomi commented Aug 29, 2024 • edited Loading

ronag commented Aug 29, 2024 • edited Loading

jungomi commented Aug 29, 2024

marcusdarmstrong commented Aug 29, 2024

ronag commented Aug 29, 2024

ronag commented Aug 29, 2024 • edited Loading

jungomi commented Aug 29, 2024

ronag commented Aug 29, 2024 • edited Loading

ronag commented Aug 29, 2024

jungomi commented Aug 29, 2024 • edited Loading

ronag commented Aug 29, 2024

jungomi commented Aug 29, 2024

jungomi commented Aug 29, 2024 •

edited

Loading

ronag commented Aug 29, 2024 •

edited

Loading

ronag commented Aug 29, 2024 •

edited

Loading

ronag commented Aug 29, 2024 •

edited

Loading

jungomi commented Aug 29, 2024 •

edited

Loading