Uint8Array to UTF8 conversion #3

Closed
anonrig opened this issue Nov 10, 2022 · 8 comments · Fixed by nodejs/node#45412

Comments

@anonrig
Member

anonrig commented Nov 10, 2022

I see that undici mostly uses Buffer.from(name).toString('utf8'). This crosses the JS-C++ boundary twice: once to create the Buffer and once for toString.

I recommend implementing a function like this:

  • Buffer.asString(name, encoding) which returns string
  • Buffer.asStrings([first, second], encoding) which returns string[]
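
As a rough illustration of the proposed shape, here is a minimal userland sketch. `asString` and `asStrings` are hypothetical helpers, not existing Node.js APIs, and this version assumes a cached `TextDecoder` is an acceptable stand-in for a native implementation:

```javascript
// Hypothetical helpers mirroring the proposed Buffer.asString / Buffer.asStrings.
// These names do NOT exist in Node.js; they only illustrate the idea.
const decoders = new Map();

// Reuse one TextDecoder per encoding instead of allocating one per call.
function getDecoder(encoding) {
  let decoder = decoders.get(encoding);
  if (decoder === undefined) {
    decoder = new TextDecoder(encoding);
    decoders.set(encoding, decoder);
  }
  return decoder;
}

// Decode a single Uint8Array without creating an intermediate Buffer.
function asString(bytes, encoding = 'utf-8') {
  return getDecoder(encoding).decode(bytes);
}

// Decode several Uint8Arrays with the same decoder.
function asStrings(list, encoding = 'utf-8') {
  const decoder = getDecoder(encoding);
  return list.map((bytes) => decoder.decode(bytes));
}

const name = new Uint8Array([104, 111, 115, 116]); // ASCII bytes for "host"
const value = new Uint8Array([97, 98, 99]); // ASCII bytes for "abc"

const viaBuffer = Buffer.from(name).toString('utf8'); // current pattern (copies the bytes)
const viaDecoder = asString(name); // sketch of the proposed pattern
const pair = asStrings([name, value]);

console.log(viaBuffer === viaDecoder, pair);
```

One behavioral difference to keep in mind: TextDecoder drops a leading UTF-8 BOM by default, so a decoder constructed with `{ ignoreBOM: true }` would be needed if the BOM must be preserved.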

@mcollina
Member

Sure thing, let's do it!

@anonrig
Member Author

anonrig commented Nov 14, 2022

Here's the current state of TextDecoder in main branch.

➜  node-test git:(main) ✗ npm run text-decoder

> node-benchmarks@1.0.0 text-decoder
> ~/Developer/node/out/Release/node ./text-decoder/index.mjs && bun run ./text-decoder/index.mjs && deno run -A ./text-decoder/deno.js

cpu: Apple M1 Max
runtime: node v20.0.0-pre (arm64-darwin)

benchmark       time (avg)             (min … max)       p75       p99      p995
-------------------------------------------------- -----------------------------
smallUint8  308.17 ns/iter  (298.6 ns … 336.38 ns) 308.96 ns 335.55 ns 336.38 ns
bigUint8       2.6 µs/iter      (2.57 µs … 2.7 µs)   2.61 µs    2.7 µs    2.7 µs
cpu: Apple M1 Max
runtime: bun 0.2.2 (arm64-darwin)

benchmark       time (avg)             (min … max)       p75       p99      p995
-------------------------------------------------- -----------------------------
smallUint8  212.16 ns/iter (197.05 ns … 441.18 ns) 212.43 ns  224.1 ns  238.8 ns
bigUint8      1.14 µs/iter     (1.05 µs … 1.27 µs)   1.18 µs   1.27 µs   1.27 µs
cpu: Apple M1 Max
runtime: deno 1.27.2 (aarch64-apple-darwin)

benchmark       time (avg)             (min … max)       p75       p99      p995
-------------------------------------------------- -----------------------------
smallUint8  466.08 ns/iter (444.89 ns … 481.87 ns) 469.58 ns 481.26 ns 481.87 ns
bigUint8      3.76 µs/iter     (3.72 µs … 3.82 µs)   3.77 µs   3.82 µs   3.82 µs

@ronag
Member

ronag commented Nov 14, 2022

Do you have a benchmark comparing it to Buffer.toString?

@anonrig
Member Author

anonrig commented Nov 14, 2022

@ronag
Member

ronag commented Nov 14, 2022

Do you have a link to the benchmark code?

@ronag
Member

ronag commented Nov 14, 2022

Also do you compare buf.toString() vs TextEncoder.encode(buf), i.e. excluding the Buffer.from overhead?
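
A comparison along these lines could be sketched as follows, assuming the intended pairing is `buf.toString()` against `TextDecoder`'s `decode()`. This is not the benchmark from the thread; the iteration count, sample string, and the `timePerOp` helper are all made up for illustration, and a plain timing loop like this is far noisier than a proper benchmark harness:

```javascript
// Rough micro-benchmark sketch. ITERATIONS and the sample header are arbitrary.
const ITERATIONS = 100_000;
const bytes = new TextEncoder().encode('content-type: text/plain; charset=utf-8');
const buf = Buffer.from(bytes); // created once, outside the timed loops
const decoder = new TextDecoder('utf-8');

// Time fn over ITERATIONS calls and return the average cost in nanoseconds.
// The lengths are summed into `sink` so the decoded strings are actually used.
function timePerOp(fn) {
  let sink = 0;
  const start = performance.now();
  for (let i = 0; i < ITERATIONS; i++) {
    sink += fn().length;
  }
  const elapsedMs = performance.now() - start;
  return { nsPerOp: (elapsedMs * 1e6) / ITERATIONS, sink };
}

const results = {
  // Includes the Buffer.from copy on every iteration.
  'Buffer.from(bytes).toString()': timePerOp(() => Buffer.from(bytes).toString('utf8')),
  // Excludes the Buffer.from overhead.
  'buf.toString()': timePerOp(() => buf.toString('utf8')),
  // Decodes the Uint8Array directly.
  'decoder.decode(bytes)': timePerOp(() => decoder.decode(bytes)),
};

for (const [label, { nsPerOp }] of Object.entries(results)) {
  console.log(`${label}: ${nsPerOp.toFixed(1)} ns/op`);
}
```

Keeping the `Buffer.from(bytes)` variant alongside the other two shows how much of the cost comes from the copy rather than from the decoding itself.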

@anonrig
Member Author

anonrig commented Nov 14, 2022

> Also do you compare buf.toString() vs TextEncoder.encode(buf), i.e. excluding the Buffer.from overhead?

Here's the full benchmark: https://github.com/anonrig/node-benchmarks/blob/main/text-encoder/index.mjs

@anonrig
Member Author

anonrig commented Nov 17, 2022

I'm closing this now, since TextDecoder is now much more performant for UTF-8.

anonrig closed this as completed Nov 17, 2022