Reason for using node-gyp? #103

Closed
webketje opened this issue Feb 15, 2022 · 7 comments

Comments

@webketje

Hi there,

Is there any reason why the module defaults to validation.c (compiled with node-gyp postinstall)?
AFAICS the "fallback" contains the same "main" code and no important native APIs are used.
Is it purely for efficiency with large files?

Context: I'm looking for a replacement for https://github.com/wayfind/is-utf8 for Metalsmith SSG, due to its erroneous handling of ASCII chars (see wayfind/is-utf8#6), but I cannot consider any lib that requires node-gyp (due to platform-specific issues).

So this request would be to justify the node-gyp dependency overhead in the docs and/or make it an optionalDependency (as require('utf-8-validate/fallback') should also work).

@lpinca
Member

lpinca commented Feb 16, 2022

Because the native version is a lot faster.

@lpinca
Member

lpinca commented Feb 16, 2022

FWIW, prebuilt binaries are included in the npm package for the most common platforms (Linux x64, Windows ia32 & x64, macOS x64 + arm64) so node-gyp is not used in most cases.

@webketje
Author

webketje commented Feb 16, 2022

@lpinca oh ok, that wasn't obvious from looking through the README/sources. I think this deserves a mention in the README.

@lpinca
Member

lpinca commented Feb 16, 2022

I think it is obvious if you look at the module entry point and package.json, but feel free to open a PR.
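For readers who have not opened the entry point, the general "try native, otherwise fall back to JavaScript" pattern can be sketched roughly as below. This is an illustration of the pattern only, not the package's actual source: `loadFirstAvailable` is a hypothetical helper, and the `TextDecoder`-based function is a stand-in for a pure JavaScript validator, not the library's own fallback.

```javascript
'use strict';

// Hypothetical helper (not part of utf-8-validate): call each loader in order
// and return the first implementation that loads without throwing.
function loadFirstAvailable(loaders) {
  for (const load of loaders) {
    try {
      return load();
    } catch (err) {
      // Loading failed (for example, the package or a binary for this
      // platform is missing), so move on to the next candidate.
    }
  }
  throw new Error('No implementation could be loaded');
}

// Stand-in pure JavaScript validator: a fatal TextDecoder throws a TypeError
// when the input is not valid UTF-8.
function jsIsValidUTF8(buf) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(buf);
    return true;
  } catch (err) {
    return false;
  }
}

const isValidUTF8 = loadFirstAvailable([
  () => require('utf-8-validate'), // used only if the package is installed
  () => jsIsValidUTF8 // always available
]);

console.log(isValidUTF8(Buffer.from('héllo', 'utf8')));
console.log(isValidUTF8(Buffer.from([0xff, 0xd8])));
```

With this shape, a missing compiler toolchain or missing binary only costs speed, not functionality, since the JavaScript path is always the last candidate.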

@webketje
Author

Ok ehm so, I was intrigued by this and did some perf tests here: https://replit.com/@webketje/utf8check-perf

Below is the result of a single run (but the results varied wildly for the text file tests).

Benchmark results

8KB txt file

is-utf8 x 17,166,321 ops/sec ±4.35% (50 runs sampled)
utf-8-validate (native) x 2,003,899 ops/sec ±5.64% (52 runs sampled)
utf-8-validate (js) x 6,730,442 ops/sec ±136.25% (54 runs sampled)
ext check (skip utf8 sniffing) x 2,193,404 ops/sec ±25.52% (49 runs sampled)
Fastest is is-utf8

88KB txt file

is-utf8 x 1,566 ops/sec ±6.16% (40 runs sampled)
utf-8-validate (native) x 40,233 ops/sec ±7.27% (43 runs sampled)
utf-8-validate (js) x 4,034 ops/sec ±6.22% (46 runs sampled)
ext check (skip utf8 sniffing) x 5,033,477 ops/sec ±28.84% (46 runs sampled)
Fastest is ext check (skip utf8 sniffing)

5MB img

is-utf8 x 12,552,270 ops/sec ±18.36% (49 runs sampled)
utf-8-validate (native) x 2,000,575 ops/sec ±6.04% (45 runs sampled)
utf-8-validate (js) x 10,540,215 ops/sec ±5.43% (49 runs sampled)
ext check (skip utf8 sniffing) x 4,042,329 ops/sec ±14.62% (47 runs sampled)
Fastest is is-utf8

12MB zip

is-utf8 x 13,113,566 ops/sec ±28.05% (48 runs sampled)
utf-8-validate (native) x 2,003,686 ops/sec ±5.56% (50 runs sampled)
utf-8-validate (js) x 10,718,546 ops/sec ±5.89% (49 runs sampled)
ext check (skip utf8 sniffing) x 2,341,464 ops/sec ±19.54% (51 runs sampled)
Fastest is is-utf8

Interesting to note, and consistent across multiple runs: when run on a relatively large binary file (e.g. .jpg, .zip), native utf-8-validate (precompiled) was up to 10x slower than the "fallback", and both were outperformed by is-utf8.

@lpinca
Member

lpinca commented Feb 17, 2022

> Interesting to note and what was consistent across multiple runs, is that when run on a relatively large binary file (eg .jpg, .zip), native utf-8-validate (precompiled) was up to 10x slower than "fallback", and both were outperformed by is-utf8.

In this case the first few bytes are probably not valid UTF-8, so the cost of calling C++ from JS does not pay off. This is confirmed by the fact that there is no difference between the 5MB and 12MB file.

See #90, websockets/ws#1348, https://github.com/websockets/ws/blob/8.5.0/lib/validation.js#L114, and #101.
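To illustrate the early-exit point above, here is a minimal sketch of a validator that reports where it stopped. `firstInvalidUtf8Index` is a hypothetical function written for this example, not the code from utf-8-validate or ws. The buffers are filled with 0xFF to mimic a binary file whose first byte is already invalid (a JPEG, for instance, starts with 0xFF 0xD8), so the scan ends at index 0 regardless of buffer size.

```javascript
'use strict';

// Returns the index where the first invalid UTF-8 sequence starts, or -1 if
// the whole buffer is valid. The scan stops at the first invalid sequence.
function firstInvalidUtf8Index(buf) {
  const len = buf.length;
  // True when the byte at `pos` exists and has the form 10xxxxxx.
  const isContinuation = (pos) => pos < len && (buf[pos] & 0xc0) === 0x80;

  let i = 0;
  while (i < len) {
    const b = buf[i];
    if (b < 0x80) {
      i += 1; // ASCII
    } else if (b >= 0xc2 && b <= 0xdf) {
      // 2-byte sequence (0xC0 and 0xC1 would be overlong and are rejected below)
      if (!isContinuation(i + 1)) return i;
      i += 2;
    } else if (b >= 0xe0 && b <= 0xef) {
      // 3-byte sequence, excluding overlong forms and UTF-16 surrogates
      if (
        !isContinuation(i + 1) ||
        !isContinuation(i + 2) ||
        (b === 0xe0 && buf[i + 1] < 0xa0) ||
        (b === 0xed && buf[i + 1] > 0x9f)
      ) {
        return i;
      }
      i += 3;
    } else if (b >= 0xf0 && b <= 0xf4) {
      // 4-byte sequence, excluding overlong forms and code points > U+10FFFF
      if (
        !isContinuation(i + 1) ||
        !isContinuation(i + 2) ||
        !isContinuation(i + 3) ||
        (b === 0xf0 && buf[i + 1] < 0x90) ||
        (b === 0xf4 && buf[i + 1] > 0x8f)
      ) {
        return i;
      }
      i += 4;
    } else {
      return i; // stray continuation byte or invalid lead byte
    }
  }
  return -1;
}

const binary5MB = Buffer.alloc(5 * 1024 * 1024, 0xff);
const binary12MB = Buffer.alloc(12 * 1024 * 1024, 0xff);

console.log(firstInvalidUtf8Index(binary5MB)); // 0
console.log(firstInvalidUtf8Index(binary12MB)); // 0
console.log(firstInvalidUtf8Index(Buffer.from('héllo wörld', 'utf8'))); // -1
```

Both binary buffers are rejected after inspecting a single byte, so the validation work is the same for 5MB and 12MB, and any fixed per-call overhead dominates the measurement.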

@lpinca
Member

lpinca commented Feb 25, 2022

I'm closing this as answered. Discussion can continue if needed.
