Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove native implementation and always use JS #90

Closed
wants to merge 1 commit into from
Closed

Remove native implementation and always use JS #90

wants to merge 1 commit into from

Conversation

shinnn
Copy link

@shinnn shinnn commented Jun 8, 2019

The performance of V8 has been continuously improved in recent years.

Benchmark script

'use strict';

const file = process.argv[2];
const isUtf8 = require(file);
const fixtures = [
  Buffer.alloc(0), // zero-length buffer
  Buffer.from('utf8'), // UTF-8 buffer
  Buffer.from([0xff, 0xff, 0xff, 0xff]), // non UTF-8 buffer
  Buffer.from('utf8'.repeat(100)), // long UTF-8 buffer
  Buffer.from(Array.from({length: 400}, () => 0xff)) // long non UTF-8 buffer
];
let i = 9999999;

console.time(file);

while (i--) {
  for (const fixture of fixtures.values()) {
    isUtf8(fixture);
  }
}

console.timeEnd(file);

Results

Node.js v12.4.0 is used. Faster is better.

Native implementation
$ node benchmark ./fallback.js
./fallback.js: 3162.678ms
JavaScript fallback bypassing native implementation
$ node benchmark ./index.js
./index.js: 3175.741ms

Since there is no significant performance difference between the native implementation and the JS one, I removed native implementation from this repository.

There are two merits this patch brings to users and maintainers:

  • no building step when a user installs this npm package, which is much faster than ever
  • reduces a lot of code from this repository and make it more maintainable

V8 has been continuously improved and there is no significant performance difference between the native implementation and the JS one.
@shinnn
Copy link
Author

shinnn commented Jun 8, 2019

🔔 @lpinca (because you're currently not watching this repository)

@lpinca
Copy link
Member

lpinca commented Jun 8, 2019

Thank you @shinnn but it really depends on the input size.
I've run https://github.com/micnic/uv benchmarks.

diff --git a/bench.js b/bench.js
index 4f2f176..a0d2536 100644
--- a/bench.js
+++ b/bench.js
@@ -4,7 +4,7 @@ const Benchmark = require('benchmark');
 
 const https = require('https');
 
-const uv = require('uv');
+const uv = require('.');
 const isValidUTF8 = require('utf-8-validate');
 const isValidUTF8Fallback = require('utf-8-validate/fallback');
 const isUtf8 = require('isutf8');
@@ -264,4 +264,4 @@ loadPage('https://en.wikipedia.org/wiki/Main_Page').then(() => {
        })
        // run async
        .run({ 'async': true });
-});
\ No newline at end of file
+});
diff --git a/package.json b/package.json
index 069c4fa..4625e4f 100644
--- a/package.json
+++ b/package.json
@@ -29,9 +29,9 @@
                "eslint": "4.x",
                "isutf8": "2.x",
                "tap": "11.x",
-               "utf-8-validate": "4.x"
+               "utf-8-validate": "5.x"
        },
        "engines": {
                "node": ">=6.0"
        }
-}
\ No newline at end of file
+}
$ node bench.js 
Loading https://en.wikipedia.org/wiki/Main_Page ...
uv x 17,524 ops/sec ±0.50% (96 runs sampled)
utf-8-validate (default, C++) x 25,876 ops/sec ±0.16% (96 runs sampled)
utf-8-validate (fallback, JS) x 13,416 ops/sec ±0.11% (98 runs sampled)
isutf8 x 13,420 ops/sec ±0.11% (100 runs sampled)
------------------------------------------------------------

Loading https://ro.wikipedia.org/wiki/Pagina_principală ...
uv x 11,601 ops/sec ±0.09% (99 runs sampled)
utf-8-validate (default, C++) x 14,311 ops/sec ±0.41% (98 runs sampled)
utf-8-validate (fallback, JS) x 8,691 ops/sec ±0.15% (99 runs sampled)
isutf8 x 8,799 ops/sec ±0.10% (100 runs sampled)
------------------------------------------------------------

Loading https://ru.wikipedia.org/wiki/Заглавная_страница ...
uv x 6,610 ops/sec ±0.12% (98 runs sampled)
utf-8-validate (default, C++) x 8,691 ops/sec ±0.10% (97 runs sampled)
utf-8-validate (fallback, JS) x 5,311 ops/sec ±0.11% (98 runs sampled)
isutf8 x 5,383 ops/sec ±0.11% (99 runs sampled)
------------------------------------------------------------

Loading https://ar.wikipedia.org/wiki/الصفحة_الرئيسية ...
uv x 6,946 ops/sec ±0.12% (98 runs sampled)
utf-8-validate (default, C++) x 9,160 ops/sec ±0.08% (98 runs sampled)
utf-8-validate (fallback, JS) x 5,590 ops/sec ±0.11% (99 runs sampled)
isutf8 x 5,638 ops/sec ±0.19% (99 runs sampled)
------------------------------------------------------------

Preparing 16MB of random ASCII bytes (best case)
uv x 81.41 ops/sec ±0.58% (70 runs sampled)
utf-8-validate (default, C++) x 124 ops/sec ±0.14% (79 runs sampled)
utf-8-validate (fallback, JS) x 61.88 ops/sec ±0.15% (65 runs sampled)
isutf8 valid buffer x 61.97 ops/sec ±0.14% (65 runs sampled)
------------------------------------------------------------

Preparing all valid UTF-8 bytes ~4.17 MB (worst case)
uv x 189 ops/sec ±0.12% (87 runs sampled)
utf-8-validate (default, C++) x 363 ops/sec ±0.27% (92 runs sampled)
utf-8-validate (fallback, JS) x 249 ops/sec ±0.12% (90 runs sampled)
isutf8 valid buffer x 191 ops/sec ±0.12% (88 runs sampled)

The native implementation is still ~2x faster.

no building step when a user installs this npm package, which is much faster than ever

N-API prebuilt binaries are distributed inside the npm package so on the most common platforms there is no compilation step nor an additional download to download the correct prebuilt binary.

With small buffers the pure JS implementation is probably faster. A user could do something like this websockets/ws#1348 after finding a good threshold.

@lpinca
Copy link
Member

lpinca commented Jun 8, 2019

System:
  OS: macOS 10.14.5
  CPU: (16) x64 Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
  Memory: 23.52 GB / 32.00 GB
  Shell: 3.2.57 - /bin/bash
Binaries:
  Node: 12.4.0 - /usr/local/bin/node
  npm: 6.9.0 - /usr/local/bin/npm

@shinnn
Copy link
Author

shinnn commented Jun 8, 2019

@lpinca Thanks for the detailed explanation. https://github.com/micnic/uv, especially micnic/uv#1 is an interesting topic for me.

Actually, Buffer.from('utf8'.repeat(100)) was not enough large to simulate a real-world UTF-8 file.

@shinnn shinnn closed this Jun 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants