Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use llhttp through wasm #22

Closed
ronag opened this issue May 2, 2020 · 45 comments
Closed

Use llhttp through wasm #22

ronag opened this issue May 2, 2020 · 45 comments
Labels
enhancement New feature or request Status: help-wanted This issue/pr is open for contributions

Comments

@ronag
Copy link
Member

ronag commented May 2, 2020

Would it be possible and make sense to use https://github.com/nodejs/llhttp instead of http-parser-js? Maybe as an opt-in?

@ronag
Copy link
Member Author

ronag commented May 2, 2020

The performance of http-parser-js came up as part of a discussion regarding this library. Hence this issue.

@mcollina
Copy link
Member

mcollina commented May 2, 2020

I’d be happy to, however llhttp is not part of the public API, and I would prefer not to use a binary addon.

Wasm could be cool, but I do not know how it would perform. On top of of that, all of this inlines and it avoids one or more jump between the http parser (C) and client (JS). Tbh, I don’t expect a bump in speed, but I would be happy to be surprised.

@szmarczak
Copy link
Member

szmarczak commented May 3, 2020

I have to say I'm impressed. Even with http-parser-js it's still faster than the native Node.js http.

undici x 20,826 ops/sec ±1.70% (87 runs sampled)
Fastest is undici
Benchmark source code
'use strict';
const Benchmark = require('benchmark');
const {Client} = require('undici');

const suite = new Benchmark.Suite();

const client = new Client('https://localhost:8081', {
    tls: {
rejectUnauthorized: false
    }
});

const options = {
    path: '/',
    method: 'GET'
};

// Benchmarking
suite.add('undici', {
    defer: true,
    fn: async deferred => {
        const {body} = await client.request(options);
        body.resume();

        body.once('end', () => {
            deferred.resolve();
        });
    }
}).on('cycle', event => {
    console.log(String(event.target));
}).on('complete', function () {
    console.log(`Fastest is ${this.filter('fastest').map('name')}`);
}).run();

The fun fact is that it's even faster than the native HTTP2 implementation:

http2 x 16,172 ops/sec ±1.21% (85 runs sampled)
https - keepalive x 13,158 ops/sec ±2.88% (78 runs sampled)

@ronag ronag added the enhancement New feature or request label May 5, 2020
@ronag
Copy link
Member Author

ronag commented May 13, 2020

Anyone of you know of anyone that might be interested in creating or helping out with a wasm build of llhttp that can be used as an alternative to http-parser-js? I hope it would not be too difficult if one is familiar with the wasm build flow.

@mcollina
Copy link
Member

In would recommend to open an issue on https://github.com/nodejs/llhttp, there is also https://www.npmjs.com/package/llhttp which might be usable somehow?

@ronag
Copy link
Member Author

ronag commented May 13, 2020

@szmarczak Would you mind running those benchmarks again against master? I'm curious how much of a difference recent improvements have made.

EDIT: even better if you use request.stream #109

@szmarczak
Copy link
Member

master: undici x 21,168 ops/sec ±2.00% (88 runs sampled)

@szmarczak
Copy link
Member

#109 undici x 20,131 ops/sec ±1.81% (85 runs sampled)

but it very often goes below 20k: undici x 19,812 ops/sec ±2.01% (85 runs sampled)

@szmarczak
Copy link
Member

ah wait I didn't use the newer API, lemme check it out

@szmarczak
Copy link
Member

        client.stream(options, () => {
            const obj = {
                on: () => obj,
                write: () => {},
                end: () => deferred.resolve()
            };

            return obj;
        });
undici x 24,184 ops/sec ±1.64% (85 runs sampled)
        client.stream(options, () => {
            let buffered = '';

            const obj = {
                on: () => obj,
                write: chunk => {
                    buffered += chunk;
                },
                end: () => deferred.resolve()
            };

            return obj;
        });
undici x 23,080 ops/sec ±1.50% (83 runs sampled)

That's a big improvement!

@ronag ronag changed the title llhttp? Use llhttp through wasm May 16, 2020
@ronag ronag added the Status: help-wanted This issue/pr is open for contributions label May 18, 2020
@dnlup
Copy link
Contributor

dnlup commented Jun 26, 2020

I did a quick test and the package https://www.npmjs.com/package/llhttp can be used directly, so it seems wasm is not necessary.
A lot of tests fail, I would like to dig deeper to check if it is something that can be solved by just refactoring or there are some major incompatibilities.

client-base

...
const { HTTP: HTTParser } = require('llhttp')
...

Note: I don't know how much this task would take atm

@dnlup
Copy link
Contributor

dnlup commented Jun 26, 2020

After some research I found this issue which seems to suggest a way to use this parser through Node directly, which I think it could be a better option.

I did try those steps some time ago and faced the same errors.

@mcollina , when you say

I’d be happy to, however llhttp is not part of the public API,

You mean you wouldn't like to try the approach suggested in the issue?

ronag added a commit that referenced this issue Jun 26, 2020
ronag added a commit that referenced this issue Jun 26, 2020
ronag added a commit that referenced this issue Jun 26, 2020
ronag added a commit that referenced this issue Jun 27, 2020
ronag added a commit that referenced this issue Jun 27, 2020
ronag added a commit that referenced this issue Jun 27, 2020
ronag added a commit that referenced this issue Jun 27, 2020
ronag added a commit that referenced this issue Jun 27, 2020
ronag added a commit that referenced this issue Jun 27, 2020
ronag added a commit that referenced this issue Jun 27, 2020
@mcollina
Copy link
Member

I think using process.bindings() is highly unsupported from the Node.js team. It could work but it would require more maintainance.

@ronag
Copy link
Member Author

ronag commented Jun 27, 2020

I think using process.bindings() is highly unsupported from the Node.js team.

Yes.

It could work but it would require more maintainance.

Yes, though I don't see anyway good alternative at the moment. Unless someone starts investigating wasm or napi (I'm unfortunately not familiar with either).

I think it might be worth it. We would need to keep an eye on major releases of Node though to make sure it doesn't break.

I mean, we could stick with http-parser-js. The performance difference is not significant. I'm mostly concerned with the security/compliance part at this point.

@mcollina
Copy link
Member

I'm good with the approach.

@ronag
Copy link
Member Author

ronag commented Jun 28, 2020

Fixed in 2240cb0 (although not using wasm).

@ronag ronag closed this as completed Jun 28, 2020
@devsnek
Copy link
Member

devsnek commented Feb 24, 2021

I think technically you could built it without the wasi by manually including libc headers since it doesn't use any system apis, but imo its easier just to use the sdk anyway.

@mcollina
Copy link
Member

Thanks!

@mcollina
Copy link
Member

I did some experiments... using the wasi-sdk works like magic, thanks @devsnek!

From the point of view of Undici, we are missing some key things that are provided by https://github.com/nodejs/node/blob/master/src/node_http_parser.cc, hooks into the parsing flow:

https://github.com/nodejs/node/blob/76a073b67ed94a38ab3efa9cf9b2030962cb4d08/src/node_http_parser.cc#L966-L979.

This is doable with much more work apparently.

@dnlup
Copy link
Contributor

dnlup commented Feb 24, 2021

I did some experiments... using the wasi-sdk works like magic, thanks @devsnek!

From the point of view of Undici, we are missing some key things that are provided by https://github.com/nodejs/node/blob/master/src/node_http_parser.cc, hooks into the parsing flow:

https://github.com/nodejs/node/blob/76a073b67ed94a38ab3efa9cf9b2030962cb4d08/src/node_http_parser.cc#L966-L979.

This is doable with much more work apparently.

Is it, hopefully, a matter of copy/paste the bits of Node into llhttp?

@devsnek
Copy link
Member

devsnek commented Feb 24, 2021

@ronag
Copy link
Member Author

ronag commented Feb 24, 2021

I'm blocked. No idea how to build https://github.com/devsnek/llhttp/tree/wasm with https://github.com/WebAssembly/wasi-sdk

Tried everything I can think of to make clang find the correct include path without success:

/build/c/llhttp.c:7574:10: fatal error: 'stdlib.h' file not found
#include <stdlib.h>
         ^~~~~~~~~~
1 error generated.
./src/native/api.c:1:10: fatal error: 'stdlib.h' file not found
#include <stdlib.h>
         ^~~~~~~~~~
1 error generated.
./src/native/http.c:1:10: fatal error: 'stdio.h' file not found
#include <stdio.h>
         ^~~~~~~~~
1 error generated.

I guess UNIX tooling is not my strongest skill.

e.g.

/Users/ronagy/GitHub/public/devsnek-llhttp/wasi-sdk-12.0/bin/clang -I /Users/ronagy/GitHub/public/devsnek-llhttp/wasi-sdk-12.0/lib/clang/11.0.0/include     --sysroot=$WASI_ROOT/share/wasi-sysroot   -target wasm32-unknown-wasi   -Ofast   -fno-exceptions   -fvisibility=hidden   -mexec-model=reactor   -Wl,-error-limit=0   -Wl,-O3   -Wl,--lto-O3   -Wl,--strip-all   -Wl,--allow-undefined   -Wl,--export-dynamic   -Wl,--export-table   -Wl,--export=malloc   -Wl,--export=free   ./build/c/*.c   ./src/native/*.c   -I./build   -o ./build/llhttp.wasm

@dnlup
Copy link
Contributor

dnlup commented Feb 25, 2021

Just thinking out loud here:

to define kOnMessageBegin we should add a wasm_on_message_begin in the parser settings in src/native/api.c. That would be similar to what is done in Node http_parser.cc with on_message_begin. The hook using kOnMessageBegin would then be finalized/implemented in the JS implementation of the Parser. This implementation will substitute the HTTP_Parser we are currently using in core here.

I am trying to play with it using the @devsnek branch (thank you). I apologize if this is stuff known already, I am new to wasi.

@dnlup
Copy link
Contributor

dnlup commented Feb 25, 2021

I'm blocked. No idea how to build https://github.com/devsnek/llhttp/tree/wasm with https://github.com/WebAssembly/wasi-sdk

Tried everything I can think of to make clang find the correct include path without success:

/build/c/llhttp.c:7574:10: fatal error: 'stdlib.h' file not found
#include <stdlib.h>
         ^~~~~~~~~~
1 error generated.
./src/native/api.c:1:10: fatal error: 'stdlib.h' file not found
#include <stdlib.h>
         ^~~~~~~~~~
1 error generated.
./src/native/http.c:1:10: fatal error: 'stdio.h' file not found
#include <stdio.h>
         ^~~~~~~~~
1 error generated.

I guess UNIX tooling is not my strongest skill.

e.g.

/Users/ronagy/GitHub/public/devsnek-llhttp/wasi-sdk-12.0/bin/clang -I /Users/ronagy/GitHub/public/devsnek-llhttp/wasi-sdk-12.0/lib/clang/11.0.0/include     --sysroot=$WASI_ROOT/share/wasi-sysroot   -target wasm32-unknown-wasi   -Ofast   -fno-exceptions   -fvisibility=hidden   -mexec-model=reactor   -Wl,-error-limit=0   -Wl,-O3   -Wl,--lto-O3   -Wl,--strip-all   -Wl,--allow-undefined   -Wl,--export-dynamic   -Wl,--export-table   -Wl,--export=malloc   -Wl,--export=free   ./build/c/*.c   ./src/native/*.c   -I./build   -o ./build/llhttp.wasm

@ronag The way that works for me is downloading a tar.gz release, extract it somewhere, then launch WASI_ROOT=<path-to-wasm-sdk> ./build_wasm.sh. I am using Ubuntu.

@dnlup
Copy link
Contributor

dnlup commented Feb 25, 2021

Also, consume/uncosume would have to be implemented using streams on the JS side, right?

@ronag
Copy link
Member Author

ronag commented Feb 25, 2021

Also, consume/uncosume would have to be implemented using streams on the JS side, right?

Yea. We need to remove consume/unconsume and just use socket.on('data', buf => parser.execute(buf))

So we go from:

cpp (socket + parser) -> js (undici)

to:

cpp (socket) -> js (undici) -> wasm (parser) -> js (undici)

@dnlup
Copy link
Contributor

dnlup commented Feb 25, 2021

Also, consume/uncosume would have to be implemented using streams on the JS side, right?

Yea. We need to remove consume/unconsume and just use socket.on('data', buf => parser.execute(buf))

So we go from:

cpp (socket + parser) -> js (undici)

to:

cpp (socket) -> js (undici) -> wasm (parser) -> js (undici)

Does that mean that kOnExecute is no longer necessary?

@ronag
Copy link
Member Author

ronag commented Feb 25, 2021

Does that mean that kOnExecute is no longer necessary?

Correct. That logic should live after const ret = parser->execute(buf).

@dnlup
Copy link
Contributor

dnlup commented Feb 25, 2021

I managed to put together a small experiment for a parser in https://github.com/dnlup/llhttp/blob/undici_wasm/wasm.js. Besides the things we already mentioned. like:

  • kOnExecute not being necessary anymore
  • we will handle the stream logic in js

One other thing I am seeing is the headers timeout, the headers max size & number. Do we need all of them implemented in the base parser?

What strategy do you think is best for embedding the parser in undici? Should it live in a separate repo? Or should we include it here?

@ronag
Copy link
Member Author

ronag commented Feb 25, 2021

headersTimeout not necessary, we already have it in ja I think, we can implement max size in js.

@ronag
Copy link
Member Author

ronag commented Feb 26, 2021

@dnlup at you working on this with intention to open PR?

@mcollina
Copy link
Member

What strategy do you think is best for embedding the parser in undici? Should it live in a separate repo? Or should we include it here?

it would be best if undici kept having no runtime dependencies. A contribution in the form of a script that generates the relative javascript and wasm files and populate a folder in this repo would be rad.

(Bonus point if the script is run within a docker image, so the toolchain to generate those files is super easy).

@dnlup
Copy link
Contributor

dnlup commented Feb 26, 2021

@dnlup at you working on this with intention to open PR?

Yes. There's a lot of new stuff for me here, so I might not be fast, but I hope to complete it. Anyway, I'd be happy to help with this issue.

@dnlup
Copy link
Contributor

dnlup commented Feb 26, 2021

What strategy do you think is best for embedding the parser in undici? Should it live in a separate repo? Or should we include it here?

it would be best if undici kept having no runtime dependencies. A contribution in the form of a script that generates the relative javascript and wasm files and populate a folder in this repo would be rad.

(Bonus point if the script is run within a docker image, so the toolchain to generate those files is super easy).

Ok, just to check I understand correctly, the best approach would be:

  • setting up a docker container
  • from the container, we clone/download the llhttp project (do we use a custom repo or branch?)
  • launch the script to generate the wasm build
  • copy the build assets into undici

This script would also have to work locally. Otherwise, there would be no way to tun undici

@mcollina
Copy link
Member

from the container, we clone/download the llhttp project (do we use a custom repo or branch?)

I would just clone master and add our own files on top (if we need them)

This script would also have to work locally. Otherwise, there would be no way to tun undici

I don't understand this.

@dnlup
Copy link
Contributor

dnlup commented Feb 26, 2021

I don't understand this.

Sorry.

I thought that the script should have to run on the CI and locally (while working on your machine). But now that I think about it, maybe the easiest solution, for now, would be just to run it locally. We need to be able to build the parser to use undici.

@mcollina
Copy link
Member

I thought that the script should have to run on the CI and locally (while working on your machine). But now that I think about it, maybe the easiest solution, for now, would be just to run it locally. We need to be able to build the parser to use undici.

A local run (via docker) would be fine... we'll commit the output to this repo.

This was referenced Mar 7, 2021
@mcollina
Copy link
Member

This was done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Status: help-wanted This issue/pr is open for contributions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants