Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http.Parser would be great to have. #21202

Closed
qubyte opened this issue Jun 7, 2018 · 10 comments
Closed

http.Parser would be great to have. #21202

qubyte opened this issue Jun 7, 2018 · 10 comments
Labels
http Issues or PRs related to the http subsystem.

Comments

@qubyte
Copy link
Contributor

qubyte commented Jun 7, 2018

I've recently been doing some work in which I needed access to the individual requests and responses from a pair of streams (it's a proxy of sorts and I'm only working with HTTP/1.1). For requests, recent changes to http.Server have meant I can fool a server instance into parsing request objects from a stream. Responses have been more challenging...

As it stands I've had to make use of the http_parser binding and IncomingMessage.prototype._addHeaderLine, both of which are internals. In my quest to understand how to use the http_parser binding, I came across this comment:

// TODO: http.Parser should be a Writable emits request/response events.

This Parser would eliminate all the hacks and use of private API in my code!

Alas, the comment is rather old. It got me thinking about how it might be implemented though. My understanding is that Parser would have to be instantiated either for requests or responses since it would wrap an underlying HTTPParser instance. On that assumption, my questions are:

  • Does a response need knowledge of a request in some circumstances in order to be parsed (ruling out Parser in principle)?
  • Is this an intrinsically difficult task (the complexity in the HTTP related modules points to yes).
  • Would refactoring HTTP modules to use Parser lead to a nasty performance hit which my limited understanding has missed?
  • Would some effort to do such a refactor be welcomed?
@addaleax addaleax added the http Issues or PRs related to the http subsystem. label Jun 8, 2018
@addaleax
Copy link
Member

addaleax commented Jun 8, 2018

Does a response need knowledge of a request in some circumstances in order to be parsed (ruling out Parser in principle)?

No, I think we’re good here.

Is this an intrinsically difficult task (the complexity in the HTTP related modules points to yes).

Probably, yes.

Would refactoring HTTP modules to use Parser lead to a nasty performance hit which my limited understanding has missed?

If HTTP code would have to use an additional layer of streams (in particular of streams in the sense of the stream module, as opposed to the "fake" ones we currently provide), then probably yes.

It might be easier to do the reverse thing and provide an API (maybe as an npm module) on top of our current code? I think the createConnection option for http.request and emit('connection', socket) on the server side cover everything you’d need, right?

@qubyte
Copy link
Contributor Author

qubyte commented Jun 8, 2018

I think the createConnection option for http.request and emit('connection', socket) on the server side cover everything you’d need, right?

This is what I'm doing for requests from the client, but it doesn't cover reponses from the onward server (unless I've missed a trick). Until recently I didn't need access to individual responses and could pipe the onward server socket directly to the client socket. New requirements have meant I need access to individual responses now. I do have a solution which appears to work, but it's using private stuff and is pretty gnarly.

EDIT: I need to read more carefully! This looks promising:

I think the createConnection option for http.request

I'll definitely check that out.

@addaleax
Copy link
Member

addaleax commented Jun 8, 2018

So, the issue is that http.request assumes a 1:1 relationship between request and responses, but in your model requests and responses are not necessarily correlated that way? Am I understanding that correctly?

That doesn’t seem trivial to cover, yes…

@qubyte
Copy link
Contributor Author

qubyte commented Jun 8, 2018

Ah, so my use case is rather strange. Amongst other things, the proxy I'm building caches certain responses from the onward server (each keyed on the request from a client which led to it).

The need to build a key means parsing the request, and the need to cache the response means knowing where its boundaries are within the response stream (preferably more, but this would suffice). My crude first pass at this has at it's heart this code like:

// where:
// - chunk is a chunk from the response stream
// - message is an array
// - parser is an HTTPParser instance
function write(chunk) {
  for (const byte of chunk) {
    message.push(byte);
    parser.execute(Buffer.from([byte]));
  }
}

write is called each time I get a chunk from the onward server socket.

This is going to perform badly since it's working byte-by-byte, but I have the benefit of knowing that parser[HTTPParser.kOnMessageComplete] will be called upon the final byte of a response being handled, so I can Buffer.from(message) to get the entire message as a buffer. The message array is then reset to [] ready for the next response.

@bnoordhuis
Copy link
Member

I'd say http.Parser is unlikely because it requires us to expose/freeze too many internal details.

Why don't you use https://www.npmjs.com/package/http-parser-js? It follows the http-parser API closely and is not much slower.

(There's also https://github.com/bnoordhuis/node-http-parser but I haven't touched that in years and probably needs some work by now.)

@qubyte
Copy link
Contributor Author

qubyte commented Jun 8, 2018

That'd be a drop-in replacement, so definitely worth keeping in mind should the binding change and break my proxy.

This issue asked a few questions about the hypothetical http.Parser. I understand it to now be unlikely to be built, for good reasons. One thing I might wind up doing is wrapping http-parser-js myself to get something like Parser.

Given this outcome, should the linked comment in the code be updated or removed?

@bnoordhuis
Copy link
Member

That comment looks outdated to me. Do you want to file a pull request removing it?

@qubyte
Copy link
Contributor Author

qubyte commented Jun 8, 2018

No problem. Would you like me to remove the XXX comment above it too?

@bnoordhuis
Copy link
Member

Yep, that'd be good.

@qubyte
Copy link
Contributor Author

qubyte commented Jun 8, 2018

Thanks for the discussion folks! This was really interesting.

targos pushed a commit that referenced this issue Jun 11, 2018
Fixes: #21202

PR-URL: #21214
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Weijia Wang <starkwang@126.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
targos pushed a commit that referenced this issue Jun 13, 2018
Fixes: #21202

PR-URL: #21214
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Weijia Wang <starkwang@126.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants