Attempt to fix #21 (intercept res.write / res.end) #22

Merged
merged 1 commit into master from fix_res_write on Jan 16, 2019

Conversation

magicmark
Contributor

@magicmark commented Dec 20, 2018

In #21, I added a failing test to show how this library doesn't work when apps call res.write / res.end.

This is a stab at fixing this, and generalizing the way in which we intercept data passed to the response object. Before, we were just monkey patching res.json, and it's tempting to just extend this behaviour to res.write. However, the problems here are:

  • Having to deal with multiple ways to construct the responseBody (do we need to concat strings? buffers? just forward whatever was passed to res.json?)
  • By extending our current logic and similarly monkey patching res.write, we have to remember what was passed and combine the chunks into an array - at this point, we're approaching reimplementing pipes, badly. Also potentially leaky :(

So I thought: since res (http.ServerResponse) is a Writable stream, let's treat it as such and try to tap into the stream to get the response body.
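The general shape of that approach looks roughly like this (a reconstructed sketch for illustration, not the literal diff in this PR - note the comment on the final chunk, which turns out to be the bug debugged below):

const { PassThrough } = require("stream");
const getStream = require("get-stream");

app.use((req, res, next) => {
  const passThru = new PassThrough();
  const responseBody = getStream(passThru); // resolves once passThru ends; stashed for the middleware to await later

  const origWrite = res.write.bind(res);
  const origEnd = res.end.bind(res);

  // Mirror everything the app writes to res into the passthrough:
  res.write = (chunk, ...rest) => {
    passThru.write(chunk);
    return origWrite(chunk, ...rest);
  };
  res.end = (chunk, ...rest) => {
    const result = origEnd(chunk, ...rest); // internally re-enters the patched write()...
    passThru.end(chunk); // ...so a final chunk gets mirrored twice - see below
    return result;
  };

  next();
});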

This kind of works! Sort of!

So the failing test now works... but now I broke all the others :P

When debugging, test output now looks like this:

  console.log __tests__/index.test.js:19
    responseBody is  {"foo":"bar"}{"foo":"bar"}

  console.log __tests__/index.test.js:19
    responseBody is  foofoo

  console.log __tests__/index.test.js:19
    responseBody is  {"foo":"bar"}{"foo":"bar"}

What's now happening is that res.json() apparently does call res.write() twice.

I'll look for a way around this tomorrow but just wanted to give an early preview of what I'm up to.

Other thoughts:

@magicmark
Copy link
Contributor Author

magicmark commented Dec 24, 2018

Wow, I've learned a lot whilst debugging this!

So the root cause can be boiled down to this minimal repro:

const { Writable } = require("stream");

const writeableStream = new Writable({
  write: (chunk, encoding, next) => {
    console.log("[write stream]: ", chunk.toString());
    next();
  }
});

const oldWrite = writeableStream.write.bind(writeableStream);
const oldEnd = writeableStream.end.bind(writeableStream);

writeableStream.write = function PatchedWrite(...args) {
  console.log("writeableStream write called with: ", args);
  return oldWrite(...args);
};

writeableStream.end = function PatchedEnd(...args) {
  console.log("writeableStream end called with: ", args);
  return oldEnd(...args);
};

// Write some stuff:
writeableStream.write("Hello ");
writeableStream.end("World!");

Which produces the following output:

writeableStream write called with:  [ 'Hello ' ]
[write stream]:  Hello
writeableStream end called with:  [ 'World!' ]
writeableStream write called with:  [ 'World!', undefined ]
[write stream]:  World!

We can see that even though our application code only calls write() once, internally write() gets called again from end() (as seen here).

This leads to an edge case: when application code passes a final chunk to end(), that chunk gets reported twice.
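One hypothetical way around the re-entrancy (a sketch, not what this branch does): set a flag around the original end() call so the patched write() knows to skip the internally re-entered chunk:

const origWrite = res.write.bind(res);
const origEnd = res.end.bind(res);
const chunks = [];
let insideEnd = false;

res.write = (chunk, ...rest) => {
  // Skip recording when this call is end() re-entering write() internally:
  if (!insideEnd && chunk) chunks.push(chunk);
  return origWrite(chunk, ...rest);
};

res.end = (chunk, ...rest) => {
  // end(callback) is a legal signature, so only record real chunks:
  if (chunk && typeof chunk !== "function") chunks.push(chunk);
  insideEnd = true;
  const result = origEnd(chunk, ...rest);
  insideEnd = false;
  return result;
};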

This is trickier than I thought, will have to do some more thinking :/

cc @briancavalier

@magicmark force-pushed the fix_res_write branch 2 times, most recently from 70b72af to f12ba95 on December 24, 2018 08:15
@magicmark
Contributor Author

magicmark commented Dec 24, 2018

Got a working branch that passes all the tests 🎉

Some thoughts/realisations after having gone through this exercise:

Non-blocking thoughts:

  1. monkey patching is really really gross and this logic could be hard to unwind
  2. I guess we're subverting the whole point of using pipes by storing the whole response in memory - probably fine for most applications, but worth highlighting in big bold letters at the top of the readme somewhere.
  3. edge case error handling is a bit of a question mark - if the promise from get-stream rejects, the error currently won't get propagated; the middleware will just return an empty response body. If this happens, there's probably a bigger error that has blown up somewhere else, but I'd like to get coverage on this regardless and handle it properly (see the sketch after this list)
  4. currently we're relying on the magic string/buffer type detection of the get-stream library. I'm not sure how to correctly type the RequestDetails object. Left as any for now.
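For (3), a hypothetical sketch of what propagating the failure could look like (collectResponseBody and passThru are illustrative names, not this library's actual internals):

const getStream = require("get-stream");

async function collectResponseBody(passThru) {
  try {
    return await getStream(passThru);
  } catch (err) {
    // Instead of silently falling back to an empty body, surface the
    // failure so callers/tests can observe it:
    throw new Error(`failed to capture response body: ${err.message}`);
  }
}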

Blocking thoughts:

  5. I'd like to find a better way of "remembering" whether we should call the passThru methods or the real methods - the current logic feels a bit hacky
  6. In fact, there's an edge case I noticed I introduced - if you schedule a deferred write() before calling end(), we now swallow the "write after end" error and silently allow it.

e.g. this should be a "write after end" error, but we now happily accept it:

app.get("/test", (req, res) => {
  res.write("Hello ");
  res.write("World ");
  process.nextTick(() => {
    res.write("sneaking in");
  });
  res.end("!");
});

// Outputs "Hello World !sneaking in" to the browser
// Middleware just returns "Hello World !"
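For reference, this is the error being swallowed - on an unpatched Writable, a write after end() surfaces as a stream error (a standalone demo in the style of the repro above):

const { Writable } = require("stream");

const ws = new Writable({
  write: (chunk, encoding, next) => next()
});

ws.on("error", err => {
  console.error(err.message); // "write after end"
});

ws.end("!");
ws.write("sneaking in"); // too late - triggers the error above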

Fixing (5) will fix (6). It's unclear how much of a problem IRL this would be, but I'm still not hugely comfortable releasing this branch in its current state. Maybe it's fine, and we can ticket these things up and fix later. (Leaning towards shipping as an alpha - it's still an improvement over the current broken version :P)

Would love any extra thoughts on this!

@briancavalier
Collaborator

Thanks for the invite, @magicmark :)

I had a couple quick thoughts. The first may be more of a breaking change, and I'm not sure if it fits with the goals. The second may be closer to this PR.

1 - Use stream instead of promise

This idea breaks the tween() API, but I think it could be interesting as it fits with a streaming model and may help avoid having to materialize the entire response body in memory: change the tween() API to accept a Readable instead of a Promise. Then, use a technique like this to pass a "copy" of the response stream to each tween. Caveat: I don't know the performance characteristics of this kind of stream fan-out.
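A rough illustration of the fan-out part (hypothetical names; it assumes the middleware already has a Readable carrying the response bytes, which is the hard part discussed below):

const { PassThrough } = require("stream");

// Give each tween its own readable "copy" of the response stream:
function fanOut(source, tweens) {
  for (const tween of tweens) {
    const copy = new PassThrough();
    source.pipe(copy); // a Readable can be piped to many destinations
    tween(copy); // each tween receives a Readable instead of a Promise
  }
}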

Overall, this feels much simpler, but again, it may not meet the goals/constraints (esp since I'm a total noob here!).

2 - Simplify stream accumulation

It might be simpler to use something like bl or concat-stream to accumulate chunks directly rather than a tricky circular pipe. Here's a hastily thrown together sketch:

import BufferList from 'bl' // or concat-stream

//...
  let resolver
  const responseDetails = new Promise((resolve, reject) => {
    resolver = { resolve, reject }
  })

  const buffer = new BufferList()
  const origWrite = res.write.bind(res)
  const origEnd = res.end.bind(res)

  res.write = function(chunk, ...rest) {
    buffer.append(chunk)
    return origWrite(chunk, ...rest)
  }

  res.end = function(...args) {
    const result = origEnd(...args)
    resolver.resolve(buffer)
    return result
  }

//...
}
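Worth noting: because origEnd internally re-enters the patched write() with any final chunk (per the repro above), that chunk lands in the buffer exactly once here. Consuming the result might then look like this (hypothetical usage; toString() is bl's documented API):

responseDetails.then(buffer => {
  console.log(buffer.toString()) // the whole accumulated response body
})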

Any thoughts on those ideas? Do they spark any new ones?

@magicmark
Contributor Author

magicmark commented Jan 9, 2019

@briancavalier thanks so much for your thoughts!

1

I originally tried a version of this, but I'm not sure how to make it work without monkey patching over res :/

To see why, a naive version looks like this:

const { PassThrough } = require("stream");

// in some middleware, set up the pass through pipe to be read from later...
app.use((req, res, next) => {
    const passThru = new PassThrough();
    res.pipe(passThru);
    next();
});

This won't work though! That's because res is a Writable stream - a thing to be written to, not read from. I.e. we can't actually call .pipe() on it (.pipe is not an available method on Writable streams):

https://github.com/nodejs/node/blob/d4c91f28148af8a6c1a95392e5c88cb93d4b61c6/lib/_http_outgoing.js#L833

I found this a really helpful way to think about it: https://github.com/substack/stream-handbook#pipe

a.pipe(b);
b.pipe(c);
c.pipe(d);
This is very much like what you might do on the command-line to pipe programs together:
a | b | c | d

Ok, so let's try this instead:

// in some middleware, set up the pass through pipe to be read from later...
app.use((req, res, next) => {
    const passThru = new PassThrough();
    passThru.pipe(res);
    next();
});

This allows us to tap into .on('data'), but for data to flow through the passthrough pipe, we'd need to somehow make application code call passThru.end instead of res.end etc. Hence the monkeypatching I was talking about earlier :(
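Concretely, the redirection would look something like this (a sketch - basically the "tricky circular pipe" you mentioned):

const { PassThrough } = require("stream");

app.use((req, res, next) => {
    const passThru = new PassThrough();
    const origWrite = res.write.bind(res);
    const origEnd = res.end.bind(res);

    // Forward data to the real response via the saved originals, so the
    // patched methods below don't end up calling themselves:
    passThru.on("data", chunk => origWrite(chunk));
    passThru.on("end", () => origEnd());

    // Redirect application writes into the passthrough:
    res.write = (chunk, ...rest) => passThru.write(chunk, ...rest);
    res.end = (chunk, ...rest) => passThru.end(chunk, ...rest);

    next();
});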

It's possible I'm missing something, but I'm not sure how to implement (1) in a nice way :/

2

I think this is probably a better approach than my draft. I had considered directly using concat-stream, but got scared by this comment about it in the README of get-stream: https://github.com/sindresorhus/get-stream/blob/master/readme.md#how-is-this-different-from-concat-stream

I was also keen to avoid directly managing the response data myself, for fear of messing up types, and instead to rely on the get-stream library to do that. But it seems like maybe we can use BufferList as per your implementation and let the application code worry about that, so we avoid the issue that way?

I'll work on another draft using this approach, as you suggested.

Thanks so much again for taking the time to look into this!

@magicmark
Contributor Author

@briancavalier thanks a bunch again for the guidance. Updated the CR to use the second approach. I think this feels a lot nicer, and maintains the current API.

@briancavalier
Collaborator

The latest looks really simple and clean, @magicmark. Sorry for the readable vs writable confusion above in option 1. I wanted to write down what we talked about afterward, though, so it doesn't get lost. It could be an interesting future direction.

According to bl's docs, it's a duplex stream. It can be used to transform a writable into a readable (via buffering). That could be used to implement a solution like option 1. In terms of efficiency, depending on bl's implementation, the worst case should be no worse than accumulating all the output in memory, and the best possible case may approach streaming. Reality is probably somewhere in between. Anyway, maybe worth exploring.
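Sketching it very loosely (hypothetical, and hinging on bl's exact duplex behaviour; the patching mirrors the earlier sketch):

import BufferList from 'bl'

// The patched methods write into the BufferList; because bl is also a
// Readable, each tween could consume the response as a stream:
const buffer = new BufferList()
const origWrite = res.write.bind(res)
const origEnd = res.end.bind(res)

res.write = (chunk, ...rest) => {
  buffer.write(chunk)
  return origWrite(chunk, ...rest)
}

res.end = (...args) => {
  const result = origEnd(...args) // re-enters write() with any final chunk
  buffer.end() // readable side now sees EOF
  return result
}

tween(buffer) // tween receives a Readable view of the response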

Cheers!

@magicmark
Contributor Author

@briancavalier \o/ awesome thanks again so much for taking the time on this!

I really like the idea of exposing the response as a readable stream, and keeping it "nodey".

In the interest of expediency, I'm going to merge this as-is, and as a follow-up, allow an option to change the API to return a stream - maybe this is done in two parts or something.

LMK if you have any concerns with this - thanks again!

@magicmark merged commit 045d010 into master on Jan 16, 2019
@magicmark deleted the fix_res_write branch on January 16, 2019 18:21
@briancavalier
Collaborator

Shipping the current approach as the short term sounds great 👍
