Attempt to fix #21 (intercept res.write / res.end) #22

Merged
merged 1 commit into master from fix_res_write on Jan 16, 2019

Conversation

magicmark
Contributor

@magicmark commented Dec 20, 2018

In #21, I added a failing test to show how this library doesn't work when apps call res.write / res.end.

This is a stab at fixing this, and generalizing the way in which we intercept data passed to the response object. Before, we were just monkey patching res.json, and it's tempting to just extend this behaviour to res.write. However, the problems here are:

  • Having to deal with multiple ways to construct the responseBody (do we need to concat strings? buffers? just forward whatever was passed to res.json?)
  • By extending our current logic and similarly monkey patching res.write, we have to remember what was passed and combine the chunks into an array - at this point, we're approaching reimplementing pipes, badly. Also potentially leaky :(

So I thought: since res (http.ServerResponse) is a Writable stream, let's treat it as such and try to tap into the stream to get the response body.
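The general shape of that approach looks roughly like this (a reconstructed sketch for illustration, not the literal diff in this PR - note the comment on the final chunk, which turns out to be the bug debugged below):

const { PassThrough } = require("stream");
const getStream = require("get-stream");

app.use((req, res, next) => {
  const passThru = new PassThrough();
  const responseBody = getStream(passThru); // resolves once passThru ends; stashed for the middleware to await later

  const origWrite = res.write.bind(res);
  const origEnd = res.end.bind(res);

  // Mirror everything the app writes to res into the passthrough:
  res.write = (chunk, ...rest) => {
    passThru.write(chunk);
    return origWrite(chunk, ...rest);
  };
  res.end = (chunk, ...rest) => {
    const result = origEnd(chunk, ...rest); // internally re-enters the patched write()...
    passThru.end(chunk); // ...so a final chunk gets mirrored twice - see below
    return result;
  };

  next();
});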

This kind of works! Sort of!

So the failing test now works... but now I broke all the others :P

When debugging, test output now looks like this:

  console.log __tests__/index.test.js:19
    responseBody is  {"foo":"bar"}{"foo":"bar"}

  console.log __tests__/index.test.js:19
    responseBody is  foofoo

  console.log __tests__/index.test.js:19
    responseBody is  {"foo":"bar"}{"foo":"bar"}

What's now happening is that res.json() apparently does call res.write() twice.

I'll look for a way around this tomorrow but just wanted to give an early preview of what I'm up to.

Other thoughts:

@magicmark
Copy link
Contributor Author

magicmark commented Dec 24, 2018

Wow, I've learned a lot whilst debugging this!

So the root cause can be boiled down to this minimal repro:

const { Writable } = require("stream");

const writeableStream = new Writable({
  write: (chunk, encoding, next) => {
    console.log("[write stream]: ", chunk.toString());
    next();
  }
});

const oldWrite = writeableStream.write.bind(writeableStream);
const oldEnd = writeableStream.end.bind(writeableStream);

writeableStream.write = function PatchedWrite(...args) {
  console.log("writeableStream write called with: ", args);
  return oldWrite(...args);
};

writeableStream.end = function PatchedEnd(...args) {
  console.log("writeableStream end called with: ", args);
  return oldEnd(...args);
};

// Write some stuff:
writeableStream.write("Hello ");
writeableStream.end("World!");

Which produces the following output:

writeableStream write called with:  [ 'Hello ' ]
[write stream]:  Hello
writeableStream end called with:  [ 'World!' ]
writeableStream write called with:  [ 'World!', undefined ]
[write stream]:  World!

We can see that even though our application code only calls write() once, internally write() gets called again from end() (as seen here).

This leads to an edge case: when application code passes a final chunk to end(), that chunk gets reported twice.
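One hypothetical way around the re-entrancy (a sketch, not what this branch does): set a flag around the original end() call so the patched write() knows to skip the internally re-entered chunk:

const origWrite = res.write.bind(res);
const origEnd = res.end.bind(res);
const chunks = [];
let insideEnd = false;

res.write = (chunk, ...rest) => {
  // Skip recording when this call is end() re-entering write() internally:
  if (!insideEnd && chunk) chunks.push(chunk);
  return origWrite(chunk, ...rest);
};

res.end = (chunk, ...rest) => {
  // end(callback) is a legal signature, so only record real chunks:
  if (chunk && typeof chunk !== "function") chunks.push(chunk);
  insideEnd = true;
  const result = origEnd(chunk, ...rest);
  insideEnd = false;
  return result;
};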

This is trickier than I thought, will have to do some more thinking :/

cc @briancavalier

@magicmark force-pushed the fix_res_write branch 2 times, most recently from 70b72af to f12ba95 on December 24, 2018 08:15
@magicmark
Contributor Author

magicmark commented Dec 24, 2018

Got a working branch that passes all the tests 🎉

Some thoughts/realisations after having gone through this exercise:

Non-blocking thoughts:

  1. monkey patching is really really gross and this logic could be hard to unwind
  2. I guess we're subverting the whole point of using pipes by storing the whole response in memory - probably fine for most applications, but worth highlighting in big bold letters at the top of the readme somewhere.
  3. edge case error handling is a bit of a question mark - if the promise from get-stream rejects, the error currently won't get propagated; the middleware will just return an empty response body. If this happens, there's probably a bigger error that has blown up somewhere else, but I'd like to get coverage on this regardless and handle it properly (see the sketch after this list)
  4. currently we're relying on the magic string/buffer type detection of the get-stream library. I'm not sure how to correctly type the RequestDetails object. Left as any for now.
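For (3), a hypothetical sketch of what propagating the failure could look like (collectResponseBody and passThru are illustrative names, not this library's actual internals):

const getStream = require("get-stream");

async function collectResponseBody(passThru) {
  try {
    return await getStream(passThru);
  } catch (err) {
    // Instead of silently falling back to an empty body, surface the
    // failure so callers/tests can observe it:
    throw new Error(`failed to capture response body: ${err.message}`);
  }
}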

Blocking thoughts:

  5. I'd like to find a better way of "remembering" whether we should call the passThru methods or the real methods - the current logic feels a bit hacky
  6. In fact, there's an edge case I noticed I introduced - if you schedule a deferred write() before calling end(), we now swallow the "write after end" error and silently allow it.

e.g. this should be a "write after end" error, but we now happily accept it:

app.get("/test", (req, res) => {
  res.write("Hello ");
  res.write("World ");
  process.nextTick(() => {
    res.write("sneaking in");
  });
  res.end("!");
});

// Outputs "Hello World !sneaking in" to the browser
// Middleware just returns "Hello World !"
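For reference, this is the error being swallowed - on an unpatched Writable, a write after end() surfaces as a stream error (a standalone demo in the style of the repro above):

const { Writable } = require("stream");

const ws = new Writable({
  write: (chunk, encoding, next) => next()
});

ws.on("error", err => {
  console.error(err.message); // "write after end"
});

ws.end("!");
ws.write("sneaking in"); // too late - triggers the error above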

Fixing (5) will fix (6). It's unclear how much of a problem IRL this would be, but I'm still not hugely comfortable releasing this branch in its current state. Maybe it's fine, and we can ticket these things up and fix later. (Leaning towards shipping as an alpha - it's still an improvement over the current broken version :P)

Would love any extra thoughts on this!

@briancavalier
Collaborator

Thanks for the invite, @magicmark :)

I had a couple quick thoughts. The first may be more of a breaking change, and I'm not sure if it fits with the goals. The second may be closer to this PR.

1 - Use stream instead of promise

This idea breaks the tween() API, but I think it could be interesting as it fits with a streaming model and may help avoid having to materialize the entire response body in memory: change the tween() API to accept a Readable instead of a Promise. Then, use a technique like this to pass a "copy" of the response stream to each tween. Caveat: I don't know the performance characteristics of this kind of stream fan-out.
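A rough illustration of the fan-out part (hypothetical names; it assumes the middleware already has a Readable carrying the response bytes, which is the hard part discussed below):

const { PassThrough } = require("stream");

// Give each tween its own readable "copy" of the response stream:
function fanOut(source, tweens) {
  for (const tween of tweens) {
    const copy = new PassThrough();
    source.pipe(copy); // a Readable can be piped to many destinations
    tween(copy); // each tween receives a Readable instead of a Promise
  }
}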

Overall, this feels much simpler, but again, it may not meet the goals/constraints (esp since I'm a total noob here!).

2 - Simplify stream accumulation

It might be simpler to use something like bl or concat-stream to accumulate chunks directly rather than a tricky circular pipe. Here's a hastily thrown together sketch:

import BufferList from 'bl' // or concat-stream

//...
  let resolver
  const responseDetails = new Promise((resolve, reject) => {
    resolver = { resolve, reject }
  })

  const buffer = new BufferList()
  const origWrite = res.write.bind(res)
  const origEnd = res.end.bind(res)

  res.write = function(chunk, ...rest) {
    buffer.append(chunk)
    return origWrite(chunk, ...rest)
  }

  res.end = function(...args) {
    const result = origEnd(...args)
    resolver.resolve(buffer)
    return result
  }

//...
}
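Worth noting: because origEnd internally re-enters the patched write() with any final chunk (per the repro above), that chunk lands in the buffer exactly once here. Consuming the result might then look like this (hypothetical usage; toString() is bl's documented API):

responseDetails.then(buffer => {
  console.log(buffer.toString()) // the whole accumulated response body
})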

Any thoughts on those ideas? Do they spark any new ones?

@magicmark
Contributor Author

magicmark commented Jan 9, 2019

@briancavalier thanks so much for your thoughts!

1

I originally tried a version of this, but I'm not sure how to make it work without monkey patching over res :/

To see why, a naive version looks like this:

const { PassThrough } = require("stream");

// in some middleware, set up the pass through pipe to be read from later...
app.use((req, res, next) => {
    const passThru = new PassThrough();
    res.pipe(passThru);
    next();
});

This won't work though! That's because res is a Writable stream - a thing to be written to, not read from. I.e. we can't actually call .pipe() on it (.pipe is not an available method on Writable streams):

https://github.com/nodejs/node/blob/d4c91f28148af8a6c1a95392e5c88cb93d4b61c6/lib/_http_outgoing.js#L833

I found this a really helpful way to think about it: https://github.com/substack/stream-handbook#pipe

a.pipe(b);
b.pipe(c);
c.pipe(d);
This is very much like what you might do on the command-line to pipe programs together:
a | b | c | d

Ok, so let's try this instead:

// in some middleware, set up the pass through pipe to be read from later...
app.use((req, res, next) => {
    const passThru = new PassThrough();
    passThru.pipe(res);
    next();
});

This allows us to tap into .on('data'), but for data to flow through the passthrough pipe, we'd need to somehow make application code call passThru.end instead of res.end etc. Hence the monkeypatching I was talking about earlier :(
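Concretely, the redirection would look something like this (a sketch - basically the "tricky circular pipe" you mentioned):

const { PassThrough } = require("stream");

app.use((req, res, next) => {
    const passThru = new PassThrough();
    const origWrite = res.write.bind(res);
    const origEnd = res.end.bind(res);

    // Forward data to the real response via the saved originals, so the
    // patched methods below don't end up calling themselves:
    passThru.on("data", chunk => origWrite(chunk));
    passThru.on("end", () => origEnd());

    // Redirect application writes into the passthrough:
    res.write = (chunk, ...rest) => passThru.write(chunk, ...rest);
    res.end = (chunk, ...rest) => passThru.end(chunk, ...rest);

    next();
});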

It's possible I'm missing something, but I'm not sure how to implement (1) in a nice way :/

2

I think this is probably a better approach than my draft. I had considered directly using concat-stream, but got scared by this comment about it in the README of get-stream: https://github.com/sindresorhus/get-stream/blob/master/readme.md#how-is-this-different-from-concat-stream

I was also keen to avoid directly managing the response data myself, for fear of messing up types, and instead to rely on the get-stream library to do that. But it seems like maybe we can use BufferList as per your implementation and let the application code worry about that, so we avoid the issue that way?

I'll work on another draft using this approach, as you suggested.

Thanks so much again for taking the time to look into this!

@magicmark
Contributor Author

@briancavalier thanks a bunch again for the guidance. Updated the CR to use the second approach. I think this feels a lot nicer, and maintains the current API.

@briancavalier
Collaborator

The latest looks really simple and clean, @magicmark. Sorry for the readable vs writable confusion above in option 1. I wanted to write down what we talked about afterward, though, so it doesn't get lost. It could be an interesting future direction.

According to bl's docs, it's a duplex stream. It can be used to transform a writable into a readable (via buffering). That could be used to implement a solution like option 1. In terms of efficiency, depending on bl's implementation, the worst case should be no worse than accumulating all the output in memory, and the best possible case may approach streaming. Reality is probably somewhere in between. Anyway, maybe worth exploring.
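Sketching it very loosely (hypothetical, and hinging on bl's exact duplex behaviour; the patching mirrors the earlier sketch):

import BufferList from 'bl'

// The patched methods write into the BufferList; because bl is also a
// Readable, each tween could consume the response as a stream:
const buffer = new BufferList()
const origWrite = res.write.bind(res)
const origEnd = res.end.bind(res)

res.write = (chunk, ...rest) => {
  buffer.write(chunk)
  return origWrite(chunk, ...rest)
}

res.end = (...args) => {
  const result = origEnd(...args) // re-enters write() with any final chunk
  buffer.end() // readable side now sees EOF
  return result
}

tween(buffer) // tween receives a Readable view of the response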

Cheers!

@magicmark
Contributor Author

@briancavalier \o/ awesome thanks again so much for taking the time on this!

I really like the idea of exposing the response as a readable stream, and keeping it "nodey".

In the interest of expediency, I'm going to merge this as-is, and as a follow-up, allow an option to change the API to return a stream - maybe this is done in two parts or something.

LMK if you have any concerns with this - thanks again!

@magicmark merged commit 045d010 into master on Jan 16, 2019
@magicmark deleted the fix_res_write branch on January 16, 2019 18:21
@briancavalier
Collaborator

Shipping the current approach as the short term sounds great 👍
