Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a flag to determine wether a ReadableStream is disturbed or cancelled #1025

Closed
rumkin opened this issue Dec 15, 2019 · 18 comments
Closed

Comments

@rumkin
Copy link

rumkin commented Dec 15, 2019

Proposal

Implement flags to determine in a synchronous way wether the stream is disturbed and cancelled as boolean properties ReadableStream#disturbed and ReadableStream#closed.

Rationale

ReadableStream should have a flag which describes if the stream is disturbed or not. Without this flag it's just impossible to understand if the first chunk of the stream would be the first chunk of the data. That's why behaviors to check is stream has been read already is implemented in all major browsers (Chromium, Safari, Firefox) and is using by FetchAPI (Response and Request constructors).

I know there is an opinion against it. But, as I've already showed, without such flags it's impossible to implement reliable solution. Current design creates asynchronous undefined state and doesn't allow to understand wether the stream is correct constructor argument or not and lead to unpredictable delayed race conditions. As I understand such a solution made due to assumption that stream can be non-exclusively owned and as a result cancelled in any time, what seems incorrect.

Example

I'm implementing a server which works with Web StreamAPI. It requires an ability to check, if the stream contains acceptable body or not, to prevent throwing parsing errors on valid request and to detect race condition occurred in the process of request handling. Currently I solve this in a hacky way using lowlevel API from the Response object implementation, without this I will emit error each time someone else read the stream without being sure where in the code it has happened.

Solution with Response constructor:

function isStreamDisturbed(stream: ReadableStream): Boolean {
  try {   
      new Response(stream)
    
      return false
    }
    catch {
      return true
    }
}

const stream = new ReadableStream()
stream.cancel()

isStreamDisturbed(stream) // -> true

Of cause this solution is a hack and couldn't be decided as reliable.

@ricea
Copy link
Collaborator

ricea commented Dec 16, 2019

I am leaning towards adding isDisturbed (maybe just as disturbed?) but not isCanceled. We don't even track internally whether the stream was canceled. From the point of view of the consumer of the stream, it doesn't make a difference whether the stream was canceled or simply read until close.

@rumkin
Copy link
Author

rumkin commented Dec 16, 2019

@ricea Replaced with disturbed and closed.

@yutakahirano
Copy link
Member

disturbed is designed for Response.bodyUsed and I'm not sure if we should advertise its use widely. Personally I'm not a fan of the concept...

@rumkin
Copy link
Author

rumkin commented Dec 20, 2019

@yutakahirano It's not only used for Response.bodyUsed it's mainly used to understand if response body is valid at all. This is why Response's constructor fails and not just set bodyUsed in true when the stream has been already disturbed. Without this restriction such things like ServiceWorker's cache wouldn't work correctly.

@yutakahirano
Copy link
Member

disturbed represents whether the stream has ever been read. Validity is (or, at least, can be) a different concept I think.

@rumkin
Copy link
Author

rumkin commented Jan 7, 2020

But in the concept of Request/Response, if the stream is disturbed, then it's not a valid body. So here "disturbed" and "valid" are synonyms.

@jasnell
Copy link

jasnell commented Jun 29, 2021

After having just gone through the implementation for Node.js core, I'd definitely be +1 on a disturbed property on ReadableStream.

@benjamingr
Copy link
Member

@ricea @annevk would a spec PR adding a .disturbed property on ReadableStream be welcome :)?

@annevk
Copy link
Member

annevk commented Aug 2, 2021

What would the use case be? As discussed in #1105 it's rather specific to the Fetch APIs and unclear if we would do that again if starting fresh.

@benjamingr
Copy link
Member

@annevk honestly Node only needs it for the fetch API as well so far nodejs/node#39627

@ronag is there any other place we need it?

@ronag
Copy link

ronag commented Aug 2, 2021

@annevk honestly Node only needs it for the fetch API as well so far nodejs/node#39627

@ronag is there any other place we need it?

So far just for fetch.

@domenic
Copy link
Member

domenic commented Aug 2, 2021

Still -1 on this; probably it's best if we should close. Fetch is meant to be able to muck with the internals of ReadableStream anyway; keeping them independent is not a design goal.

@benjamingr
Copy link
Member

Yeah if this is just fetch and nothing else node can just “cheat”

@rumkin
Copy link
Author

rumkin commented Oct 18, 2021

It is cursed practice to produce cheating solutions. And TBH it's just strange to see proposals like this in a standard development body. WhatWG shouldn't lower the bar for itself. This behavior is decreasing reliability of Web Platform by several points and WhatWG IMO should earn points not to spend them. Transparent and sustainable standards are WhatWG's KPI and quality metrics.

Fetch and Stream APIs have been developed by WhatWG, but somehow become inconsistent and now require to be fixed. They throw errors in browsers and these errors aren't fully catchable due to web platform error messages fragmentation. Also there is some kind of undefined behavior in the current Stream design. Now it about to bring fragmentation into Node.js and embeddable solutions. And it will be hard to get rid of it.

There are several solutions which are properly developed, have better maintenance and could fix the issue:

  1. Add a flag to Stream instance.
  2. Add fetch.canConsumeStream() method.
  3. Create DisturbedStream error.
  4. Change Fetch API.

1. Add a flag to Stream instance

It's one of core feature for consequential data streams, to provide information about own state. And it should work like this:

  1. Developer checks stream's state with a flag !stream.disturbed && !stream.locked.
  2. Claim ownership with stream.getReader() or pass a stream into synchronous function which claims ownership.
  3. Start consuming data from a stream.

Such solution is universal and solve all possible errors and brings freedom to developers to implement any kind of behavior. Without such flag there is undefined state presented.

Pros

  1. Solves the issue with O(1) complexity as: !stream.isDisturbed.
  2. Solve undefined state issues.

Cons

  1. Requires standard modification.

2. Add fetch.canConsumeStream() method.

This os ok. In the case if the Fetch API won't change its behavior and will be the only API which requires disturbed flag. But in the case there will be more APIs which will use disturbed flag, there will be more such methods.

It's required for many APIs: web request and response handling, file reading and writing, video and audio processing.
So it's crucial to implement the first option.

Pros

  1. Solves the issue with O(1) complexity as: fetch.canConsume(stream).

Cons

  1. May require duplicate methods in other APIs in the future.

3. Create DisturbedStream error.

It solves potential errors in the future and lets platforms to decide how to implement Fetch API and wether to rely on disturbed flag or not. And let developers to handle the error with simplicity by its name.

Pros

  1. Solves the issue with O(1) complexity with:
        try {
           const req = new Request(url, {body: stream})
           // ...
        } catch (err) {
          if (err.name === 'DisturbedStream') {
            // ...
          }
       }

Cons

  1. Wordy and more resource-consuming.

4. Change Fetch API.

It won't be a breaking change, if Fetch API would start to accept disturbed streams. So in the case if such behavior is admissible, then this is the better option and the most simple solution for now.

Pros

  1. Easy to implement.
  2. Reduces ambiguity and difference in behavior.

Cons

  1. Doesn't seems to be proper in case of consequential streams.

@annevk
Copy link
Member

annevk commented Oct 18, 2021

Cheat was probably the wrong word as it's perfectly normal for trusted libraries to hook into internals. Having read your post you haven't really written down any use cases other than asserting there might be scenarios similar to fetch in the future. I think that means we're still stuck on #1025 (comment).

@domenic
Copy link
Member

domenic commented Oct 18, 2021

Yeah, good reminder to close this issue.

@domenic domenic closed this as completed Oct 18, 2021
@rumkin
Copy link
Author

rumkin commented Oct 18, 2021

It's normal for trusted libraries to hook into internals, it's not ok to make developer rely on some heuristics to determine what happened with the code.

Consistency and reduced complexity of web platform is itself the best use-case. But moreover current API has inconsistency in design and the first consumer of it – the Fetch API – is an indicator of the problem.

So developers couldn't check validity of streams in constructors, whaat will make API design unnecessary complex: developer should create a class instance, then run some async method to check if everything is fine and constructor arguments are valid.

And the main problem with this solution is redundant delays and memory usage. Here it is the solution by @domenic in node's repo:

async function isReadable(reader) {
  let readable = true;
  reader.closed.then(() => { readable = false; }, () => { readable = false; });
  await Promise.resolve();

  // If the stream was closed or errored then the closed promise would have settled by now and updated readable.
  // If not then it's readable.
  return readable;
}

It creates 3 promises and 3 jumps over event loop stack just to determine stream state. Moreover function scope with its boolean, function and one of this promises will live as long as stream is not closed and thus would hold memory. In the case of high loads, this will be memory pressure out of nowhere. This check should be made in a single synchronous call.

As a conclusion, current API design:
a. overcomplicated,
b. brings redundant load,
c. distribute complexity over multiple environments, which are Webkit, Firefox, Chromium, Node, Deno, embeddable JS and other JS environments.

So the question here is what the use case for making Stream status check asynchronous?

@domenic Please, reopen this issue. Also could you answer on the question above or provide a link where I could find such answer, because it seems like you are the one who's responsible for the API and all the decisions. I don't want to spend anyone time, but the issue isn't solved.

@domenic
Copy link
Member

domenic commented Oct 18, 2021

Nobody has provided use cases, beyond emulating a fetch API which we regret. We won't be reopening the issue until such use cases are provided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

8 participants