Define when we consider a data is consumed by the reader #76

Closed
tyoshino opened this issue Feb 10, 2014 · 5 comments

Comments

@tyoshino
Member

Consumer code using the ReadableStream API cannot obtain fine-grained control over how much data to pull; that is hidden. So if the data source uses big chunks, the consumer code may get an unexpectedly big chunk from a read() call. Getting a big chunk is OK, but I don't want the ReadableStream to start pulling new data on that read() call: the consumer code is still busy with that big chunk, so the chunk shouldn't immediately be considered consumed.

In the W3C spec, I tried to address this with amountBeingReturned. Could you please investigate this issue?

@domenic
Member

domenic commented Feb 10, 2014

Hmm. At first I was really concerned after thinking about this. But after thinking longer, I am not so sure it is a problem. Let me know what you think of this reasoning.

A readable stream is not concerned with the consumer. It maintains an internal buffer, with a high-water mark, by itself. It uses this internal buffer and high-water mark to determine when to apply backpressure to the underlying source.

So, it doesn't matter what the consumer code thinks of how big the chunks are. The only thing that matters is whether the readable stream's buffer is too full or not. If the consumer code gets a big data chunk, it can choose not to call read() again for a while. The readable stream then decides what implications this has for applying backpressure.

More concretely, consider a fast readable stream with large chunks being consumed slowly. Let's say that the readable stream's buffer is already full, so backpressure is being applied to the underlying source. The consumer calls read(). Then:

  1. If the consumption of the chunk drained the entire buffer, then it is the readable stream's job to anticipate future requests, and turn off backpressure in order to gather data up to the high-water mark in preparation for future reads, whenever the consumer is ready.
  2. If the buffer is still draining, it will continue to apply backpressure, letting the buffer drain before turning the backpressure off.
  3. If the buffer is still at or above the high-water mark, it will of course continue to apply backpressure.

You seem to be worried about case 1. But I argue that it is not a problem; it is still good to anticipate future data requests. If anything, it is simply a misconfigured readable stream: it has a high-water mark that is too low for the possible data chunk sizes coming through. You should never really be able to drain the entire buffer, from above the high-water mark to empty, with one read().
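
The three cases above can be sketched as a small model. This is only an illustration of the reasoning in this comment, not the spec's algorithm: the class name `BufferModel`, the byte-based size accounting, and the rule that backpressure is released only once the buffer is empty are all assumptions made for the example.

```typescript
// Hypothetical model of a readable stream's internal buffer.
// Names and rules are illustrative assumptions, not taken from the spec.
type ReadOutcome = "drained" | "draining" | "still-full";

interface ReadResult {
  chunk: Uint8Array;
  outcome: ReadOutcome;
  backpressure: boolean;
}

class BufferModel {
  private readonly highWaterMark: number;
  private chunks: Uint8Array[] = [];
  private queuedBytes = 0;
  private backpressure = false;

  constructor(highWaterMark: number) {
    this.highWaterMark = highWaterMark;
  }

  // Called by the underlying source. Returns true while more data is wanted.
  push(chunk: Uint8Array): boolean {
    this.chunks.push(chunk);
    this.queuedBytes += chunk.byteLength;
    if (this.queuedBytes >= this.highWaterMark) {
      this.backpressure = true;
    }
    return !this.backpressure;
  }

  // Called by the consumer. Removes one chunk and reports which case applies.
  read(): ReadResult {
    const chunk = this.chunks.shift();
    if (chunk === undefined) {
      throw new Error("buffer is empty");
    }
    this.queuedBytes -= chunk.byteLength;

    let outcome: ReadOutcome;
    if (this.queuedBytes === 0) {
      outcome = "drained"; // case 1: turn backpressure off and refill
      this.backpressure = false;
    } else if (this.queuedBytes < this.highWaterMark) {
      outcome = "draining"; // case 2: keep backpressure on while draining
    } else {
      outcome = "still-full"; // case 3: at or above the high-water mark
    }
    return { chunk, outcome, backpressure: this.backpressure };
  }
}

// Usage: a 4-byte high-water mark and one 8-byte chunk, i.e. the
// "misconfigured" situation where a single read() empties the buffer.
const model = new BufferModel(4);
model.push(new Uint8Array(8)); // crosses the high-water mark
const result = model.read();
console.log(result.outcome, result.backpressure); // prints "drained false"
```

With a high-water mark that is several chunks deep, the same model passes through "still-full" and "draining" before it reaches "drained", so the source is not asked for more data on the first read().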

What do you think?

@tyoshino
Member Author

> A readable stream is not concerned with the consumer.

Yeah. The question is which roles should be handled by the stream alone and which need involvement from the consumer (and information only the consumer has).

> You seem to be worried about case 1

Yes, I'm concerned with cases like 1. But there was a misunderstanding on my side (#75). This won't be an issue if there's no low-water mark.

So, is the strategy intended to be configured by the consumer? I'd like to start thinking about a concrete usage. Suppose that we're redesigning XHR to return a ReadableStream. How do we pass the initial value of the high-water mark, and what do we do when the consumption speed changes? Maybe we wrap the strategy behind some subclass and expose that to the consumer code.

@domenic
Member

domenic commented Mar 10, 2014

Sorry for the delay.

> So, strategy is intended to be configured by the consumer?

Not quite; the strategy is intended to be configured by the stream creator. In the case of XHR, that would be the browser, which would probably set either an appropriate HWM for network operations generally, or perhaps one that depends on current network conditions. (But, see #13 (comment) for how nobody ever changes Node's defaults.)

@tyoshino
Member Author

To get the best TCP performance over a high-latency network, we need to allocate a buffer large enough for the receive window to equal the bandwidth-delay product (BDP). A similar situation may arise for other sorts of producers.
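
For a sense of scale, the bandwidth-delay product is the link bandwidth multiplied by the round-trip time. The sketch below is a worked example only; the function name and the figures (100 Mbit/s, 100 ms) are assumptions chosen for illustration and do not come from this thread.

```typescript
// Bandwidth-delay product: the number of bytes that must be in flight
// to keep a link busy. Function name and inputs are illustrative.
function bandwidthDelayProductBytes(
  bandwidthBitsPerSecond: number,
  roundTripMilliseconds: number,
): number {
  // bits/s * ms = millibits; divide by 8 for bytes and by 1000 for seconds.
  return (bandwidthBitsPerSecond * roundTripMilliseconds) / 8 / 1000;
}

// Assumed figures: a 100 Mbit/s path with a 100 ms round-trip time.
const bdpBytes = bandwidthDelayProductBytes(100_000_000, 100);
console.log(bdpBytes); // prints 1250000
```

Under these assumed figures, a receive window (and therefore a buffer) of about 1.25 MB would be needed to keep the link full.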

Suppose we wrap a TCP socket with a ReadableStream. Since pull doesn't pass along how much data the RS can buffer (the HWM), the socket cannot adjust the amount to pull from the server (the receive window). (A BSD socket itself can't do this anyway, but a better API in the future might allow us to adjust the receive window dynamically.) needsMore only signals backpressure via the return value of push. So all we can do is set the HWM to the BDP. But for a slow consumer, such a big window is undesirable: it just leads to unexpected inflation of the buffer. For a fast consumer, on the other hand, a small HWM is just frustrating, since the BDP won't be filled.

So, I think

  • we should pass along some kind of information on pull, say the space available in the RS (HWM - filled amount) or its speed. It doesn't need to limit push; working as a hint is OK.
  • there should also be some interface for a consumer to tell the RS (or the strategy object?) its consuming speed, in addition to the signal given by how often read() is called.
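
The first bullet could look roughly like the sketch below. Everything in it is hypothetical: `PullHint`, `spaceAvailable`, `HintedSource`, `HintingStream`, and `ToySocket` are names invented for the example, and none of them exist in the spec draft. It only shows a pull call carrying "HWM - filled amount" as a hint that a socket-like source could use to size its receive window.

```typescript
// Hypothetical shapes for a pull() that carries a hint.
// None of these names come from the spec draft.
interface PullHint {
  spaceAvailable: number; // HWM minus bytes currently buffered, never negative
}

interface HintedSource {
  pull(hint: PullHint): Uint8Array | null; // null means no data right now
}

class HintingStream {
  private readonly source: HintedSource;
  private readonly highWaterMark: number;
  private chunks: Uint8Array[] = [];
  private queuedBytes = 0;

  constructor(source: HintedSource, highWaterMark: number) {
    this.source = source;
    this.highWaterMark = highWaterMark;
  }

  // Pulls once if there is room. Returns the number of bytes buffered.
  fill(): number {
    const spaceAvailable = Math.max(0, this.highWaterMark - this.queuedBytes);
    if (spaceAvailable === 0) {
      return 0; // buffer is full, so do not pull
    }
    const chunk = this.source.pull({ spaceAvailable });
    if (chunk === null) {
      return 0;
    }
    this.chunks.push(chunk);
    this.queuedBytes += chunk.byteLength;
    return chunk.byteLength;
  }

  read(): Uint8Array | undefined {
    const chunk = this.chunks.shift();
    if (chunk !== undefined) {
      this.queuedBytes -= chunk.byteLength;
    }
    return chunk;
  }
}

// Toy source that treats the hint as its receive window.
class ToySocket implements HintedSource {
  receiveWindow = 0;
  private remaining: number;

  constructor(totalBytes: number) {
    this.remaining = totalBytes;
  }

  pull(hint: PullHint): Uint8Array | null {
    this.receiveWindow = hint.spaceAvailable; // adjust the window to the hint
    if (this.remaining === 0) {
      return null;
    }
    const size = Math.min(this.remaining, hint.spaceAvailable);
    this.remaining -= size;
    return new Uint8Array(size);
  }
}

// Usage: a 10-byte transfer through a stream with a 4-byte high-water mark.
const socket = new ToySocket(10);
const stream = new HintingStream(socket, 4);
stream.fill();
console.log(socket.receiveWindow); // prints 4
```

In this sketch the hint is advisory, as the bullet suggests: the stream accepts whatever chunk comes back and does not reject a chunk larger than the hint. The second bullet (a consumer reporting its consumption speed) is not modelled here.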

Sorry if this has already been well discussed.

@domenic
Member

domenic commented Jun 17, 2014

Consolidated into #119.

@domenic closed this as completed Jun 17, 2014