Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The error handler isn't removed on successful stream async iteration #32995

Closed
szmarczak opened this issue Apr 22, 2020 · 44 comments
Closed

The error handler isn't removed on successful stream async iteration #32995

szmarczak opened this issue Apr 22, 2020 · 44 comments
Labels
stream Issues and PRs related to the stream subsystem.

Comments

@szmarczak
Copy link
Member

  • Version: v13.13.0
  • Platform: Linux solus 5.5.11-151.current deps: update openssl to 1.0.1j #1 SMP PREEMPT Tue Mar 24 18:06:46 UTC 2020 x86_64 GNU/Linux
  • Subsystem: stream

What steps will reproduce the bug?

const {Readable} = require('stream');

async function getBuffer(readable) {
	const chunks = [];

	for await (const chunk of readable) {
		chunks.push(chunk);
	}

	return Buffer.concat(chunks);
}

const stream = new Readable({
    read() {
        this.push('chunk');
        this.push(null);
    }
});

const buffer = await getBuffer(stream);

console.log(stream.listenerCount('error'));

How often does it reproduce? Is there a required condition?

Always.

What is the expected behavior?

0

What do you see instead?

1

Additional information

The error handler seems to be from https://github.com/nodejs/node/blob/master/lib/internal/streams/end-of-stream.js

@himself65 himself65 added the stream Issues and PRs related to the stream subsystem. label Apr 22, 2020
@himself65
Copy link
Member

himself65 commented Apr 22, 2020

I realize that function eof callback didn't call up

diff --git a/lib/internal/streams/end-of-stream.js b/lib/internal/streams/end-of-stream.js
index 4742391fd7..f6cac90dc0 100644
--- a/lib/internal/streams/end-of-stream.js
+++ b/lib/internal/streams/end-of-stream.js
@@ -8,6 +8,7 @@ const {
   ERR_STREAM_PREMATURE_CLOSE
 } = require('internal/errors').codes;
 const { once } = require('internal/util');
+const debug = require('internal/util/debuglog').debuglog('stream');

 function isRequest(stream) {
   return stream.setHeader && typeof stream.abort === 'function';
@@ -60,6 +61,7 @@ function eos(stream, opts, callback) {
     (opts.readable !== false && isReadable(stream));
   const writable = opts.writable ||
     (opts.writable !== false && isWritable(stream));
+  debug('eos', readable, writable);

   const wState = stream._writableState;
   const rState = stream._readableState;
@@ -152,6 +154,7 @@ function eos(stream, opts, callback) {
   }

   return function() {
+    debug('eos cleanup');
     callback = nop;
     stream.removeListener('aborted', onclose);
     stream.removeListener('complete', onfinish);
\node\Release\node.exe C:\Users\Himself65\Desktop\github\test\1.js
STREAM 21764: eos true false
STREAM 21764: on readable 0 false
STREAM 21764: read undefined
STREAM 21764: need readable true
STREAM 21764: length less than watermark true
STREAM 21764: do read
STREAM 21764: readableAddChunk chunk
STREAM 21764: emitReadable true false
STREAM 21764: emitReadable false
STREAM 21764: readableAddChunk null
STREAM 21764: onEofChunk
STREAM 21764: emitReadable false true
STREAM 21764: endReadable false
STREAM 21764: readable nexttick read 0
STREAM 21764: read 0
STREAM 21764: endReadable false
STREAM 21764: emitReadable_ false 0 true
STREAM 21764: flow false
STREAM 21764: endReadableNT false 0
STREAM 21764: endReadableNT true 0
1

Process finished with exit code 0

I didn't work on this part before so I don't know if it's a bug or a feature

and this appears on the versions which more than v10

cc @nodejs/streams

@ronag
Copy link
Member

ronag commented Apr 22, 2020

I think this is by design. Anything that uses finished or pipeline will have a dangling handlers, e.g. error. This is stated in the documentation. Though maybe not for async iteration.

@himself65
Copy link
Member

Yes, it's by design. Closing

Refs: https://nodejs.org/dist/latest-v14.x/docs/api/stream.html#stream_stream_finished_stream_options_callback

@ronag
Copy link
Member

ronag commented Apr 22, 2020

@himself65 It might be worth to add a corresponding section under async iterator:

stream.finished() leaves dangling event listeners (in particular 'error', 'end', 'finish' and 'close') after callback has been invoked. The reason for this is so that unexpected 'error' events (due to incorrect stream implementations) do not cause unexpected crashes. If this is unwanted behavior then the returned cleanup function needs to be invoked in the callback:

@himself65
Copy link
Member

yes, that's better. so I reopen this until PR fixes

@himself65 himself65 reopened this Apr 22, 2020
@ronag ronag added doc Issues and PRs related to the documentations. good first issue Issues that are suitable for first-time contributors. labels Apr 22, 2020
@szmarczak
Copy link
Member Author

If this is unwanted behavior then the returned cleanup function needs to be invoked in the callback

It should be possible to do so in all cases, including async iterators.

@szmarczak
Copy link
Member Author

due to incorrect stream implementations

What incorrect stream implementations?

@szmarczak
Copy link
Member Author

Can you link to a PR that introduced this change please?

@ronag
Copy link
Member

ronag commented Apr 22, 2020

It should be possible to do so in all cases, including async iterators.

I don't see how it would be possible with async iterators.

@ronag
Copy link
Member

ronag commented Apr 22, 2020

What incorrect stream implementations?

Those that emit 'error' after completion.

@szmarczak
Copy link
Member Author

I don't see how it would be possible with async iterators.

E.g. expose it via stream.cleanup()

@szmarczak
Copy link
Member Author

E.g. expose it via stream.cleanup()

Why not?

@ronag
Copy link
Member

ronag commented Apr 22, 2020

Can you link to a PR that introduced this change please?

As far as I can tell, it's been like that from the beginning in terms of async iterators.

@szmarczak
Copy link
Member Author

szmarczak commented Apr 22, 2020

Those that emit 'error' after completion.

Then it should throw an unhandled uncaught exception like for all event emitters, right?

As far as I can tell, it's been like that from the beginning in terms of async iterators.

Ok, I'll check this.

@ronag
Copy link
Member

ronag commented Apr 22, 2020

Why not?

I don't see the value and it's logically weird.

@szmarczak
Copy link
Member Author

I don't see the value and it's logically weird.

See:

Then it should throw an unhandled uncaught exception like for all event emitters, right?

@szmarczak
Copy link
Member Author

It's just inconsistent behavior.

@ronag
Copy link
Member

ronag commented Apr 22, 2020

Then it should throw an unhandled exception like for all event emitters, right?

That is something we want to avoid. The point of pipeline and finished is to normalize implementations and avoid this due to "incorrect" implementations.

@ronag
Copy link
Member

ronag commented Apr 22, 2020

It's just inconsistent behavior.

In what way?

@ronag
Copy link
Member

ronag commented Apr 22, 2020

@mcollina Might have more information regarding the motivation for this behavior.

@szmarczak
Copy link
Member Author

In what way?

That streams inherit from event emitter.

The point of pipeline and finished is to normalize implementations and avoid this due to "incorrect" implementations.

The users should be aware of this problem and fix their implementations so they are correct. Still, I see no reason why this incorrect implementations are allowed.

@ronag
Copy link
Member

ronag commented Apr 22, 2020

That streams inherit from event emitter.

But we are talking about async iterator now?

The users should be aware of this problem and fix their implementations so they are correct. Still, I see no reason why this incorrect implementations are allowed.

Because we don't want to break the ecosystem?

@szmarczak
Copy link
Member Author

szmarczak commented Apr 22, 2020

But we are talking about async iterator now?

👍

Because we don't want to break the ecosystem?

Hmm... That's the idea of breaking changes aka major changes.

Anyway, if it needs to be the way it is, I'm fine with that. I understand your position.

@ronag
Copy link
Member

ronag commented Apr 22, 2020

@himself65 This was documented in #28997

@ronag
Copy link
Member

ronag commented Apr 22, 2020

Hmm... That's the idea of breaking changes aka major changes.

I agree with you. There is however is large range of opinions on this topic in regards to breaking vs value.

@szmarczak
Copy link
Member Author

szmarczak commented Apr 22, 2020

I agree with you. There is however is large range of opinions on this topic in regards to breaking vs value.

LTS versions mean are more valuable IMO

@sindresorhus
Copy link

I also think this should be properly fixed.

And the https://github.com/nodejs/node/pull/28997/files docs does not make it clear enough the consequences of the permanent error handler and what the user is supposed to do about it.

@ronag
Copy link
Member

ronag commented Apr 22, 2020

I don't understand why this is a problem? The stream is destroyed and completed.
I don't see how it makes a difference whether it has dangling event listeners or not?

@novemberborn
Copy link

It's surprising in that, when you use an async iterator, you never add an error listener. So you can't clean it up either. The documentation for finished() assumes you can remove the listener yourself.

@ronag
Copy link
Member

ronag commented Apr 22, 2020

It's surprising in that, when you use an async iterator, you never add an error listener. So you can't clean it up either.

I guess, basically once you have converted a stream into an async iterator it should not really be used as a stream anymore. Unfortunately javascript does not have move semantics.

It's unfortunate and I also would like to fix it but I don't see any clean way of resolving this. Also I don't really see how it is a big problem?

The documentation for finished() assumes you can remove the listener yourself.

But we are talking about async iterator now?

for await (const chunk of stream) {
}

Does not leave much of an API surface.

@szmarczak
Copy link
Member Author

The stream is destroyed and completed.
I don't see how it makes a difference whether it has dangling event listeners or not?

You can reuse that stream in terms of error handling. Got does this. In my http-timer package I listen for the error event so it gives the expected result. If it hadn't been done that way, then it would require a complicated mechanism to pass the error from one place to another. That's just one use case. But there's a workaround - some sort of an internal end event, so the algorithm knows when the response has been read and delays this.push(null) so the error event is emitted before the end one.

@novemberborn's example is perfectly valid.

@szmarczak szmarczak reopened this Apr 22, 2020
@ronag
Copy link
Member

ronag commented Apr 22, 2020

You can reuse that stream in terms of error handling

How? Once the for await loop completes or exits it should not be emitting any further errors.

Sorry, I do not follow the rest of your example.

@novemberborn's example is perfectly valid.

Sorry, I do not follow.

@ronag ronag removed good first issue Issues that are suitable for first-time contributors. doc Issues and PRs related to the documentations. labels Apr 22, 2020
@ronag
Copy link
Member

ronag commented Apr 22, 2020

E.g. expose it via stream.cleanup()

How would this be different from:

stream.removeAllListeners('error')

@szmarczak
Copy link
Member Author

szmarczak commented Apr 22, 2020

But we are talking about async iterator now?
Sorry, I do not follow.

But async iterator utilizes streams, that's the thing. They mean almost 99% the same. You understand this already.

I guess, basically once you have converted a stream into an async iterator it should not really be used as a stream anymore.

So there shouldn't be a hanging error listener.

How? Once the for await loop completes or exits it should not be emitting any further errors.

Let me explain so you understand this correctly: the async iterator leaves a hanging error listener. Once the response has been read via the async iterator, Got cannot reuse the error event anymore to throw a HTTPError when it receives 404 for example. This breaks my http-timer package, which silently listens for the error event. It would be required to expose another function to pass the error to the http-timer instance. It can be avoided using this:

some sort of an internal end event, so the algorithm knows when the response has been read and delays this.push(null) so the error event is emitted before the end one.

How would this be different from:

You cannot guarantee that someone didn't attach their own listener.

@ronag
Copy link
Member

ronag commented Apr 22, 2020

Got cannot reuse the error event anymore to throw a HTTPError when it receives 404 for example.

Re-use the error event? Are you doing stream.emit('error', err)? That's not really supported by streams.

Once the response has been read via the async iterator

Once the read of the async iterator is completed the stream is destroy():d and awaited and should not be usable for anything else.

It sounds to me like you are trying to use streams in a way it was not really intended.

You cannot guarantee that someone didn't attach their own listener.

Yes, so how would stream.cleanup() be implemented?

@ronag
Copy link
Member

ronag commented Apr 22, 2020

In my http-timer package I

You are overriding emit here. That's really a bit outside of supported usage. You are a bit on your own risk here.

@ronag
Copy link
Member

ronag commented Apr 22, 2020

I'm not really following along here. If you prefer maybe a google hangout call or something would be more constructive? Would help if I understand your use case better.

@szmarczak
Copy link
Member Author

Are you doing stream.emit('error', err)?

Exactly.

It sounds to me like you are trying to use streams in a way it was not really intended.

Otherwise it would be necessary to delay the end event until the algorithm made sure no error will occur. This would also delay the callback for gotPromise.once('downloadProgress', ...), which could be a bit misleading.

You are overriding emit here. That's really a bit outside of supported usage. You are a bit on your own risk here.

emitter[Symbol.for('nodejs.rejection')](err, eventName[, ...args]) is experimental. I don't see any other solution as of now.

Would help if I understand your use case better.

I know, sorry I haven't set up a more construcive example yet. Will do.

@ronag
Copy link
Member

ronag commented Apr 22, 2020

Exactly.

Yea, that kind of breaks some stream assumptions.

I know, sorry I haven't set up a more construcive example yet. Will do.

Cool. I'm more than happy to try and help out but digging into the details of got is a little outside my timeframe.

Otherwise it would be necessary to delay the end event

Could you wait for close instead?

@ronag
Copy link
Member

ronag commented Apr 22, 2020

@szmarczak Regarding http-timer. You probably want to listen to 'aborted' on the response object.

Also, have you looked at EventEmitter.errorMonitor?

@szmarczak
Copy link
Member Author

Nice find, thanks :) I guess I look at the docs not so often, this definitely solves the origin.emit = ... thing.

I'm sketching some code for the original problem rn.

Yea, that kind of breaks some stream assumptions.

I think it would be more useful to throw on every action if the stream is destroyed to avoid ambiguity.

Cool. I'm more than happy to try and help out but digging into the details of got is a little outside my timeframe.

Thanks for taking your time. I really appreciate it!

Could you wait for close instead?

That would work too, definitely.

@mcollina
Copy link
Member

I would recommend closing this. The dangling 'error' listener is a design choice to prevent a "non compliant" stream implementation to crash the process without the user adding one.

I'm closing.

@szmarczak
Copy link
Member Author

@ronag I made an example of 237 lines (sorry that's the shortest example I came up with): https://gist.github.com/szmarczak/e7eb659bebb33bceb7577e90d7216aa3

The line I care the most is line no. 198.

Yes, so how would stream.cleanup() be implemented?

Call stream.removeListener(name, function) on these:

stream.finished() leaves dangling event listeners (in particular 'error', 'end', 'finish' and 'close') after callback has been invoked.

is a design choice

Have you actually encountered a package that benefits from this? Or was this just "it's possible, so let's do this"? Nevermind, let's forget this question. It doesn't change anything. But I'm strongly against the latter.

Sorry if my behavior was a bit harsh, I was 10000% convinced this was a bug and I didn't expect how it turned out to be.

@mcollina
Copy link
Member

Have you actually encountered a package that benefits from this? Or was this just "it's possible, so let's do this"? Nevermind, let's forget this question. It doesn't change anything. But I'm strongly against the latter.

A significant amount of "old/legacy" stream packages could emit multiple 'error' events. finished is implemented to be safe, exactly for this reason. To add some context, finished is an evolution of http://npm.im/end-of-stream, which is downloaded 15 million times per week. I would say that most of the NPM ecosystem benefit from this behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

No branches or pull requests

6 participants