You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
As of 0.17.0, there is an increased number of error events emitted from the ListWatch, which results in, if you follow the example code, in unnecessary restarts of the informer.
Client Version 0.17.1
Server Version 1.22.x
To Reproduce
Keep the informer open for long enough and observe the behavior:
In 0.15.0, you will notice a lot of reconnections with calls to /list to repopulate the cache
The above PR has an issue, because it [likely?] resulted in new ERROR events getting emitted via the watch feed - they were not being handled. fix(cache): watch errors must call done handler #781 addressed this, by calling doneHandler() with an error.
The fix in fix(cache): watch errors must call done handler #781 introduced additional calls to doneHandler(), and by virtue of that - to the _stop() method which calls this.request.abort();. Calling request.abort() occasionally results in an ECONNRESET (depending on who closes the socket first - the client or the server) emitted as request.on('error'). In the case of handling the ERROR event by aborting the existing request, this started happening a lot more often in 0.17.0.
In <=0.15.x, this was likely being hidden by the fact that watch.ts ensures that done() is only called once - i.e. there's likely the stream close or some such event arriving before the request.on('error') is emitted with the ECONNRESET.
Proposed fix options
I haven't yet made up my mind on what is the best approach for the fix :) Internally, we'll probably monkeypatch the _stop() method with the first option until a proper fix is rolled out.
Option 1. Disable the error handler in the ListWatch._stop(), something along the lines of:
I don't like this, as it is using removeAllListeners, as it does not have access or knowledge of the doneHandler to only remove the specific handler that Watch.watch() creates. It's fine for my internal purposes, where we're sure we don't have any other error handlers, but might be less so for generic library code?
That said, it's definitely the easiest one and might just be the most practical one, depending on when the 1.x / switch to fetch() is coming?
Option 2. Wrap the req.abort() method in DefaultRequest to remove the error handlers. I don't particularly like this, as it still has the same issue with removeAllListeners not knowing about which specific handler to remove, let alone monkeypatching the abort().
Option 3. Wrap the req.abort() method in Watch.watch() to call off('error', doneCallOnce) for both - the request and the stream. Same as above, but avoids removeAllListeners in favor of an off for a specific callback.
Another downside of all of the three of the above is that they are effectively request specific - not sure if it will be relevant for the fetch() based approach - it is also technically a breaking change, since the RequestResult will require a new method to be exported (removeAllListeners) for people who provide their own requestImpl.
Option 4. Wrap the req.abort() and set an aborted variable inside the closure, and then ignore any request errors that come after it is set (we probably still need to handle stream closure as doneCallOnce(null)?). Same as above - I don't like monkeypatching the abort(), but at least there is no breaking change in the RequestResult.
Option 5. Refactor Watch.watch() in such a way that we can do a Watch.abort() or something like that, so that at least we avoid monkeypatching request.abort(). This is probably the least disruptive option, but it feels like it's a bit more work than the others...
Are there any better ideas for implementation? Happy to take a stab at a PR, if there's an agreement on the approach.
For longer term, I would think that maybe the Watch interface needs a re-haul anyways - it takes two callbacks and returns a promise - that alone feels like it should be an event emitter of its own?
The text was updated successfully, but these errors were encountered:
@dominykas thanks for the detailed exploration and options!
Given that this.request is private and is obtained via a call to Watch.watch(...) which doesn't expose any way to add additional callbacks to the request, I think that your first option is ok to implement.
Describe the bug
As of
0.17.0
, there is an increased number oferror
events emitted from theListWatch
, which results in, if you follow the example code, in unnecessary restarts of the informer.Client Version
0.17.1
Server Version
1.22.x
To Reproduce
0.15.0
, you will notice a lot of reconnections with calls to/list
to repopulate the cache0.16.0
, you will notice "freezes" (likely Informer example requires fix: infomer restarted after onerror event doesn't dispatch other events #795)0.17.0
, you will notice a lot oferror
events emitted withECONNRESET
, happening afterreq.abort()
is called, i.e. they're no longer relevant.Expected behavior
listFn()
when neccessaryExample Code
Assuming it's reproducible with
cache-example.js
.Environment (please complete the following information):
Likely not relevant.
Additional context
Here's my attempt to reconstruct what happened:
0.16.0
. It included caching improvements (🎉) which avoid callinglistFn()
when unnecessary.ERROR
events getting emitted via the watch feed - they were not being handled. fix(cache): watch errors must call done handler #781 addressed this, by callingdoneHandler()
with an error.0.16.3
and we've experienced similar problems when we've tried to upgrade to 0.16. The fix was included in0.17.0
doneHandler()
, and by virtue of that - to the_stop()
method which callsthis.request.abort();
. Callingrequest.abort()
occasionally results in anECONNRESET
(depending on who closes the socket first - the client or the server) emitted asrequest.on('error')
. In the case of handling theERROR
event by aborting the existing request, this started happening a lot more often in0.17.0
.<=0.15.x
, this was likely being hidden by the fact thatwatch.ts
ensures thatdone()
is only called once - i.e. there's likely the streamclose
or some such event arriving before therequest.on('error')
is emitted with theECONNRESET
.Proposed fix options
I haven't yet made up my mind on what is the best approach for the fix :) Internally, we'll probably monkeypatch the
_stop()
method with the first option until a proper fix is rolled out.error
handler in theListWatch._stop()
, something along the lines of:I don't like this, as it is using
removeAllListeners
, as it does not have access or knowledge of thedoneHandler
to only remove the specific handler thatWatch.watch()
creates. It's fine for my internal purposes, where we're sure we don't have any othererror
handlers, but might be less so for generic library code?That said, it's definitely the easiest one and might just be the most practical one, depending on when the
1.x
/ switch tofetch()
is coming?req.abort()
method inDefaultRequest
to remove the error handlers. I don't particularly like this, as it still has the same issue withremoveAllListeners
not knowing about which specific handler to remove, let alone monkeypatching theabort()
.req.abort()
method inWatch.watch()
to calloff('error', doneCallOnce)
for both - the request and the stream. Same as above, but avoidsremoveAllListeners
in favor of anoff
for a specific callback.Another downside of all of the three of the above is that they are effectively
request
specific - not sure if it will be relevant for thefetch()
based approach - it is also technically a breaking change, since theRequestResult
will require a new method to be exported (removeAllListeners
) for people who provide their ownrequestImpl
.req.abort()
and set anaborted
variable inside the closure, and then ignore any request errors that come after it is set (we probably still need to handle stream closure asdoneCallOnce(null)
?). Same as above - I don't like monkeypatching theabort()
, but at least there is no breaking change in theRequestResult
.Watch.watch()
in such a way that we can do aWatch.abort()
or something like that, so that at least we avoid monkeypatchingrequest.abort()
. This is probably the least disruptive option, but it feels like it's a bit more work than the others...Are there any better ideas for implementation? Happy to take a stab at a PR, if there's an agreement on the approach.
For longer term, I would think that maybe the
Watch
interface needs a re-haul anyways - it takes two callbacks and returns a promise - that alone feels like it should be an event emitter of its own?The text was updated successfully, but these errors were encountered: