Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary error events in ListWatch (cache/informer) #881

Closed
dominykas opened this issue Sep 19, 2022 · 1 comment · Fixed by #884
Closed

Unnecessary error events in ListWatch (cache/informer) #881

dominykas opened this issue Sep 19, 2022 · 1 comment · Fixed by #884

Comments

@dominykas
Copy link
Contributor

dominykas commented Sep 19, 2022

Describe the bug
As of 0.17.0, there is an increased number of error events emitted from the ListWatch, which results in, if you follow the example code, in unnecessary restarts of the informer.

Client Version
0.17.1

Server Version
1.22.x

To Reproduce

Expected behavior

  • The informer should quietly reconnect to the API, restart watches when necessary, repopulate cache by calling listFn() when neccessary

Example Code

Assuming it's reproducible with cache-example.js.

Environment (please complete the following information):

Likely not relevant.

Additional context

Here's my attempt to reconstruct what happened:

  1. Attempt to start list watch from last resourceVersion #711 was merged and released as part of 0.16.0. It included caching improvements (🎉) which avoid calling listFn() when unnecessary.
  2. The above PR has an issue, because it [likely?] resulted in new ERROR events getting emitted via the watch feed - they were not being handled. fix(cache): watch errors must call done handler #781 addressed this, by calling doneHandler() with an error.
  3. The fix in fix(cache): watch errors must call done handler #781 introduced additional calls to doneHandler(), and by virtue of that - to the _stop() method which calls this.request.abort();. Calling request.abort() occasionally results in an ECONNRESET (depending on who closes the socket first - the client or the server) emitted as request.on('error'). In the case of handling the ERROR event by aborting the existing request, this started happening a lot more often in 0.17.0.
    • In <=0.15.x, this was likely being hidden by the fact that watch.ts ensures that done() is only called once - i.e. there's likely the stream close or some such event arriving before the request.on('error') is emitted with the ECONNRESET.

Proposed fix options

I haven't yet made up my mind on what is the best approach for the fix :) Internally, we'll probably monkeypatch the _stop() method with the first option until a proper fix is rolled out.

  • Option 1. Disable the error handler in the ListWatch._stop(), something along the lines of:
+           this.request.removeAllListeners('error');
+           this.request.on('error', () => { /* void */ })
            this.request.abort();
            this.request = undefined;

I don't like this, as it is using removeAllListeners, as it does not have access or knowledge of the doneHandler to only remove the specific handler that Watch.watch() creates. It's fine for my internal purposes, where we're sure we don't have any other error handlers, but might be less so for generic library code?

That said, it's definitely the easiest one and might just be the most practical one, depending on when the 1.x / switch to fetch() is coming?

  • Option 2. Wrap the req.abort() method in DefaultRequest to remove the error handlers. I don't particularly like this, as it still has the same issue with removeAllListeners not knowing about which specific handler to remove, let alone monkeypatching the abort().
  • Option 3. Wrap the req.abort() method in Watch.watch() to call off('error', doneCallOnce) for both - the request and the stream. Same as above, but avoids removeAllListeners in favor of an off for a specific callback.

Another downside of all of the three of the above is that they are effectively request specific - not sure if it will be relevant for the fetch() based approach - it is also technically a breaking change, since the RequestResult will require a new method to be exported (removeAllListeners) for people who provide their own requestImpl.

  • Option 4. Wrap the req.abort() and set an aborted variable inside the closure, and then ignore any request errors that come after it is set (we probably still need to handle stream closure as doneCallOnce(null)?). Same as above - I don't like monkeypatching the abort(), but at least there is no breaking change in the RequestResult.
  • Option 5. Refactor Watch.watch() in such a way that we can do a Watch.abort() or something like that, so that at least we avoid monkeypatching request.abort(). This is probably the least disruptive option, but it feels like it's a bit more work than the others...

Are there any better ideas for implementation? Happy to take a stab at a PR, if there's an agreement on the approach.

For longer term, I would think that maybe the Watch interface needs a re-haul anyways - it takes two callbacks and returns a promise - that alone feels like it should be an event emitter of its own?

@brendandburns
Copy link
Contributor

@dominykas thanks for the detailed exploration and options!

Given that this.request is private and is obtained via a call to Watch.watch(...) which doesn't expose any way to add additional callbacks to the request, I think that your first option is ok to implement.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants