
Empty catch block in watch stream event handling #593

Closed
dominykas opened this issue Feb 24, 2021 · 17 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dominykas
Contributor

dominykas commented Feb 24, 2021

While trying to understand some unrelated issues I spotted the following:

javascript/src/watch.ts

Lines 121 to 123 in 5988253

} catch (ignore) {
// ignore parse errors
}

It seems it got introduced in https://github.com/kubernetes-client/javascript/pull/289/files - but the reasoning is unclear to me. Can someone comment on this please?

I might be wrong, of course, but it feels to me that if unparseable messages are possible, then they need to be handled? And if they can't occur, then the try/catch might not be necessary, i.e. it is likely better for the app to crash and log the details for people consuming this than to fail silently?

More importantly, this will also silence user errors inside the callback, so if someone has a bug in their implementation, they would never see it?
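To illustrate the concern, here is a minimal hypothetical reproduction of the pattern (not the actual watch.ts code): because the catch wraps both the JSON.parse and the user callback, an exception thrown by the callback is swallowed as if it were a parse error.

```javascript
// Hypothetical sketch of the pattern under discussion: the try/catch
// wraps both the parse *and* the user callback.
function handleLine(line, callback) {
  try {
    const data = JSON.parse(line);
    callback(data.type, data.object);
  } catch (ignore) {
    // ignore parse errors -- but this also eats callback errors
  }
}

let seen = null;
handleLine('{"type":"ADDED","object":{}}', (type) => { seen = type; });
// A buggy consumer callback throws here, but the error never surfaces:
handleLine('{"type":"ADDED","object":{}}', () => { throw new Error('user bug'); });
```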

@dominykas
Contributor Author

OK, I can see there's an explicit test for that:

it('should ignore JSON parse errors', async () => {

Introduced as part of #250.

I'm still not sure whether that's the correct behavior? If the HTTP connection got terminated and sent incomplete JSON, shouldn't it fail? The original commit also says this:

An alternative approach to this commit would invoke the done callback with parse errors, but that could result in multiple invocations of the done callback.

It seems the done callback now does have a guard around multiple callbacks? So possibly that's a better approach?
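The guard could look roughly like this (a hypothetical sketch of a call-at-most-once wrapper, not the library's actual code):

```javascript
// A "call done only once" guard: the first invocation wins; later
// invocations (e.g. a parse error followed by the stream closing) are ignored.
function makeOnce(done) {
  let called = false;
  return (err) => {
    if (called) return;
    called = true;
    done(err);
  };
}

let calls = 0;
const doneOnce = makeOnce(() => { calls += 1; });
doneOnce(new Error('parse error')); // reported
doneOnce(null);                     // ignored: done has already fired
```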

@brendandburns
Contributor

If the network fails half-way through sending a response, is that really an error? My answer is 'no'.

The watch will get closed for other reasons (e.g. the network stream stopping) and I don't think we want to issue multiple done callbacks in that case.

Is this causing problems?

@dominykas
Contributor Author

Is this causing problems?

It's a silent catch, so truth be told I don't know whether it is or isn't causing issues. I was looking into some problems which could be ascribed to #559, but it may as well be this.

And more importantly - it also wraps around the callback() so it would silence any errors inside the consumer code - I think the callback() should definitely move out of the try block?

If the network fails half-way through sending a response, is that really an error? My answer is 'no'

Just so I understand this properly... what you're saying is that without this silent catch, it would call done with an error when parsing fails, which could (will?) happen before the connection actually closes, meaning that the user would see "failed to parse" instead of "network disconnected"?

@brendandburns
Contributor

brendandburns commented Mar 2, 2021

Regarding wrapping the user callback, that is a fair request. I guess it's arguable whether a user callback failing should kill the watch or not. My feeling is that a user should catch all their own errors (e.g. callback() should never throw). This is roughly akin to the way that Java works where if you throw in the middle of a thread Java just eats the exception. Taking out the watch because of an exception is a pretty extreme thing to do imho.

I suppose that we could always log any exceptions that come out of the callback, but keep running. That seems reasonable, would that work for you?

And yes, the sequence goes like this: half a JSON object is streamed over TCP, the TCP connection is severed, the half object fails to parse (since it's only half an object), and then, because TCP is severed, the network done is issued.
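This failure mode is easy to reproduce in isolation (the event shape below is made up for the example):

```javascript
// Half a JSON object arrives before the connection drops; JSON.parse
// throws on the truncated text.
const full = '{"type":"ADDED","object":{"metadata":{"name":"pod-1"}}}';
const truncated = full.slice(0, 20); // connection severed mid-object

let parseFailed = false;
try {
  JSON.parse(truncated);
} catch (err) {
  parseFailed = true; // without the silent catch, this would surface
}
```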

@dominykas
Contributor Author

My feeling is that a user should catch all their own errors (e.g. callback() should never throw).

Is "never throw" a reasonable expectation to ask of a user of a library? It's very hard to achieve in JS land, simply because it's very hard to implement a catch() that can't itself throw, short of keeping it empty (a no-no) or calling process.exit() directly.

There's also the question of responsibilities - if the userland code inside callback throws, then is it up to @kubernetes/client-node to decide what to do with that?

I think the advice to "crash on exceptions" has really stood the test of time in Node land: you can't trust the state of your application after an error, so it's best to restart from scratch. An uncaughtException and the process dying is not an unexpected outcome for someone writing code in Node?

One other way to address this would be to call the callback inside a setTimeout(), but that might break some expectations around sync/async behavior so that's probably not very desirable either.

In the end, I'd argue that the callback() should be called outside of the try/catch block. If it's just a log entry (or even an error event somewhere), then you won't see your app is broken unless you're looking.

And yes, the sequence goes like this: half a JSON object is streamed over TCP, the TCP connection is severed, the half object fails to parse (since it's only half an object), and then, because TCP is severed, the network done is issued.

OK, thanks for clarifying this. I'll try to think about this and I'll try to come back with some suggestions on different ways to achieve this goal if that's ok?

The things I'd like to check / ideas to explore (note to self, mostly):

  • Is this how Node streams work? i.e. if there's an incomplete packet, will it even emit a data event?
  • Is this how byline works? i.e. Node streams do not guarantee packet boundaries, iirc, so you could receive a data event without a newline - will it buffer it until the newline arrives? In that case, it's impossible that it would emit the data event with incomplete JSON (as long as the k8s API does not send incomplete JSON with a newline inside).
  • I saw there was some history with json-stream - maybe there are alternative modules that can handle this better? If my memory is right about boundaries not being guaranteed, a large chunk of JSON could also arrive via two data packets, so buffering and not emitting until a full message is received is probably a better approach in terms of how the stream itself should work? I vaguely recall there are modules that can do this...
  • Would changing the callback to an errback (i.e. standard callback with an err first) make sense? If the JSON.parse fails - you callback(parseErr) and then whatever is using the watcher can decide what to do? e.g. the informer could decide to reconnect instead of reporting that upstream?
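The errback idea from the last bullet could be sketched like this (handleLine and the event shape here are hypothetical, not the library's actual API):

```javascript
// Errback-style variant: parse errors go to the consumer as `err`, and
// the consumer (e.g. an informer) decides whether to reconnect or report.
function handleLine(line, callback) {
  let data;
  try {
    data = JSON.parse(line);
  } catch (parseErr) {
    callback(parseErr);
    return;
  }
  // The user callback runs *outside* the try/catch, so consumer bugs
  // are not swallowed as "parse errors".
  callback(null, data.type, data.object);
}

const events = [];
handleLine('{"type":"ADDED","object":{}}', (err, type) => events.push(err ? 'error' : type));
handleLine('{"type":"MODIF', (err) => events.push(err ? 'error' : 'ok'));
```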

Sorry for the wall of text / being this particular about it - just looking for the best approach here to avoid surprises long term.

@brendandburns
Contributor

I'm definitely open to proposals/ways to make this obey the principle of least surprise.

@brendandburns
Contributor

"Crash on exceptions" really assumes that your application is single threaded (and yes, I know that Node apps are single threaded, but in practice callbacks and promises create the illusion of multiple threads).

It's totally unclear to me that you want a watch crashing (especially if it's a background thread) to take down your entire application.

@brendandburns
Contributor

But I agree that making it explicit for the user, and not forcing the decision on them is a good thing. So perhaps just a parameter ("abort on exception"?)

@dominykas
Contributor Author

dominykas commented Mar 17, 2021

Just to make sure we're discussing the same thing - I'm explicitly talking about calling the callback() inside the try/catch (I still need to follow up the streams issue in more detail).

"Crash on exceptions" really assumes that your application is single threaded (and yes, I know that node apps are single threaded, but in practice callbacks and promises create the illusions of multiple different threads)

It's totally unclear to me that you want a watch crashing (especially if it's a background thread) to take down your entire application.

I'm not sure I agree with this. I can see how this might make sense in Java or .NET, but that's just not how Node applications are canonically written (or even should be)?

Libraries, including built-ins, should not attempt to handle errors in user code. That is strictly outside their boundaries? Take this for example:

> require('fs').readFile('./package.json', (err, res) => { throw new Error('here'); })
undefined
> Uncaught Error: here

readFile just can't possibly know what's inside the callback and simply rethrows the exception. It's an unfortunate fact that in JavaScript there is no clean way to handle such exceptions (aside from process.on('uncaughtException') in Node), but that's just how it works, and I would like to believe that the ecosystem adapted accordingly years ago? Browsers do the exact same thing with DOM events, etc.

So perhaps just a parameter ("abort on exception"?)

I don't think that it's worth it in the context of the callbacks (although worth pondering over as an option for the JSON.parse case).

The default assumed behavior in Node.js is that if user's callback throws, then libraries don't handle that. The user always has an option to explicitly wrap their own callbacks in a try/catch (although I'd personally very strongly advise against that, unless you know what you're doing).


I spent the last week rewriting a large chunk of our code to replace GoDaddy's kubernetes-client (which no longer seems to be maintained) with this library. I'm running unit tests against kind, and informers were a pleasure to work with, except for this try/catch... I stopped counting the times when my code (i.e. the code inside the callback) was broken and therefore throwing, and the tests just hung without any output...

@dominykas
Contributor Author

dominykas commented Mar 17, 2021

One more example from the Node.js world: if you have a stream (any stream from the core), and it emits an error event, and you do not have an error handler - it will crash the process. It's a developer error. I'd argue that the same should apply to Informer - should probably discuss that in a separate issue though.

But generally, silent failures are the worst.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 15, 2021
@dominykas
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 16, 2021
@dominykas
Contributor Author

Can I open a PR with some suggestions to address this? We can continue the discussions there?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 14, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 14, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
