fix: stopped watching of resources - avoid overflow exceptions in resource watcher OnException handle… #505

tomasfabian · 2023-01-16T10:55:36Z

This PR provides a fix for the resource watcher's exp. backoff reconnection logic. After reaching the bellow explained retry attempts threshold the reconciliation of the watched resources halts.

…r #482

Description:
The _reconnectHandler publisher (IObserver<TimeSpan> part of the ISubject<TimeSpan>) stops emitting new values into the pipeline after 40 consecutively failed reconnection attempts due to an unhandled overflow exception and the reconnection subscription represented by the IObservable<Timespan> would remain idle, but the background thread is unnoticedly aborted.

The PR provides the following improvements and fixes:

checks the _reconnectAttempts variable against a constant MaxRetriesAttempts with value set to 39 attempts in order to avoid System.OverflowException thrown from the ErrorBackoffStrategy delegate.
the logic in the OnException event handler is wrapped with a try-catch-finally block. The finally block ensures that new values - interpreted as Timer ticks ensuring latency between inversion of control invocations - will be produced in case of arbitrary exceptions in the future.
adds Retry operator into the reconnection handler's subscription pipeline and logs errors on OnError messages.

Let me know if I should move some or all of the following MaxRetriesAttempts logic, constant and the DefaultBackoff property directly into the BackoffStrategies static class. This would lead to a more implicit (hidden) error guard. DefaultBackoff is necessary in case of introducing a finally block in the OnException method.

I could also introduce some unit tests for the ResourceWatcher<TEntity> class by injecting System.Reactive.Concurrency.IScheduler into its constructor in order to mock time and threads.

Operators without proper unobserved thread error logging are unaware about the issue. I would recommend to log such errors in the following way:

    var app = builder.Build();

    void TaskSchedulerUnobservedTaskException(object? sender, UnobservedTaskExceptionEventArgs e)
    {
        app.Logger.LogError(e.Exception, "Caught unhandled TaskScheduler exception");
    }

    app.Lifetime.ApplicationStarted.Register(() =>
    {
        TaskScheduler.UnobservedTaskException += TaskSchedulerUnobservedTaskException;
    });
    app.Lifetime.ApplicationStopped.Register(() =>
    {
        TaskScheduler.UnobservedTaskException -= TaskSchedulerUnobservedTaskException;
    });

…#482

buehler

What do you think about the number?

The rest seems perfectly fine to me. I'm happy to merge it if you gave me your opinion on my comment.

src/KubeOps/Operator/Kubernetes/ResourceWatcher{TEntity}.cs

buehler · 2023-01-16T13:04:10Z

And of course: Thank you for contributing to this project! <3

buehler · 2023-01-16T13:12:33Z

Linting and testing must succeed for the PR to be merged.

tomasfabian · 2023-01-16T13:15:12Z

Linting and testing must succeed for the PR to be merged.

I will check it, but the master branch from which I forked is in erroneous state, too,

tomasfabian · 2023-01-16T13:45:45Z

Linting and testing must succeed for the PR to be merged.

You were right. I tried to fix the linting issues in the second commit. Can I run the analyzer somehow from here, please?

buehler · 2023-01-16T13:49:36Z

Yea, if you push to this PR, I can approve and run the analyzer. Also, if you see any warnings during "build whole solution", this is a strong indicator that the build is going to fail. All warnings produce errors in production mode.

tomasfabian · 2023-01-16T17:11:09Z

Thank you for the review and approval @buehler!
The release is probably halted by this semantic release error. Do I understand it correctly, please?

fix: avoid overflow exceptions in resource watcher OnException handler …

fb682e2

…#482

tomasfabian mentioned this pull request Jan 16, 2023

System.IO.IOException: The request was aborted in the KubeOps.Operator.Kubernetes.ResourceWatcher #482

Closed

buehler requested changes Jan 16, 2023

View reviewed changes

src/KubeOps/Operator/Kubernetes/ResourceWatcher{TEntity}.cs Outdated Show resolved Hide resolved

buehler previously approved these changes Jan 16, 2023

View reviewed changes

buehler enabled auto-merge (squash) January 16, 2023 13:11

fixed linting issues

bb9f136

auto-merge was automatically disabled January 16, 2023 13:37
Head branch was pushed to by a user without write access

tomasfabian dismissed buehler’s stale review via bb9f136 January 16, 2023 13:37

buehler merged commit 6fdd512 into buehler:master Jan 16, 2023

tomasfabian changed the title ~~fix: avoid overflow exceptions in resource watcher OnException handle…~~ fix: stopped watching of resources - avoid overflow exceptions in resource watcher OnException handle… Jan 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: stopped watching of resources - avoid overflow exceptions in resource watcher OnException handle… #505

fix: stopped watching of resources - avoid overflow exceptions in resource watcher OnException handle… #505

tomasfabian commented Jan 16, 2023 •

edited

Loading

buehler left a comment

buehler commented Jan 16, 2023

buehler commented Jan 16, 2023

tomasfabian commented Jan 16, 2023

tomasfabian commented Jan 16, 2023

buehler commented Jan 16, 2023

tomasfabian commented Jan 16, 2023

fix: stopped watching of resources - avoid overflow exceptions in resource watcher OnException handle… #505

fix: stopped watching of resources - avoid overflow exceptions in resource watcher OnException handle… #505

Conversation

tomasfabian commented Jan 16, 2023 • edited Loading

buehler left a comment

Choose a reason for hiding this comment

buehler commented Jan 16, 2023

buehler commented Jan 16, 2023

tomasfabian commented Jan 16, 2023

tomasfabian commented Jan 16, 2023

buehler commented Jan 16, 2023

tomasfabian commented Jan 16, 2023

tomasfabian commented Jan 16, 2023 •

edited

Loading