-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhancer and HealthUtil now differentiate between fatal and transient errors #1427
Conversation
Summary: previously any error returned by the `DefaultEnhancer` was treated as fatal. It made sense because it would act deterministically on a prepared set of resources. However, since #1319 enhancer also uses discovery interface to determine whether or not given resource is namespaced. These API requests may fail and must be retried, hence the introduction of a transient error. Fixes #1413 Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Removed `engine.ErrTransientExecution` as all non-fatal errors are now treated as transient Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
…at are not recoverable Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
pkg/engine/health/health.go
Outdated
@@ -83,6 +70,10 @@ func IsHealthy(obj runtime.Object) error { | |||
log.Printf("HealthUtil: Job \"%v\" is marked healthy", obj.Name) | |||
return nil | |||
} | |||
if terminal, msg := isJobTerminallyFailed(obj); terminal { | |||
return fmt.Errorf("%wHealthUtil: Job \"%v\" has failed terminally: %s", engine.ErrFatalExecution, obj.Name, msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we use engine.ErrFatalExecution
marker to denote health errors that are likely not recoverable. The caller may or may not act upon this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing I don't like about it is that ErrFatalExecution
is defined as errors.New("fatal error: ")
and that fatal error:
string breaks error message readability.
@@ -58,8 +60,8 @@ func (at ApplyTask) Run(ctx Context) (bool, error) { | |||
// 4. - Check health for all resources - | |||
err = isHealthy(applied) | |||
if err != nil { | |||
if fatal := isTerminallyFailed(applied); fatal != nil { | |||
return false, fatalExecutionError(fatal, failedTerminalState, ctx.Meta) | |||
if errors.Is(err, engine.ErrFatalExecution) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We return a type engine.ExecutionError struct
if we encountered a fatal health error.
@@ -239,16 +241,6 @@ func isHealthy(ro []runtime.Object) error { | |||
return nil | |||
} | |||
|
|||
func isTerminallyFailed(ro []runtime.Object) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably this is a better separation of concerns: determining whether a resource health error is terminal is better left to the HealthUtil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is that (AFAICT) the unit of retrying is the whole Apply
operation, which runs on potentially dozens of templates/objects. This might amplify transient issues and not really help, unless we have caching for IsNamespacedObject
🤔
pkg/engine/task/task_apply_test.go
Outdated
@@ -193,5 +196,5 @@ func (k *testEnhancer) Apply(templates map[string]string, metadata renderer.Meta | |||
type errorEnhancer struct{} | |||
|
|||
func (k *errorEnhancer) Apply(templates map[string]string, metadata renderer.Metadata) ([]runtime.Object, error) { | |||
return nil, errors.New("always error") | |||
return nil, fmt.Errorf("%wsomething bad happens", engine.ErrFatalExecution) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("%wsomething bad happens", engine.ErrFatalExecution) | |
return nil, fmt.Errorf("%wsomething bad happens every time", engine.ErrFatalExecution) |
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Correct. Thought fatally failing a plan, because an API request timeout is less than ideal too. Caching for the rescue! |
Summary:
previously any error returned by the
Enhancer
was treated as fatal. It made sense because it would act deterministically on a prepared set of resources. However, since #1319 enhancer also uses the discovery interface to determine whether or not the given resource is namespaced. These API requests may fail and must be retried. With this fix,DefaultEnhancer
clearly marks certain errors as fatal.Fixes #1413
Signed-off-by: Aleksey Dukhovniy alex.dukhovniy@googlemail.com