-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry backup/restore completion/finalizing status patching to unstuck inprogress backups/restores #7845
Retry backup/restore completion/finalizing status patching to unstuck inprogress backups/restores #7845
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
42d9dba
to
3045d3d
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7845 +/- ##
==========================================
- Coverage 58.79% 58.72% -0.07%
==========================================
Files 345 345
Lines 28764 28785 +21
==========================================
- Hits 16911 16905 -6
- Misses 10425 10451 +26
- Partials 1428 1429 +1 ☔ View full report in Codecov by Sentry. |
3045d3d
to
ae07397
Compare
// TODO: consider using a more specific error type to retry, for now, we retry on all errors | ||
// specific errors: | ||
// - connection refused: https://pkg.go.dev/syscall#:~:text=Errno(0x67)-,ECONNREFUSED,-%3D%20Errno(0x6f | ||
return retry.OnError(retry.DefaultBackoff, func(err error) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double check my math if this backoff is enough time to solve linked issue.
https://pkg.go.dev/k8s.io/client-go@v0.29.0/util/retry#pkg-variables
var DefaultBackoff = wait.Backoff{
Steps: 4,
Duration: 10 * time.Millisecond,
Factor: 5.0,
Jitter: 0.1,
}
ae07397
to
044dbb7
Compare
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
044dbb7
to
8ef8fcb
Compare
Received feedback to expand retry to other status patches and potentially for backup too. Will ask for feedback in issue. |
eeb3856
to
8abf577
Compare
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
e0f5b03
to
07aaa6b
Compare
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
07aaa6b
to
1587186
Compare
We are working on making each k8s API client call to be retriable in case of internal server errors caused by temporary API outages. The client-go offers the capability to implement a custom middleware transport for adding extra behavior or processing to the default http transport (https://github.com/kubernetes/client-go/blob/master/transport/config.go#L68). We are utilizing it to wrap the default http request with retries for specific error types https://github.com/openshift-kni/lifecycle-agent/pull/548/files#diff-99516c035075962960ff61611a293268a97b3a6535639e92580d0ef8de6eb8cf. |
That's cool. Will check, thanks @Missxiaoguo |
I'm thinking that we want to be careful here. We absolutely want retry on the status transitions away from InProgress and towards terminal states, but if we retry for up to 2 minutes for item restore, then a bad APIServer could cause a Restore to hang for hours/days for a big restore, when what we want to see there is mark the error and move on. |
yeah, my understanding is that the hang depends on the duration of API server outage regardless of small or big restore. The difference with a big restore is more underlying requests based on the number of resources. But it's very reasonable for your guys to be cautious! |
Trying Requeue + Fail without velero pod restart in another PR and we can discuss if we want this retry PR later. |
From community meeting, this PR is currently considered too specific and not applicable to most users and for the users that it applies to, backups during apiserver outage is considered risky. |
Requeue approach at #7863 |
closing as this won't get merged, we will discuss a better solution in the future. Downstream we do favor #7863 as that would "retry" via requeue for a longer time period. |
Signed-off-by: Tiger Kaovilai tkaovila@redhat.com
Thank you for contributing to Velero!
Please add a summary of your change
Does your change fix a particular issue?
Fixes #7207
Please indicate you've done the following:
/kind changelog-not-required
as a comment on this pull request.site/content/docs/main
.