Retry GCP requests on server error #2243
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #2243      +/-   ##
==========================================
- Coverage   82.30%   82.21%   -0.10%
==========================================
  Files         241      245       +4
  Lines       62437    62505      +68
==========================================
- Hits        51389    51387       -2
- Misses     11048    11118      +70
/// error will be surfaced to the application, but also bounds
/// the length of time a request's credentials must remain valid.
///
/// As requests are retried without renewing credentials or
We could theoretically re-sign requests / regenerate credentials; however, I decided against this for a couple of reasons:
- It's non-trivial additional complexity
- The intent of this feature is to hide intermittent failures, and a 5-minute outage is not really intermittent
- We want to surface the error to the user eventually
Small nitpicks, but overall this looks good!
}

impl RetryExt for reqwest::RequestBuilder {
    fn send_retry(self, config: &RetryConfig) -> BoxFuture<'static, Result<Response>> {
Some logging would be nice here (rough sketch after the list) for:
- retries
- giving up
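Something along these lines, perhaps; this is purely a sketch of the suggestion, and `log_retry_outcome` and its signature are hypothetical, not code from this PR:

```rust
use std::time::Duration;
use log::{info, warn};

// Hypothetical helper covering the two cases mentioned above; a real
// send_retry loop would likely log these inline rather than via a helper.
fn log_retry_outcome(retries: usize, max_retries: usize, sleep: Duration) {
    if retries < max_retries {
        info!(
            "request failed, backing off for {:?} (retry {} of {})",
            sleep, retries, max_retries
        );
    } else {
        warn!("giving up after {} retries", retries);
    }
}
```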
object_store/Cargo.toml
@@ -48,6 +48,7 @@ quick-xml = { version = "0.23.0", features = ["serialize"], optional = true }
rustls-pemfile = { version = "1.0", default-features = false, optional = true }
ring = { version = "0.16", default-features = false, features = ["std"] }
base64 = { version = "0.13", default-features = false, optional = true }
+rand = { version = "0.8", optional = true, features = ["std", "std_rng"] }
The features listed here are actually the default features that should always be included. So you could remove the explicit listing, or pass default-features = false to prevent a silent extension of this feature set.
I was following the pattern established above; I'm not really sure which is better, tbh. Using default-features = false is nice, but some crates have lots of default features, so I went for consistency.
But the crates above also use default-features = false, which rand now doesn't.
Oh, oops, yeah that's a typo 😅
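For reference, a sketch of what the corrected line would presumably look like, pinning the current defaults explicitly so the feature set can't silently grow:

```toml
rand = { version = "0.8", default-features = false, optional = true, features = ["std", "std_rng"] }
```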
#[cfg(test)]
mod tests {
    use super::*;
    use rand::rngs::mock::StepRng;
TIL that there's rand::rngs::mock 👍
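For anyone else who hadn't seen it, a minimal sketch of how `StepRng` makes randomized code deterministic under test; this is illustrative, not code from this PR (it assumes rand 0.8, where `gen::<u64>()` draws directly from `next_u64`):

```rust
use rand::rngs::mock::StepRng;
use rand::Rng;

fn main() {
    // StepRng yields the fixed sequence: initial, initial + increment, ...
    // which is handy for asserting exact jittered-backoff values in tests.
    let mut rng = StepRng::new(0, 1);
    assert_eq!(rng.gen::<u64>(), 0);
    assert_eq!(rng.gen::<u64>(), 1);
    assert_eq!(rng.gen::<u64>(), 2);
}
```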
Looks good to me -- thank you @tustvold
{
    let sleep = backoff.next();
    retries += 1;
    info!("Encountered server error, backing off for {} seconds, retry {} of {}", sleep.as_secs_f32(), retries, max_retries);
👍
Benchmark runs are scheduled for baseline = b826162 and contender = 299908e. 299908e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Part of #2244
Relates to #2176
Rationale for this change
The S3 implementation currently has request retry support; as we look to move away from rusoto, we need to ensure we preserve this functionality. This PR therefore adds the necessary functionality to the GCP implementation, which can then be reused for AWS and Azure once they switch away from using SDKs.
What changes are included in this PR?
Adds an implementation of exponential backoff, lifted wholesale from the implementation I wrote for rskafka.
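For context, a minimal sketch of jittered exponential backoff in that spirit; the type and field names here are illustrative, not the exact code added in this PR:

```rust
use std::time::Duration;
use rand::{thread_rng, Rng};

/// Illustrative backoff state, assuming a growth factor `base` > 1.
struct Backoff {
    init_backoff_secs: f64,
    next_backoff_secs: f64,
    max_backoff_secs: f64,
    base: f64,
}

impl Backoff {
    /// Sample the next sleep duration with full jitter: draw uniformly
    /// between the initial backoff and the current (geometrically grown)
    /// upper bound, capped at the configured maximum.
    fn next(&mut self) -> Duration {
        let range = self.init_backoff_secs..(self.next_backoff_secs * self.base);
        let sampled = thread_rng().gen_range(range).min(self.max_backoff_secs);
        self.next_backoff_secs = sampled;
        Duration::from_secs_f64(sampled)
    }
}
```

Because each sample feeds back into the next upper bound, the expected wait grows geometrically while remaining randomized, which helps avoid retry storms from many clients backing off in lockstep.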
Are there any user-facing changes?
Technically yes, GCP requests will now be automatically retried on server error. We could change the default to avoid this, but I think it is unlikely to cause issues.