Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry GCP requests on server error #2243

Merged
merged 7 commits into from
Aug 3, 2022
Merged

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Jul 30, 2022

Which issue does this PR close?

Part of #2244
Relates to #2176

Rationale for this change

The S3 implementation currently has request retry support, as we look to move away from rusoto we need to ensure we can preserve this functionality. This PR therefore adds the necessary functionality to the GCP implementation, which can then be reused for AWS and Azure once they switch away from using SDKs.

What changes are included in this PR?

Adds an implementation of exponential backoff, lifted wholesale from the implementation I wrote for rskafa.

Are there any user-facing changes?

Technically yes, GCP requests will now be automatically retried. We could change the default to avoid this but I think it is unlikely to cause issue

@github-actions github-actions bot added the object-store Object Store Interface label Jul 30, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jul 30, 2022

Codecov Report

Merging #2243 (8cbad44) into master (f41fb1c) will decrease coverage by 0.09%.
The diff coverage is 69.36%.

❗ Current head 8cbad44 differs from pull request most recent head 218d969. Consider uploading reports for the commit 218d969 to get more accurate results

@@            Coverage Diff             @@
##           master    #2243      +/-   ##
==========================================
- Coverage   82.30%   82.21%   -0.10%     
==========================================
  Files         241      245       +4     
  Lines       62437    62505      +68     
==========================================
- Hits        51389    51387       -2     
- Misses      11048    11118      +70     
Impacted Files Coverage Δ
arrow/src/array/array_decimal.rs 90.32% <ø> (ø)
arrow/src/array/data.rs 85.03% <ø> (ø)
arrow/src/datatypes/schema.rs 70.41% <ø> (ø)
object_store/src/aws.rs 0.00% <ø> (ø)
object_store/src/client/backoff.rs 0.00% <0.00%> (ø)
object_store/src/client/oauth.rs 0.00% <0.00%> (ø)
object_store/src/client/retry.rs 0.00% <0.00%> (ø)
object_store/src/client/token.rs 0.00% <ø> (ø)
object_store/src/gcp.rs 0.00% <0.00%> (ø)
object_store/src/lib.rs 86.75% <ø> (ø)
... and 19 more

Help us with your feedback. Take ten seconds to tell us how you rate us.

/// error will be surfaced to the application, but also bounds
/// the length of time a request's credentials must remain valid.
///
/// As requests are retried without renewing credentials or
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could theoretically re-sign requests / regenerate credentials, however, I decided against this for a couple of reasons:

  • It's non-trivial additional complexity
  • The intent of this feature is to hide intermittent failures, a 5 minute outage is not really intermittent
  • We want to surface the error to the user eventually

Copy link
Contributor

@crepererum crepererum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpicks, but overall this looks good!

}

impl RetryExt for reqwest::RequestBuilder {
fn send_retry(self, config: &RetryConfig) -> BoxFuture<'static, Result<Response>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some logging would be nice here for:

  • retries
  • giving up

@@ -48,6 +48,7 @@ quick-xml = { version = "0.23.0", features = ["serialize"], optional = true }
rustls-pemfile = { version = "1.0", default-features = false, optional = true }
ring = { version = "0.16", default-features = false, features = ["std"] }
base64 = { version = "0.13", default-features = false, optional = true }
rand = { version = "0.8", optional = true, features = ["std", "std_rng"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The features listed here are actually the default features that should always be included. So you could remove the explicit listing or pass default-features = false to prevent a silent extension of this feature set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the pattern established above, I'm not really sure which is better tbh. Using default-features is nice, but some crates have lots, I went for consistency

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the crates above also use default-features = false which rand now doesn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, oops, yeah that's a typo 😅

#[cfg(test)]
mod tests {
use super::*;
use rand::rngs::mock::StepRng;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL that there's rand::rngs::mock 👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @tustvold

{
let sleep = backoff.next();
retries += 1;
info!("Encountered server error, backing off for {} seconds, retry {} of {}", sleep.as_secs_f32(), retries, max_retries);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 299908e into apache:master Aug 3, 2022
@ursabot
Copy link

ursabot commented Aug 3, 2022

Benchmark runs are scheduled for baseline = b826162 and contender = 299908e. 299908e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants