
Harden retries on tenant/timeline deletion path. #4973

Merged: 7 commits into main from dkr/more-retries, Aug 14, 2023

Conversation

LizardWizzard (Contributor)

Originated from a test failure where we hit a SlowDown error from S3.
The patch generalizes `download_retry` so it is no longer download-specific.
The resulting `retry` function is moved to the utils crate; `download_retries`
is now a thin wrapper around it.

To ensure that all needed retries are in place, the test code now uses
the `test_remote_failures=1` setting.

Ref https://neondb.slack.com/archives/C059ZC138NR/p1691743624353009

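For illustration only, a generic retry helper along these lines might look like the sketch below; the name, signature, backoff parameters, and logging are my assumptions, not the actual code in `libs/utils/src/backoff.rs`.

```rust
// Minimal sketch of a generic retry helper (hypothetical names and signature,
// not the utils-crate implementation from this PR).
use std::future::Future;
use std::time::Duration;

pub async fn retry<T, E, F, Fut>(mut op: F, description: &str, max_retries: u32) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    let mut attempt = 0;
    loop {
        match op().await {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= max_retries => return Err(e),
            Err(e) => {
                // Exponential backoff with a cap, so transient errors such as
                // S3 SlowDown do not stall the deletion path indefinitely.
                let delay = Duration::from_millis(100 * 2u64.pow(attempt.min(6)));
                tracing::warn!("{} failed (attempt {}): {}, retrying in {:?}", description, attempt, e, delay);
                tokio::time::sleep(delay).await;
                attempt += 1;
            }
        }
    }
}
```

A wrapper such as `download_retries` would then simply forward its download closure to this helper.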
@LizardWizzard LizardWizzard requested review from jcsp and koivunej August 11, 2023 13:43
@github-actions bot commented Aug 11, 2023

1588 tests run: 1511 passed, 0 failed, 77 skipped (full report)


Flaky tests (3)

Postgres 15

  • test_get_tenant_size_with_multiple_branches: debug
  • test_single_branch_get_tenant_size_grows: debug
  • test_s3_wal_replay[local_fs]: debug
The comment gets automatically updated with the latest test results.
Last update: c25a737 at 2023-08-14T14:05:47.294Z.

@jcsp (Collaborator) commented Aug 11, 2023

This PR isn't adding it but I'm curious: why do we have our own retry code instead of using a RetryConfig from aws_sdk_s3?

@LizardWizzard (Contributor, Author) commented Aug 11, 2023

> This PR isn't adding it but I'm curious: why do we have our own retry code instead of using a RetryConfig from aws_sdk_s3?

Hm, that's a good question. We weren't using the AWS SDK in the beginning; it is fairly new here. Before that we used rusoto, which is now deprecated.

Probably a good idea to consider switching. Also, I'm not sure whether the AWS SDK works with other S3 vendors.
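For comparison, relying on the SDK's built-in retry policy would look roughly like the sketch below; the builder calls are my assumption based on the aws-config crate around this time, not code from this repository.

```rust
// Hypothetical sketch: let the AWS SDK retry throttling errors (e.g. SlowDown)
// itself instead of using our own backoff helper.
use aws_config::retry::RetryConfig;

async fn s3_client_with_sdk_retries() -> aws_sdk_s3::Client {
    let sdk_config = aws_config::from_env()
        // Standard mode retries transient and throttling errors with
        // exponential backoff and jitter.
        .retry_config(RetryConfig::standard().with_max_attempts(8))
        .load()
        .await;
    aws_sdk_s3::Client::new(&sdk_config)
}
```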

@LizardWizzard LizardWizzard marked this pull request as ready for review August 11, 2023 21:01
@LizardWizzard LizardWizzard requested review from a team as code owners August 11, 2023 21:01
@LizardWizzard LizardWizzard requested review from lubennikovaav and removed request for a team August 11, 2023 21:02
Two review comment threads on libs/utils/src/backoff.rs (outdated, resolved).

@koivunej (Member) left a comment


I think the tests should be split but we can merge this regardless.

@hlinnaka (Contributor)

> This PR isn't adding it but I'm curious: why do we have our own retry code instead of using a RetryConfig from aws_sdk_s3?

> Hm, that's a good question. We weren't using the AWS SDK in the beginning; it is fairly new here. Before that we used rusoto, which is now deprecated.

This. Also, by doing the retries ourselves, we have more control. IIRC we print a warning after some number of retries, for example.
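As a tiny illustration of that extra control (the threshold and log levels here are made up, not the codebase's actual values):

```rust
// Hypothetical: stay quiet for the first few attempts, then escalate to a
// warning once an operation keeps failing; a stock SDK retry policy would
// not let us shape the logging like this.
const WARN_THRESHOLD: u32 = 3;

fn log_retry_attempt(description: &str, attempt: u32, err: &impl std::fmt::Display) {
    if attempt >= WARN_THRESHOLD {
        tracing::warn!("{} still failing after {} attempts: {}", description, attempt, err);
    } else {
        tracing::debug!("{} failed (attempt {}), will retry: {}", description, attempt, err);
    }
}
```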

@LizardWizzard LizardWizzard merged commit 4626d89 into main Aug 14, 2023
@LizardWizzard LizardWizzard deleted the dkr/more-retries branch August 14, 2023 14:16