Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retryable transport for errors #1704

Merged
merged 60 commits into from
Jan 10, 2024
Merged

Conversation

dcfranca
Copy link
Contributor

@dcfranca dcfranca commented May 29, 2023

In order to address the issue #210
I have added 3 new parameters to the provider

  • retry_delay_ms
  • max_retries
  • retryable_errors

In case max_retries is > 0 (defaults to zero)
it will retry the request in case it is an error
the retryable_errors defaults to [500, 502, 503, 504]

It retries after the ms specified in retry_delay_ms (defaults to 1000)

Resolves #210


Behavior

Before the change?

  • When there are intermittent errors on the GitHub API the provider fails directly

After the change?

  • You should be able to set different behaviors to handle failures from the GitHub API, retrying specific errors and delaying for next retry

Other information


Additional info

Pull request checklist

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been reviewed and added / updated if needed (for bug fixes / features)
  • [] Added the appropriate label for the given change

Does this introduce a breaking change?

The code should work as before in case no max_retries is set

Please see our docs on breaking changes to help!

  • Yes (Please add the Type: Breaking change label)
  • No

If Yes, what's the impact:

  • N/A

Pull request type

Please add the corresponding label for change this PR introduces:
I don't think I have permission to add a label here

  • Bugfix: Type: Bug
  • Feature/model/API additions: Type: Feature
  • Updates to docs or samples: Type: Documentation
  • Dependencies/code cleanup: Type: Maintenance

dcfranca added 2 commits May 29, 2023 14:39
In order to address the issue integrations#210
I have added 3 new parameters to the provider

- retry_delay_ms
- max_retries
- retryable_errors

In case max_retries is > 0 (defaults to zero)
it will retry the request in case it is an error
the retryable_errors defaults to [500, 502, 503, 504]

It retries after the ms specified in retry_delay_ms (defaults to 1000)
@dcfranca dcfranca requested a review from ilmax May 31, 2023 18:16
Copy link
Member

@kfcampbell kfcampbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your attention to testing! This will be a cool feature to get released.

github/config.go Show resolved Hide resolved
github/transport_test.go Outdated Show resolved Hide resolved
@kfcampbell
Copy link
Member

@dcfranca what are your thoughts on handling the error to silence the lint warning?

@dcfranca
Copy link
Contributor Author

dcfranca commented Jun 3, 2023

@dcfranca what are your thoughts on handling the error to silence the lint warning?

Done

@dcfranca
Copy link
Contributor Author

dcfranca commented Jun 6, 2023

@kfcampbell
Do I need to do something else for the awaiting approvals?

@dcfranca
Copy link
Contributor Author

dcfranca commented Jun 8, 2023

@kfcampbell @ilmax Hey, any update on this?

@dcfranca
Copy link
Contributor Author

@kfcampbell @ilmax Hi, is there anything else I need to do here?

@ilmax
Copy link
Contributor

ilmax commented Jun 12, 2023

Hey @dcfranca I have no power here 😅 I'm just an end user, @kfcampbell is the one you're looking for.
On a side note, I saw that it takes a while before PR gets reviewed and merged so just sit back, 🍿 relax and wait for this to be reviewed/merged 😄

github/transport.go Outdated Show resolved Hide resolved

for retry := 0; retry <= t.maxRetries; retry++ {
// Reset the body
// Code from httpretry (https://github.com/ybbus/httpretry/blob/master/roundtripper.go#L60)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you talk to me a little bit about this code and why you chose it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How was this tested?

Copy link
Contributor Author

@dcfranca dcfranca Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests I have added on transport_test.go were failing due to the body not being reset after the first retry, then I checked the package I use on some of my projects to retry (httpretry) and saw that they have a code for this scenario, so I shameless copied it and the tests succeed

I didn't figure out how to test the specific scenario of using the function GetBody, the original library also only tests without the GetBody, but if you have an idea on how to test with it I can update the tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a manual workload you can test this on that will reliably trigger a 500? I'm hesitant to merge/release to a wide audience without a bit more description of the validation process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a Flux integration that quite often gives me a 500 (a few times a day), I can try linking my branch instead of the default provider there... however, I have never integrated an in development provider, so I will appreciate anything that can guide me through this
I'm also a bit afraid of doing that on this workload, it is not critical, but I would expect there are more changes here that are not tested/stable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can create the binary for this provider and point Terraform towards it by seeing our debugging instructions.

We do not have successful automated integration testing at the moment so we've been over-relying on manual testing. I don't feel super comfortable merging without manually testing this extensively first.

github/transport.go Outdated Show resolved Hide resolved
Co-authored-by: Keegan Campbell <me@kfcampbell.com>
@gm-hyp
Copy link

gm-hyp commented Sep 28, 2023

Hello! When do you believe this will get merged? This behaviour is seen when refreshing/creating the resource github_emu_group_mapping linked to a team larger than 1500 people. It is kinda painful as it fails about 50% of the time with this error:


Error: GET https://api.github.com/orgs/<hidden>/external-group/<hidden>: 502 Server Error []

  with github_emu_group_mapping.<hidden>,
  on <hidden>.tf line 7, in resource "github_emu_group_mapping" "<hidden>":
   7: resource "github_emu_group_mapping" "<hidden>" {

Error: Terraform exited with code 1.
Error: Process completed with exit code 1.

The only way to make it succeed is to retry 3-4-5 times. Having a way to specify a retry mechanism would help immensely. Thank you for the work done here!

@dcfranca
Copy link
Contributor Author

dcfranca commented Oct 3, 2023

Hello! When do you believe this will get merged? This behaviour is seen when refreshing/creating the resource github_emu_group_mapping linked to a team larger than 1500 people. It is kinda painful as it fails about 50% of the time with this error:


Error: GET https://api.github.com/orgs/<hidden>/external-group/<hidden>: 502 Server Error []

  with github_emu_group_mapping.<hidden>,
  on <hidden>.tf line 7, in resource "github_emu_group_mapping" "<hidden>":
   7: resource "github_emu_group_mapping" "<hidden>" {

Error: Terraform exited with code 1.
Error: Process completed with exit code 1.

The only way to make it succeed is to retry 3-4-5 times. Having a way to specify a retry mechanism would help immensely. Thank you for the work done here!

I'd love to know as well :D

@kfcampbell
Copy link
Member

@gm-hyp @dcfranca we'd love to release a major version in the near future (next couple of months?) pending our team's capacity to do so. We have several items we're looking at merging into that version that we'll need to test together thoroughly. Thanks for your persistence here.

@nickfloyd nickfloyd added the Type: Breaking change Used to note any change that requires a major version bump label Dec 11, 2023
@kfcampbell kfcampbell changed the base branch from main to v6 January 10, 2024 23:28
@kfcampbell
Copy link
Member

I've changed the target branch of this PR to the v6 branch. I'll create a PR shortly from that branch to main, and use that branch as the location for manual testing of the features to be added in the next major version.

@kfcampbell kfcampbell merged commit 2dddeb3 into integrations:v6 Jan 10, 2024
1 check passed
@kfcampbell kfcampbell mentioned this pull request Jan 10, 2024
Merged
@kfcampbell
Copy link
Member

See #2091.

kfcampbell added a commit that referenced this pull request Feb 16, 2024
* Add retryable transport for errors (#1704)

* Add retryable transport for errors

In order to address the issue #210
I have added 3 new parameters to the provider

- retry_delay_ms
- max_retries
- retryable_errors

In case max_retries is > 0 (defaults to zero)
it will retry the request in case it is an error
the retryable_errors defaults to [500, 502, 503, 504]

It retries after the ms specified in retry_delay_ms (defaults to 1000)

* Update documentation with new parameters for retry

* Change default of max_retries to 3

* Fix typo in naming

* Update github/transport_test.go

* Add error check for data seek

* Consolidate the default retriable errors on a function

* Fix typo on the comments

Co-authored-by: Keegan Campbell <me@kfcampbell.com>

* Update vendor

* Fix merging conflicts

* Update documentation with new parameters for retry

* Change default of max_retries to 3

* Fix typo in naming

* Add error check for data seek

* Update github/transport_test.go

* Consolidate the default retriable errors on a function

* Fix typo on the comments

Co-authored-by: Keegan Campbell <me@kfcampbell.com>

* Don't run go mod tidy on release (#1788)

* Don't run go mod tidy on release

* Be more specific about releases

* Fix lint with APIMeta deprecation

---------

Co-authored-by: Keegan Campbell <me@kfcampbell.com>
Co-authored-by: Nick Floyd <139819+nickfloyd@users.noreply.github.com>

* fix: remove repository topic from state if it doesnt exist in GitHub anymore (#1918)

* remove repository topic if they cannot be found in GitHub anymore

* Fix build error by bumping package version in offending file

---------

Co-authored-by: Keegan Campbell <me@kfcampbell.com>

* Bump version to v6 (#2106)

* Upgrade to Terraform Plugin SDK v2 (#1780)

* go mod tidy -go=1.16 && go mod tidy -go=1.17

* Run go mod vendor

* Attempt v2 upgrade

* Plugin compiling

* Fix some provider test errors

* Fix test compilation error

* ValidateFunc --> ValidateDiagFunc

* Fix casing

* Sprinkle toDiagFunc everywhere

* More fixes for validation functions

* State --> StateContext

* %s --> %v when printing diags

* ConfigureFunc --> ConfigureContextFunc

* Checking results of d.Set, round one

* Continue checking d.Set results

* Check results of d.Set, round three

* Checking d.Set results, round four

* d.Set round five

* In tests, export GITHUB_TEST_ORGANIZATION

* Remove unnecessary MaxItems on computed value

* Go build now works

* Resolve linting errors

* Apply diag.FromErr twice more

* Pass key names into toDiagFunc helper

* Construct cty.Path from strings

* Tests now working

* Update terraform-plugin-sdk version

* Remove commented attribute setting in resource_github_team.go

* Fix restrict pushes on github_branch_protection. Fix branch protection tests (#2045)

* Update restrict pushes. Fix branch protection tests

Change blocks_creations default to true. Fix broken build.

* add state migration

* rename push_restrictions to push_allowances

* correct state migration issue

* add docs clarification

* update migration func args

* fix test args

* cleanup tests

* Pass context.Background() in test

* fix timestamp fields

---------

Co-authored-by: Keegan Campbell <me@kfcampbell.com>

* Set group_id correctly (#2133)

* Run go get -u github.com/golangci/golangci-lint

* Separate github_team_members import from github_team as create_default_maintainers is not defined for members resource (#2126)

Co-authored-by: Keegan Campbell <me@kfcampbell.com>

* Add missing variable definition for test case

---------

Co-authored-by: Daniel França <github.t6297kgphp.dv@koderama.com>
Co-authored-by: Nick Floyd <139819+nickfloyd@users.noreply.github.com>
Co-authored-by: Felix Luthman <34520175+felixlut@users.noreply.github.com>
Co-authored-by: georgekaz <1391828+georgekaz@users.noreply.github.com>
Co-authored-by: Rich Young <richjyoung@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Breaking change Used to note any change that requires a major version bump vNext
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry Calls on HTTP 5XX Response Status Codes
5 participants