Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry "non retryable" error on worker long poll #1034

Merged

Conversation

Quinn-With-Two-Ns
Copy link
Contributor

Retry "non retryable" errors for a period of time like core. This helps handle some know edge cases where proxies may way rewrite grpc timeouts when a server fails over and cause an error from the server. This aligns the go sdk with core. The retry period was picked at 2 minutes because it is greater than the long poll interval so in the proxy failure mode we still retry at least once.

See also:
temporalio/features#218

@Quinn-With-Two-Ns Quinn-With-Two-Ns requested a review from a team as a code owner February 9, 2023 17:52
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks perfect!

@cretz
Copy link
Member

cretz commented Feb 9, 2023

@Quinn-With-Two-Ns - Is the increase in timeout needed because of this change? I'm not sure our integration test for worker error should have to wait a whole 2 minutes when it took seconds before. Maybe it'd be worth a way to turn this 2m minimum off and if we want to test the full time taken before fatal, we can move that to features repo.

@Quinn-With-Two-Ns
Copy link
Contributor Author

@cretz Yes that is why it is needed, Moving to features does not help because we run features test as part of CI.

@Quinn-With-Two-Ns
Copy link
Contributor Author

I looked into running these tests in parallel, but testify does not support it.

We could lower the timeout for test, but that would require exposing the timeout to users, which I thought we didn't want to do.

@cretz
Copy link
Member

cretz commented Feb 9, 2023

We could lower the timeout for test, but that would require exposing the timeout to users, which I thought we didn't want to do.

I agree we probably don't want to do it. Technically you can access internal from our integration tests, but I'd be ok with exposing as a worker option too. If the choice is between exposing and adding 2 mins to every test run, I think the former is better :-) (but again, if you want can just make a visible var in internal I think)

@Quinn-With-Two-Ns Quinn-With-Two-Ns merged commit f554827 into temporalio:master Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants