-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry "non retryable" error on worker long poll #1034
Retry "non retryable" error on worker long poll #1034
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks perfect!
@Quinn-With-Two-Ns - Is the increase in timeout needed because of this change? I'm not sure our integration test for worker error should have to wait a whole 2 minutes when it took seconds before. Maybe it'd be worth a way to turn this 2m minimum off and if we want to test the full time taken before fatal, we can move that to features repo. |
@cretz Yes that is why it is needed, Moving to features does not help because we run features test as part of CI. |
I looked into running these tests in parallel, but We could lower the timeout for test, but that would require exposing the timeout to users, which I thought we didn't want to do. |
I agree we probably don't want to do it. Technically you can access internal from our integration tests, but I'd be ok with exposing as a worker option too. If the choice is between exposing and adding 2 mins to every test run, I think the former is better :-) (but again, if you want can just make a visible var in internal I think) |
d8d4118
to
d278a42
Compare
d278a42
to
1b4a3cd
Compare
Retry "non retryable" errors for a period of time like core. This helps handle some know edge cases where proxies may way rewrite grpc timeouts when a server fails over and cause an error from the server. This aligns the go sdk with core. The retry period was picked at 2 minutes because it is greater than the long poll interval so in the proxy failure mode we still retry at least once.
See also:
temporalio/features#218