Retry "non retryable" error on worker long poll #1034

Quinn-With-Two-Ns · 2023-02-09T17:52:31Z

Retry "non retryable" errors for a period of time, like core does. This helps handle some known edge cases where proxies may rewrite gRPC timeouts when a server fails over, causing an error from the server. This aligns the Go SDK with core. The retry period was set to 2 minutes because that is longer than the long-poll interval, so in the proxy failure mode we still retry at least once.

See also:
temporalio/features#218

cretz

Looks perfect!

cretz · 2023-02-09T19:33:13Z

@Quinn-With-Two-Ns - Is the increase in timeout needed because of this change? I'm not sure our integration test for worker errors should have to wait a whole 2 minutes when it took seconds before. Maybe it'd be worth adding a way to turn off this 2m minimum, and if we want to test the full time taken before the error becomes fatal, we can move that test to the features repo.

Quinn-With-Two-Ns · 2023-02-09T19:44:50Z

@cretz Yes, that is why it is needed. Moving the test to features does not help, because we run the features tests as part of CI.

Quinn-With-Two-Ns · 2023-02-09T19:53:18Z

I looked into running these tests in parallel, but testify does not support it.

We could lower the timeout for the test, but that would require exposing the timeout to users, which I thought we didn't want to do.

cretz · 2023-02-09T20:02:32Z

> We could lower the timeout for the test, but that would require exposing the timeout to users, which I thought we didn't want to do.

I agree we probably don't want to do it. Technically you can access internal from our integration tests, but I'd be OK with exposing it as a worker option too. If the choice is between exposing it and adding 2 minutes to every test run, I think the former is better :-) (but again, if you want, you can just make a visible var in internal, I think)

Retry "non retryable" error on worker long poll

d2c9ab2

Quinn-With-Two-Ns requested a review from a team as a code owner February 9, 2023 17:52

Quinn-With-Two-Ns requested a review from cretz February 9, 2023 17:52

Spikhalskiy approved these changes Feb 9, 2023


cretz approved these changes Feb 9, 2023


Quinn-With-Two-Ns force-pushed the retry_fatal_error branch from d8d4118 to d278a42 February 9, 2023 21:19

cretz approved these changes Feb 9, 2023


Add ability to set LongPollGracePeriod for debug

1b4a3cd

Quinn-With-Two-Ns force-pushed the retry_fatal_error branch from d278a42 to 1b4a3cd February 9, 2023 21:28

Quinn-With-Two-Ns merged commit f554827 into temporalio:master Feb 9, 2023
