Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude codes.DeadlineExceeded from the list of retryable gRPC codes #1366

Merged
merged 1 commit into from
Jan 26, 2024

Conversation

alexshtin
Copy link
Member

@alexshtin alexshtin commented Jan 26, 2024

What was changed

Exclude codes.DeadlineExceeded from the list of retryable gRPC codes.

Why?

gRPC library doesn't retry "context errors" (codes.DeadlineExceeded and codes.Canceled), so remove them from the list of retryable codes to eliminate the confusion.

isRetriable: https://github.com/grpc-ecosystem/go-grpc-middleware/blob/main/interceptors/retry/retry.go#L269

isContextError: https://github.com/grpc-ecosystem/go-grpc-middleware/blob/main/interceptors/retry/retry.go#L283

@alexshtin alexshtin requested a review from a team as a code owner January 26, 2024 20:16
@Quinn-With-Two-Ns
Copy link
Contributor

This list of errors is common across SDKs, do we need to update them as well?

@alexshtin
Copy link
Member Author

For that you need to check gRPC library implementation in those langs. I hope it is consistent there.

@cretz
Copy link
Member

cretz commented Jan 26, 2024

gRPC library doesn't retry "context errors" (codes.DeadlineExceeded and codes.Canceled)

DeadlineExceeded can be a client context error or a server error. I am a bit worried to assume they are only "context errors".

@alexshtin
Copy link
Member Author

alexshtin commented Jan 26, 2024

I think this should be checked/fixed on server. Server should never explicitly return CDE if user context is not timed out yet. I checked persistence layer, and CDE is always replaced with internal error (persistence.TimeoutError) which then becomes codes.Unknown on gRPC layer. I am pretty much sure there are other places, where server uses another context with shorter timeout, gets CDE, and returns it all the way up. This places need to be fixed.

@cretz
Copy link
Member

cretz commented Jan 26, 2024

To confirm this PR, since the interceptor skips retrying this code no matter what, this is just to remove the unnecessary item from our list? There should be no runtime effect, correct?

Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, would like @Quinn-With-Two-Ns approval before merge.

@alexshtin
Copy link
Member Author

Yes, it doesn't change behaviour. CDE was never retried (I've just discovered this today). So I decided to align our codes with gRPC lib implementation.

@alexshtin alexshtin merged commit 536b875 into temporalio:master Jan 26, 2024
12 checks passed
@alexshtin alexshtin deleted the fix/exclude-cde-retryable branch January 26, 2024 23:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants