subgraph retry policy and circuit breaking #338
Comments
There are some existing Tower layers that might be worth looking at here. They may not be suitable as-is, or GraphQL-aware, particularly where requests have non-retryable properties. (In those cases, perhaps we can wrap one of the Tower layers that offers the functionality and give it the knowledge it needs.) This could also relate to the fetcher error-handling conversation.
Follows up #1347
Adding a retry would fix #1956
So I looked into the Tower retry layer. Along with the issue of recognizing whether a query is retryable (easy to decide for HTTP errors, less so for GraphQL ones), I am running into architecture issues with plugins, the same ones I had in #1889: the layer needs the underlying service to be cloneable. In that PR, I solved it by moving the query deduplication layer out of the traffic shaping plugin and applying it directly at subgraph service creation. Plugins like traffic shaping are then applied above that, taking as argument and returning a (non-cloneable) I could do the same for retries, but we will encounter issues with plugin ordering:
So I would like to remove the subgraph part of the traffic shaping plugin and move it into the subgraph service or, better, add a more generic internal trait that some plugins can implement to add layers to the subgraph and other services, without working on a
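To make the "is this retryable?" question above concrete, here is a minimal, standard-library-only sketch of a retry decision. It does not use Tower or any router type: `Outcome`, `Operation`, `RetryPolicy`, and `call_with_retry` are hypothetical names, and the choices to retry only queries, only on transport failures or HTTP 429/502/503/504, and never on GraphQL errors are assumptions for illustration, not a decision made in this thread.

```rust
use std::time::Duration;

/// Hypothetical result of one subgraph request (not a router type).
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    /// No response arrived (connection error, timeout).
    Transport,
    /// Non-success HTTP status code.
    Http(u16),
    /// HTTP success whose body carries GraphQL errors.
    GraphqlErrors,
    /// Successful response without errors.
    Ok,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Operation {
    Query,
    Mutation,
}

pub struct RetryPolicy {
    /// Maximum number of attempts, including the first one.
    pub max_attempts: u32,
    /// Delay before the second attempt; doubled for each later attempt.
    pub base_delay: Duration,
}

impl RetryPolicy {
    /// `attempt` is the number of attempts already made (1 after the first call).
    pub fn should_retry(&self, op: Operation, outcome: &Outcome, attempt: u32) -> bool {
        // Assumption: mutations are never retried, since they may not be idempotent.
        if attempt >= self.max_attempts || op == Operation::Mutation {
            return false;
        }
        match outcome {
            Outcome::Transport => true,
            Outcome::Http(status) => matches!(*status, 429 | 502 | 503 | 504),
            // Assumption: GraphQL errors are treated as non-retryable here.
            Outcome::GraphqlErrors | Outcome::Ok => false,
        }
    }

    /// Exponential backoff: base, 2 * base, 4 * base, ...
    pub fn backoff(&self, attempt: u32) -> Duration {
        let factor = 2u32.saturating_pow(attempt.saturating_sub(1));
        self.base_delay.saturating_mul(factor)
    }
}

/// Calls `send` until the policy says stop; returns the last outcome and
/// the number of attempts made.
pub fn call_with_retry<F>(policy: &RetryPolicy, op: Operation, mut send: F) -> (Outcome, u32)
where
    F: FnMut() -> Outcome,
{
    let mut attempt = 1;
    loop {
        let outcome = send();
        if !policy.should_retry(op, &outcome, attempt) {
            return (outcome, attempt);
        }
        std::thread::sleep(policy.backoff(attempt));
        attempt += 1;
    }
}
```

The closure passed to `call_with_retry` plays the role that a cloneable service plays for a retry layer: each attempt needs a fresh way to send the same request, which is why a non-cloneable service is a problem.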
We should be able to specify a timeout for queries to a subgraph, along with different techniques to retry queries or temporarily stop sending traffic to a subgraph.
While those options are set globally for a subgraph, we should take care to maintain a list of hosts for that subgraph, with query timing and success statistics per host.
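As a rough illustration of the per-host bookkeeping described above, the following standard-library-only sketch keeps timing and success statistics per host while the threshold, cooldown, and timeout are set once for the whole subgraph. All names (`HostStats`, `SubgraphBreaker`, and its methods) are hypothetical, and the rule "open the circuit after N consecutive failures, then allow traffic again after a cooldown" is one common circuit-breaking scheme assumed here, not the design chosen for the router.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-host statistics (hypothetical type, for illustration only).
#[derive(Debug, Default)]
pub struct HostStats {
    pub successes: u64,
    pub failures: u64,
    pub total_latency: Duration,
    consecutive_failures: u32,
    /// While set and in the future, no traffic is sent to this host.
    open_until: Option<Instant>,
}

/// Settings are global to the subgraph; state is tracked per host.
pub struct SubgraphBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    timeout: Duration,
    hosts: HashMap<String, HostStats>,
}

impl SubgraphBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration, timeout: Duration) -> Self {
        Self { failure_threshold, cooldown, timeout, hosts: HashMap::new() }
    }

    /// Returns true if requests may currently be sent to `host`.
    pub fn allows(&self, host: &str, now: Instant) -> bool {
        match self.hosts.get(host).and_then(|s| s.open_until) {
            Some(until) => now >= until,
            None => true,
        }
    }

    /// Records one finished request. A response slower than the configured
    /// timeout counts as a failure even if it eventually succeeded.
    pub fn record(&mut self, host: &str, latency: Duration, ok: bool, now: Instant) {
        let threshold = self.failure_threshold;
        let cooldown = self.cooldown;
        let timed_out = latency > self.timeout;
        let stats = self.hosts.entry(host.to_string()).or_default();
        stats.total_latency += latency;
        if ok && !timed_out {
            stats.successes += 1;
            stats.consecutive_failures = 0;
            stats.open_until = None; // close the circuit again
        } else {
            stats.failures += 1;
            stats.consecutive_failures += 1;
            if stats.consecutive_failures >= threshold {
                stats.open_until = Some(now + cooldown); // stop traffic temporarily
            }
        }
    }

    pub fn stats(&self, host: &str) -> Option<&HostStats> {
        self.hosts.get(host)
    }
}
```

Passing `now` explicitly instead of calling `Instant::now()` inside the methods keeps the sketch deterministic and easy to test; a real implementation would also need to be shared safely across concurrent requests.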