Apollo DataSources REST: Request Parallelization #1451

Closed
Saeris opened this issue Jul 31, 2018 · 4 comments

Comments


Saeris commented Jul 31, 2018

Hey there! Not sure whether to label this a bug report or a feature request. The gist of it is, when multiple requests are in flight to a REST API using apollo-datasource-rest, they all run in sequence, rather than in parallel, which inflates the response time for some queries.

To better illustrate this, I've thrown together a basic example forked from fullstack-workshop-server. In the screenshot from Engine below, you can see what I'm talking about:

[Screenshot from Engine: the cascade of requests]

And here's the fork itself:

https://github.com/Saeris/fullstack-workshop-server/tree/parallel-requests-example

Basically here's what's happening:

  1. First we execute a query to fetch a list of popular movies.
  2. Because the movie data returned from this REST call is incomplete, we map over the movie IDs and make separate calls to another endpoint to fetch the full movie details. We use a feature of the REST API to append the cast information to our response, so we don't have to make a third call for each movie.

What should happen here is that once we have all the movie IDs, we should fetch all of their data in parallel to optimize the response time of our server. That way, resolving movies: [Movie]! in our schema will only take as long as the longest of the 20 requests takes to come back, which in the screenshot above would be 6.86ms instead of the total 29.2ms it took to resolve the whole cascade.
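To make it concrete, the shape of what I'd expect is something like this (just a sketch — the data source and its method names are illustrative, not necessarily what's in the workshop repo):

```js
// Sketch only: fetch the list, then fan the detail requests out in parallel.
// `moviesAPI`, `getPopularMovies`, and `getMovieById` are illustrative names.
module.exports = {
  Query: {
    movies: async (_, __, { dataSources }) => {
      const popular = await dataSources.moviesAPI.getPopularMovies();
      // Kick off every detail request at once and await them together, so the
      // total time is bounded by the slowest single request rather than the sum.
      return Promise.all(
        popular.map(({ id }) => dataSources.moviesAPI.getMovieById(id))
      );
    },
  },
};
```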

Obviously, the deeper you go in a schema like this, the more time could be saved by running requests in parallel.

I'm not entirely certain as to how this can be accomplished. This just came up as I was picking apart the code for apollo-datasource-rest to solve an issue I'd been having with caching in Lambda.

Also, please note that even though we're clearly hitting the cache for repeat requests for this query (you'll notice the response time shrink considerably), Engine does not appear to be receiving any caching information. I believe I've got the server configured correctly to support it, so I don't know what's wrong there. If you inspect the network requests being sent in Playground, you'll see that cacheControl is included in the extensions along with the other tracing data.


martijnwalraven commented Jul 31, 2018

There is nothing in apollo-datasource-rest keeping it from performing parallel requests, and it is in fact performing parallel requests in your example. The waterfall you're seeing is a bit misleading, because the traces for individual movies aren't actually representing fetches but simply invocations of the resolvers. That's why the timings are so tiny (less than a millisecond to several milliseconds). So the waterfall is a result of Node execution being single threaded.

In your example, both fetching the list of popular movies and fetching individual movies happen in the movies resolver: https://github.com/Saeris/fullstack-workshop-server/blob/563e428a7453f44fbaf00890ba562edd55329310/src/resolvers/query.js#L17-L18
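For reference, the data source side of that pattern looks roughly like this (a sketch — the class name, base URL, and endpoint paths are illustrative). Since this.get() simply returns a promise, detail requests mapped with Promise.all in the resolver are already issued concurrently:

```js
const { RESTDataSource } = require('apollo-datasource-rest');

// Sketch only — class name, base URL, and endpoint paths are illustrative.
class MoviesAPI extends RESTDataSource {
  constructor() {
    super();
    this.baseURL = 'https://api.themoviedb.org/3/';
  }

  getPopularMovies() {
    return this.get('movie/popular');
  }

  // Appends cast info to the response so a third call per movie isn't needed.
  getMovieById(id) {
    return this.get(`movie/${id}`, { append_to_response: 'credits' });
  }
}
```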

@martijnwalraven

> Also, please note that even though we're clearly hitting the cache for repeat requests for this query (you'll notice the response time shrink considerably), Engine does not appear to be receiving any caching information. I believe I've got the server configured correctly to support it, so I don't know what's wrong there. If you inspect the network requests being sent in Playground, you'll see that cacheControl is included in the extensions along with the other tracing data.

The caching you're seeing is caching of REST responses, which is separate from the whole query caching Engine shows. @cacheControl has no effect on data source caching, and Apollo Server currently doesn't cache whole responses.


Saeris commented Jul 31, 2018

> The caching you're seeing is caching of REST responses, which is separate from the whole query caching Engine shows. @cacheControl has no effect on data source caching, and Apollo Server currently doesn't cache whole responses.

So in the original example from that repo, the @cacheControl defined on the Cast type doesn't actually do anything then? I guess all I'm trying to point out here is that the examples and the documentation are a bit misleading about how caching works. While I was able to gather this much from taking apart apollo-datasource-rest today, I don't recall reading anywhere in the documentation that the directives don't work with data sources. As we discussed on Slack, the TTL value comes from the request/response headers unless manually specified. Perhaps it's a good idea to add that to the documentation? I'm sure other people have been confused about this as well. I've been asked a few times on Twitter if I've gotten caching to work yet, and I think others were under the same assumption that I was: that if Engine isn't showing the cache being hit, then there's something wrong with the server configuration.
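For anyone else who lands here, manually specifying a TTL looks roughly like this (a sketch — it assumes the cacheOptionsFor hook in apollo-datasource-rest; by default the TTL comes from the REST response's caching headers, and schema-level @cacheControl has no effect on it):

```js
const { RESTDataSource } = require('apollo-datasource-rest');

// Sketch only — assumes the cacheOptionsFor hook in apollo-datasource-rest.
// Without it, the REST response cache TTL is derived from the response's
// Cache-Control headers; schema-level @cacheControl does not affect it.
class MoviesAPI extends RESTDataSource {
  constructor() {
    super();
    this.baseURL = 'https://api.themoviedb.org/3/';
  }

  // Force a 60-second TTL for every cached REST response from this source.
  cacheOptionsFor() {
    return { ttl: 60 };
  }

  getMovieById(id) {
    return this.get(`movie/${id}`);
  }
}
```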

> There is nothing in apollo-datasource-rest keeping it from performing parallel requests, and it is in fact performing parallel requests in your example. The waterfall you're seeing is a bit misleading, because the traces for individual movies aren't actually representing fetches but simply invocations of the resolvers. That's why the timings are so tiny (less than a millisecond to several milliseconds). So the waterfall is a result of Node execution being single threaded.

Yeah that is pretty misleading. I wonder if that UI could be improved because to me that seemed like a clear sign that I was doing something wrong.


abernix commented Jul 2, 2019

Thanks for opening this originally! We've expanded and changed our documentation quite a bit since this was originally opened, and we've introduced a full-response cache within Apollo Server that takes the place of the previous-generation Apollo Engine Proxy (a black-box Go binary) which once served a similar purpose, but is open-source and written in JavaScript. This new full-response cache is documented here and respects @cacheControl hints.
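For anyone finding this later, enabling that full-response cache looks roughly like this (a sketch — it assumes Apollo Server 2.x with the apollo-server-plugin-response-cache package; the schema here is only a placeholder):

```js
import { ApolloServer, gql } from 'apollo-server';
import responseCachePlugin from 'apollo-server-plugin-response-cache';

// Sketch only — assumes Apollo Server 2.x; the schema is a placeholder.
const typeDefs = gql`
  type Cast @cacheControl(maxAge: 60) {
    name: String
  }
  type Query {
    cast: [Cast]
  }
`;

const resolvers = {
  Query: {
    cast: () => [{ name: 'Example' }],
  },
};

const server = new ApolloServer({
  typeDefs,
  resolvers,
  // The plugin caches whole query responses according to @cacheControl hints.
  plugins: [responseCachePlugin()],
  cacheControl: { defaultMaxAge: 5 },
});

server.listen().then(({ url }) => console.log(`Server ready at ${url}`));
```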

Furthermore, the data sources documentation has come along since this was first opened as well.

It seems like any remaining concern here is with Apollo Engine itself rather than Apollo Server, and that would be best surfaced to the Apollo Engine team using the Support tooling within Apollo Engine. I'm sure the Apollo Platform team would very much appreciate feedback on how cache results from the apollo-server-plugin-response-cache plugin are portrayed within Apollo Engine, so please do pass that along to them, but I think we can close this on the open-source side of things.

Thanks again!
