
do not hash the entire schema on every query plan cache lookup #5374

Merged 4 commits from geal/schema_hash into dev on Jun 7, 2024

Conversation

@Geal (Contributor) commented Jun 7, 2024

Hashing the entire schema on every query plan cache lookup is causing performance issues on big schemas.
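The fix described by the PR title is the standard precompute-once pattern: derive the schema's hash a single time when the schema is parsed, store it in the `Schema` struct behind an `Arc`, and have every cache lookup clone the `Arc` instead of re-hashing the SDL. A minimal, dependency-free sketch follows; the type and field names mirror the diff below, but the hashing (std's `DefaultHasher` instead of the router's actual Sha256-based schema ID) and the `parse` constructor are simplified stand-ins, not the router's real API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical, simplified stand-in for the router's Schema type.
struct Schema {
    #[allow(dead_code)]
    raw_sdl: Arc<String>,
    /// Precomputed once at construction, instead of re-hashed on every
    /// cache lookup.
    hash: Arc<String>,
}

impl Schema {
    fn parse(sdl: &str) -> Self {
        // Hash the SDL exactly once, when the schema is built.
        let mut hasher = DefaultHasher::new();
        sdl.hash(&mut hasher);
        Schema {
            raw_sdl: Arc::new(sdl.to_string()),
            hash: Arc::new(format!("{:x}", hasher.finish())),
        }
    }

    /// Lookups now pay for a pointer copy, not a full-SDL hash.
    fn schema_id(&self) -> Arc<String> {
        Arc::clone(&self.hash)
    }
}

fn main() {
    let schema = Schema::parse("type Query { hello: String }");
    let a = schema.schema_id();
    let b = schema.schema_id();
    // Both lookups return the same precomputed allocation.
    assert!(Arc::ptr_eq(&a, &b));
    println!("schema id: {}", a);
}
```

On a 500 kB schema, the per-lookup cost drops from hashing half a megabyte of text to a single atomic reference-count increment, which is the effect measured later in this thread.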


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible (1)
  • Documentation completed (2)
  • Performance impact assessed and acceptable
  • Tests added and passing (3)
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@Geal Geal requested review from a team as code owners June 7, 2024 08:51


router-perf bot commented Jun 7, 2024

CI performance tests

  • step - Basic stress test that steps up the number of users over time
  • reload - Reload test over a long period of time at a constant rate of users
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • xlarge-request - Stress test with 10 MB request payload
  • const - Basic stress test that runs with a constant number of users
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • xxlarge-request - Stress test with 100 MB request payload
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • no-graphos - Basic stress test, no GraphOS.
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • large-request - Stress test with a 1 MB request payload
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled

@abernix (Member) left a comment

On this PR, I think it's reasonable to check the "Changes are compatible" checkbox but not the new-tests boxes, so long as manual testing can be done, mostly because I think this might be complicated to test. And of course, "Performance impact assessed and acceptable" should be checked, if that is indeed true, since this is performance related.

Overall, it looks good to me, and thank you for taking care to make sure it matches our existing schema ID.

@SimonSapin (Contributor) left a comment

Since schema_id() is now precomputed and kept in the Schema struct it’s tempting to make its other use (inject_schema_id) take it from Schema rather than compute it, but Schema is not created by that point. It’s something we can refactor later (potentially easier after removing Deno entirely?) but doesn’t need to block this PR.

@Geal (Contributor, Author) commented Jun 7, 2024

Yes, I'd like to have Schema parsed before injecting the schema id. I think the reason it was not done there yet was that the API schema needed the JS planner to be available, so that will be removed soon.
We're actually already parsing the schema once before that, when checking for enterprise features in the schema, but at that point we only work with apollo-parser IIRC.
I really want to clean all of that up :)

@Geal (Contributor, Author) commented Jun 7, 2024

The perf tests have run and they're OK, but I suspect we would only see an effect on a huge schema.

@@ -261,7 +261,7 @@ where
     query: query.clone(),
     operation: operation.clone(),
     hash: doc.hash.clone(),
-    sdl: Arc::clone(&self.schema.raw_sdl),
+    schema_id: Arc::clone(&self.schema.hash),
@IvanGoncharov (Member) commented Jun 7, 2024

nit: if schema_id contains schema.hash maybe it should be called schema_hash.

just thinking out loud 💭

@Geal (Contributor, Author) replied:

I renamed it to schema_id everywhere in fa19613, since we were already naming it that way elsewhere.
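The diff above shows the lookup side of the change: the cached query-plan entry now carries the precomputed schema id as a cheap `Arc` clone, so hashing a cache key touches a short id string rather than the full SDL. A sketch of such a cache key with hypothetical names (`CachingQueryKey` and its fields loosely mirror the diff, but this is not the router's actual definition):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical cache key: the schema is represented by its precomputed
// id, so deriving Hash touches a short string, not a 500 kB SDL.
#[derive(Clone, PartialEq, Eq, Hash)]
struct CachingQueryKey {
    query: String,
    operation: Option<String>,
    schema_id: Arc<String>, // Arc<String> hashes via the inner String
}

fn main() {
    // The id is computed once elsewhere; lookups only clone the Arc.
    let schema_id = Arc::new("abc123".to_string());
    let mut cache: HashMap<CachingQueryKey, &str> = HashMap::new();

    let key = CachingQueryKey {
        query: "{ hello }".to_string(),
        operation: None,
        schema_id: Arc::clone(&schema_id), // cheap pointer copy per lookup
    };
    cache.insert(key.clone(), "planned");
    assert_eq!(cache.get(&key), Some(&"planned"));
}
```

Because `Arc<T>` forwards `Hash` and `Eq` to the inner value, keys built from different `Arc` clones of the same id still compare and hash identically.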

@abernix (Member) commented Jun 7, 2024

> the perf tests have run and they're ok, but I suspect we would only see an effect on a huge schema

Ok sounds good. Can you indicate that in the PR checklist? And check manual tests?

@Geal (Contributor, Author) commented Jun 7, 2024

Manual tests on a 500 kB schema show that schema hashing and cache lookup previously accounted for up to 15% of CPU time; it is now down to less than 1%, with a 20% improvement in requests per second.

@xuorig (Contributor) commented Jun 7, 2024

We have a similar fix running in prod right now that showed similar improvements on real workloads 👍

@Geal merged commit 2c8531f into dev on Jun 7, 2024; 14 checks passed.
@Geal deleted the geal/schema_hash branch June 7, 2024 13:56
@lrlna lrlna mentioned this pull request Jun 18, 2024

5 participants