Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi path fast filters #22574

Merged
merged 5 commits into from
Mar 27, 2020
Merged

Multi path fast filters #22574

merged 5 commits into from
Mar 27, 2020

Conversation

pvdz
Copy link
Contributor

@pvdz pvdz commented Mar 25, 2020

This will support the fast filter cache for filters with multiple paths (all leading to eq).

Before this commit, the fast filter would only apply to "flat" filters leading to eq. That means filter: {a: {b: {c: {eq: 5}}}} would be fast but filter: {a: {b: {c: {eq: 5}, d: {eq: 6}}}} (and all varieties) would be slow, take the sift path.

After this commit, any number of filters should take the fast(er) path. It'll not be as fast as a flat path because it needs to do an intersection of all filter conditions (of course), but should still be faster than going through sift.

@pvdz pvdz requested a review from a team as a code owner March 25, 2020 20:20
@pvdz pvdz requested a review from freiksenet March 25, 2020 20:20
Copy link
Contributor

@freiksenet freiksenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great and very nicely implemented.

// mechanism does not create a Set unless there's a Node for it
return [...nodesByKeyValue]
// Put smallest last (we'll pop it)
caches.sort((a, b) => b.length - a.length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should get buckets for filters return pre-sorted caches? Maybe not.

Copy link
Contributor Author

@pvdz pvdz Mar 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible but these lists shouldn't be big. Generally, the number of filters per query is not that big so this sort should be super-duper fast

// Put smallest last (we'll pop it)
caches.sort((a, b) => b.length - a.length)
// Iterate on the set with the fewest elements and create the intersection
const needles = caches.pop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nifty!

@pvdz
Copy link
Contributor Author

pvdz commented Mar 27, 2020

Going to run a benchmark to get some numbers and then I'll merge. The Cloud tests failed on some unrelated trunk breakage and I don't expect them to be broken by this.

@pvdz
Copy link
Contributor Author

pvdz commented Mar 27, 2020

Ran some benchmarks;

  • benchmarks/markdown_id:

    • 10k pages; old: 131q/s, new: 128 q/s. Total runtime, from 137s to 139s
    • 50k pages; old: 125 q/s, new: 124 q/s
  • ifsc (flat filters):

    • 150k pages; old 2019 q/s, new: 2040 q/s.
  • govbook:

    • 26k pages; old: 1371 q/s, new: 1378 q/s.
  • On an internal contentful site, which definitely uses multi-filter queries:

    • 2k pages (many nodes because contentful); old: 21 q/s, after: 20 q/s, total runtime from 209s to 213s
    • 14k pages (1M nodes); old: 17 q/s, after: 19 q/s. Total runtime: from 1526s to 1449s.

This seems to a consistent small regression, except for the site I was actually trying to improve. So I guess that's good :)

@pvdz
Copy link
Contributor Author

pvdz commented Mar 27, 2020

Merging since it seems to have the desired effect at scale, even if it's a small regression for non-scale. This will unblock further improvements in this area.

@pvdz pvdz merged commit 3889105 into master Mar 27, 2020
@delete-merged-branch delete-merged-branch bot deleted the multi-path-fast-filters branch March 27, 2020 20:11
@pvdz
Copy link
Contributor Author

pvdz commented Mar 30, 2020

Published in gatsby@2.20.9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants