Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(gatsby): perf problem for match-page search in large sites #19691

Merged
merged 1 commit into from
Nov 22, 2019

Conversation

pvdz
Copy link
Contributor

@pvdz pvdz commented Nov 21, 2019

This was found while profiling a site of 25k pages.

The particular search in this diff started with pages.length == 25599 and matchPathPages.length == 2 but because matchPathPages was being pushed onto inside the loop, the array grew and grew. Ultimately the inner find was calling its callback 72 million (!) times, causing massive build delays (this was taking 2/3rds of the total build time for this site).

In contrast, this forEach took 390s (seconds!) before and with this diff it takes 0.47s on my machine.

This problem is caused by inefficient routing that is sometimes necessary for proper 404 handling. This becomes a big problem as the site grows and the match-path.json grows. There are some user-land improvements we can suggest to work around other problems, but at least this one we can resolve.

Thanks to @eads for offering a real-world site in #19512 so we could profile this, and @pieh for helping with the debugging.

Note that this won't affect most sites and smaller sites probably won't notice it either way. But big sites with a similar setup might be impacted positively :)

The particular search here started with `pages.length == 25599` and `matchPathPages.length == 2` but because `matchPathPages` was being pushed onto inside the loop, the array grew and grew. Ultimately the inner `find` was calling its callback 72 million (!) times, causing massive build delays (this was taking 2/3rds of the total build time for this site).

In contrast, this `forEach` took 390s (seconds!) before and now takes 0.47s on my machine.

This problem is caused by inefficient routing that is sometimes necessary for proper 404 handling. This becomes a big problem as the site grows and the match-path.json grows. There are some user-land improvements we can suggest to work around other problems, but at least this one we can resolve.

Thanks to @eads for offering a real-world site in #19512 so we could profile this, and @pieh for helping with the debugging.
@pvdz pvdz requested a review from a team as a code owner November 21, 2019 16:20
@pvdz pvdz changed the title fix(gatsby): prevent blowup for match-page search with large sites fix(gatsby): prevent perf problem for match-page search with large sites Nov 21, 2019
@pvdz pvdz changed the title fix(gatsby): prevent perf problem for match-page search with large sites fix(gatsby): perf problem for match-page search in large sites Nov 21, 2019
@pvdz pvdz changed the title fix(gatsby): perf problem for match-page search in large sites perf(gatsby): perf problem for match-page search in large sites Nov 21, 2019
Copy link
Contributor

@wardpeet wardpeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woop woop @pdvz! An easy fix for a massive problem. Looks good to me! 👍 🚤

@wardpeet wardpeet merged commit 58b89fa into master Nov 22, 2019
@wardpeet wardpeet deleted the fix-match-path-perf branch November 22, 2019 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants