perf(gatsby): perf problem for match-page search in large sites #19691
This was found while profiling a site of 25k pages.

The particular search in this diff started with `pages.length == 25599` and `matchPathPages.length == 2`, but because `matchPathPages` was being pushed onto inside the loop, the array grew and grew. Ultimately the inner `find` ended up calling its callback 72 million (!) times, causing massive build delays; this search alone was taking two thirds of the total build time for this site. For contrast, this `forEach` took 390s (seconds!) before, and with this diff it takes 0.47s on my machine.

This problem is caused by inefficient routing that is sometimes necessary for proper 404 handling, and it becomes a big problem as the site and its `match-path.json` grow. There are some user-land improvements we can suggest to work around other problems, but at least this one we can resolve.
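To make the failure mode concrete, here is a minimal sketch of the pattern (names and data shapes are hypothetical, not the actual Gatsby code, and the real diff may differ in detail): calling `Array.prototype.find` inside a loop that also pushes onto the same array does quadratic work overall, while building a `Set` of known paths once makes each membership check O(1).

```js
// Hypothetical sketch — names and shapes are illustrative, not Gatsby's code.

// Before: `find` rescans `matchPathPages` on every iteration while the loop
// also pushes onto it, so the callback fires roughly
// pages.length * (growing array length) times — quadratic work.
function appendSlow(pages, matchPathPages) {
  pages.forEach(page => {
    if (!matchPathPages.find(p => p.path === page.path)) {
      matchPathPages.push(page)
    }
  })
}

// After: build a Set of known paths once; each membership check is O(1),
// so the whole pass is linear in pages.length.
function appendFast(pages, matchPathPages) {
  const knownPaths = new Set(matchPathPages.map(p => p.path))
  pages.forEach(page => {
    if (!knownPaths.has(page.path)) {
      matchPathPages.push(page)
      knownPaths.add(page.path)
    }
  })
}

// Example: 25k pages and two initial match-path pages, as in the profile above.
const pages = Array.from({ length: 25000 }, (_, i) => ({ path: `/page-${i}/` }))
appendFast(pages, [{ path: "/app/*" }, { path: "/docs/*" }])
```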
Thanks to @eads for offering a real-world site in #19512 so we could profile this, and @pieh for helping with the debugging.
Note that this won't affect most sites, and smaller sites probably won't notice it either way. But big sites with a similar setup might be impacted positively :)