Poor Memory Management with Search Indexes (Searchables) #8854
Comments
As an aside, I added the index logging for debugging so I could see the progress when the search index was being generated and really appreciated it. By indexing separately it's possible to log when a collection inside of an index has been successfully added to the index, which is really nice visual feedback (and gives you an idea of how far along the indexing is).
Thank you for this detailed explanation. 👍
@electroflame do the changes in #9072, available since 4.37.0, help your use case at all?
@ryanmitchell Let me check. I'll try to get into this next week since I'll have to play around with the Meilisearch addon to verify. Thanks for what you've done so far though!
Hey @ryanmitchell, I took a look and the code looks great, but when trying to adapt it with the Meilisearch addon I'm getting poor results (which might be related to me doing something wrong). Meilisearch doesn't handle updating the index the same way as the Algolia addon, so it's not a one-to-one conversion. Anything special I should be doing?
@electroflame this block of code needs to be updated: https://github.com/statamic-rad-pack/meilisearch/blob/26160977f22003d193076850c63b20acda0142ff/src/Meilisearch/Index.php#L97-L107 It should use the lazy() approach, similar to:
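The snippet that originally followed this comment was not preserved in this copy of the thread. As a rough, hypothetical sketch of a lazy() based update (the `insertMultiple` call and chunk shape are assumptions, not necessarily the addon's actual API):

```php
// Hypothetical sketch only; the exact snippet referenced above is not
// preserved here. Assumes searchables()->lazy() (from statamic/cms #9072)
// yields chunks of searchable items instead of loading them all at once.
public function update()
{
    $this->searchables()->lazy()->each(function ($chunk) {
        // One insert call per chunk; the chunk becomes eligible for
        // garbage collection before the next one is hydrated.
        $this->insertMultiple($chunk);
    });

    return $this;
}
```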
After a bit of chat on Discord it seems the effect of this has been minimal. The next obvious step would be to allow the searchable documents to be returned lazily.
@electroflame I've opened a PR here to provide the entries and terms lazily. Do you want to pull in the changes using a composer patch and see if that gets you close to the performance you were seeing through your reflection method?
@ryanmitchell I'll give it a shot soon -- sometime this week probably. Thanks for working on this!
@electroflame I've added a PR to meilisearch too, to make things easier for you: statamic-rad-pack/meilisearch#31
Thanks for the hard work @ryanmitchell! I think this might've solved the memory usage issue. This basically reduces the memory usage down to a negligible amount, and it seems to stay right around there the whole time (i.e. no steadily increasing memory creep like before). This is great -- a huge deal considering where memory usage started.

The downside is that it absolutely tanks performance (i.e. about 45-55 minutes to update search indexes when it usually takes less than two minutes). That seems to be down to the difference in how the indexing is handled, though. My reflection method essentially chunks it per-collection (provider, etc.) so, in the case of Meilisearch, single documents aren't being sent. What you've got right now is peak memory efficiency (which rocks!), but you lose speed (at least with Meilisearch, although I'd imagine any search index where you have to send data is going to bottleneck in a similar fashion). I'd imagine most installations won't be bothered too much, but in my (admittedly extreme) case, the speed penalty is pretty brutal.

As an aside, it might be useful to expose the providers and their collections via a public method (getProviders() and then getCollection, etc.). It wouldn't necessarily help with memory for this, but it would allow me (and others like me) to get at the underlying collections and process them without resorting to reflection (so it'd be safer long-term).
OK, I've updated the chunk size to 100 and I've reconfigured the Meilisearch PR to insert in chunks instead of one by one: statamic-rad-pack/meilisearch#31. Let me know if that's any better for you.
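For context, a chunked insert along these lines sends one request per 100 documents instead of one per document. This is a sketch of the idea, not the merged PR's actual code; `searchableData()` is a placeholder for however the addon serializes an entry:

```php
// Sketch of chunked insertion, assuming Laravel's LazyCollection chunk()
// and the meilisearch-php client's addDocuments(). searchableData() is a
// hypothetical stand-in, not the addon's real method name.
$this->searchables()->lazy()->chunk(100)->each(function ($chunk) {
    $documents = $chunk->map(fn ($searchable) => $this->searchableData($searchable));

    // One network round-trip per 100 documents, instead of 100 round-trips.
    $this->getIndex()->addDocuments($documents->values()->all());
});
```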
Just wanted to update that this PR and the linked one for Meilisearch work great. After a bit of Discord debugging, @ryanmitchell was able to track down the few lingering issues I found, and the fixes are included in the now-merged PRs. The upshot is that the speed is roughly similar to my reflection method, while memory usage is even lower than my reflection method's (which was already lower than the default). After everything's said and done I'm seeing memory usage at about 10% of what the default used to be, and about 40% of my reflection method. It can't be overstated -- this is some great work by Ryan that should help out a lot. Thanks again, Ryan!
Bug description
When you have a large site with many entries (50k+), updating search indexes can pretty easily lead to running out of memory.
This is largely due to the use of `searchables()->all()`, which seems to load all entries in an index into memory before releasing any of it. Initially, I broke collections up into smaller collections (which helps with the Stache), but the search index will always fail, because loading all of the searchables for the index (i.e. all collections tied to that index) before letting the garbage collector at them is a poor use of memory.

I'm using the Meilisearch addon, which also makes use of this call, but I've modified it slightly to fix this issue.
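To illustrate the underlying problem outside of Statamic: an eager load keeps every entry resident at once, while a generator streams one entry at a time so peak memory stays roughly flat. This is a standalone PHP demonstration, not Statamic code:

```php
<?php
// Standalone illustration (not Statamic code) of why an ->all()-style
// eager load blows up memory, while a generator-based lazy load does not.

function allEntries(int $count): array
{
    $entries = [];
    for ($i = 0; $i < $count; $i++) {
        $entries[] = str_repeat('x', 1024); // ~1 KB each, all held at once
    }
    return $entries; // peak memory grows linearly with $count
}

function lazyEntries(int $count): \Generator
{
    for ($i = 0; $i < $count; $i++) {
        yield str_repeat('x', 1024); // only one entry alive at a time
    }
}

foreach (lazyEntries(50000) as $entry) {
    // "index" the entry; it is eligible for GC on the next iteration
}

echo memory_get_peak_usage(true), PHP_EOL; // stays small on the lazy path
```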
Changing this:
To this:
With updateProviderIndex looking like this:
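The three snippets referenced above were stripped from this copy of the issue. As a rough, hypothetical reconstruction of the approach described (the method names and reflection targets are guesses, not Statamic's real internals):

```php
// Hypothetical reconstruction; the original snippets are not preserved
// here. Uses Reflection to reach the index's per-collection providers
// and index each one separately, so peak memory is bounded by the
// largest single collection rather than the whole index.
private function updateProviderIndex($index, $provider): void
{
    $reflection = new \ReflectionMethod($provider, 'provide'); // guessed name
    $reflection->setAccessible(true);

    $items = $reflection->invoke($provider); // this provider's items only
    $index->insertMultiple($items);

    unset($items); // let the GC reclaim this batch before the next provider
}
```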
As you can probably tell, PHP is not my forte. I made use of Reflection to get at the underlying data structures and update them one-by-one instead of loading everything into memory. This is a very wrong and ugly way to do this (I didn't want to edit core files, so I was forced to work from the outside), but it does work, and it fixes the issue. I'm not sure how the actual fix should be handled, but doing something like this ensures that the memory usage should only go as high as the largest collection you have (so, if needed, you can split collections into smaller ones and add them to the same search index).
How to reproduce
Run `search:update` on a site with many entries (50k+).
Logs
No response
Environment
Installation
Fresh statamic/statamic site via CLI
Antlers Parser
runtime (new)
Additional details
No response