[Bug]: File Search poor performance with lots of files because of index workaround #38243
Thanks. To be clear, the only mentions we have there to this specific change are here, where @icewind1991 states it "should help mysql to not pick the worst index to use", and here, where the user says that using mtime in the order by is a bad idea - though his examples show a query taking over 1h without a specific index, and 3 seconds after creating one. Could it be that after #29300 Nextcloud now includes an equivalent index, making this workaround obsolete? It's not clear why we wouldn't want the query to use the mtime index alone in the first place, and in the original thread nobody really tested or commented on that specific PR. To be clear, I don't want to suggest that it didn't help at the time - just observing that in this specific scenario it doesn't. I'm not sure how unique or widespread the scenario is, though - I don't have any customization or setup that would suggest my installation differs from most others. Thanks for taking the time.
I would really love to test it on my instances. Is there any patch I could use for testing?
Well, if you're comfortable with tweaking files, this is what I'm doing on my end - assuming you don't want to run this straight on MySQL. Navigate to the Nextcloud installation folder.
Find the comment mentioning mysql, and replace the following code:
with a false condition ("and false"):
Try it out. You can revert the change by copying the old file over the edited one.
@pjft thanks for these detailed steps. I'm on my mobile right now, so I can only do some basic testing. But after applying this change (adding "and false"), search time definitely increased. Before the change, search took around 45 sec; after it, I cancelled the search after a couple of minutes. I assume this is related to the number of shares, which makes the search query quite complex.
Thank you. Actually, if that ends up being confirmed, it justifies the change that was made to not use that index. Do let me know how the testing goes; if it does indeed increase the time, I'll close this - though from an engineering perspective it still annoys me that there is a better way to run the query in (my?) particular case but for some reason it isn't generalized enough. I'll have to explore the reasons for it.
Now I captured some slow query logs (psql). After some more testing, the patched setup (#38243 (comment)) is a bit faster than the default setup. Not sure why the initial testing on my mobile showed a different picture (maybe some caching in the background?). slow_query_nc26.log That said, the difference isn't really noticeable while waiting for the search result. I assume it goes without saying that waiting more than a minute is not acceptable. Unfortunately it's getting slower with every picture shared within Talk (half a year ago search took "only" 30 seconds).
Sorry to hear that. Indeed, and unexpectedly, I did not observe a significant performance change in the mobile app either. Analyzing the mysql logs for a mobile query shows a lot more queries around it, so it might just be latency introduced for other reasons. Thanks for testing, though, and for checking that it's slightly faster.
Just to prevent misunderstanding: while testing on mobile, I was still using the browser (Firefox), not the mobile app.
For anyone still following this bug: I can confirm that, with nearly 1M files for a single user, performance is still extremely poor with the current code, and with these changes it is still an order of magnitude faster. The (admittedly limited) feedback here suggests that, even though the change doesn't improve things as much for others, it's still better than the current performance. I can always solve it for myself by applying this patch on every new release, so that's not why I'm commenting - but if there's interest in this being submitted as a PR, I'd be happy to do so. It would also help if anyone else could test this in production/live environments, because I'm very much cognizant that different setups and infrastructure might influence these results - it might even be that this change makes performance worse in some, which is what we want to avoid. Let me know how I can help.
Same issue. #38243 (comment) helped.
Thank you for testing and validating that it helps. If anybody else tests this, please share your feedback!
A large problem with this is that whether #32074 makes things faster or slower depends on the specifics of your instance. With #32074 we instead (try to) make mariadb use an index to do (part of) the "files you have access to" filtering and then sort the resulting set. This is faster if the "files you have access to" set is a small portion of the filecache, but slower if it is a large portion. My current preference goes towards optimizing for the "you only have access to a small part" case, because that seems to be the increasingly likely scenario (a "many user" instance with 100M files seems more likely than a "few user" instance with 100M files).
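The tradeoff described above can be sketched with a toy model. This is not Nextcloud code - the `filecache` table, the storage split, and both functions below are hypothetical, standing in for the two planner strategies: walking an mtime-ordered index and filtering as you go, versus filtering by access first and sorting the matches.

```python
import sqlite3

# Hypothetical mini "filecache": 10,000 files, where storage 1 can see
# 9,900 of them and storage 2 only 100.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE filecache (fileid INTEGER PRIMARY KEY, storage INTEGER, mtime INTEGER)"
)
con.executemany(
    "INSERT INTO filecache VALUES (?, ?, ?)",
    [(i, 2 if i % 100 == 0 else 1, i) for i in range(1, 10_001)],
)

def newest_via_mtime_walk(storage, limit=5):
    """Walk files newest-first (what an mtime index gives you) and keep
    the first `limit` rows the given storage can access."""
    found, examined = [], 0
    for fileid, st in con.execute(
            "SELECT fileid, storage FROM filecache ORDER BY mtime DESC"):
        examined += 1
        if st == storage:
            found.append(fileid)
            if len(found) == limit:
                break
    return found, examined

def newest_via_filter_then_sort(storage, limit=5):
    """Collect every accessible row first (what a filtering index gives
    you), then sort the whole result set by mtime."""
    matched = con.execute(
        "SELECT fileid, mtime FROM filecache WHERE storage = ?", (storage,)
    ).fetchall()
    matched.sort(key=lambda r: r[1], reverse=True)
    return [fid for fid, _ in matched[:limit]], len(matched)
```

Both strategies return the same files, but the mtime walk examines only 6 rows for the storage that sees almost everything versus 401 rows for the one that sees 1% of files, while filter-then-sort has to materialize and sort 9,900 versus 100 rows respectively - which is why neither choice wins on every instance.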
Thanks @icewind1991. That all makes sense, and I appreciate the response. I'd imagine you'd indeed have metrics on whether the "files you have access to" set is a small or large portion of the filecache, given the breadth of your userbase. I'd imagine, though, that in collaboration scenarios it is common to have a "many user" instance with 100M files where users still have access to a large portion of common files. I don't have data on this, of course, but other than people whose business is providing hosting to users on top of a Nextcloud instance (where, indeed, users will not be sharing files), for an "open source content collaboration platform for tens of millions of users at thousands of organizations across the globe" (per the Nextcloud site), I would imagine the scenario is closer to several people having access to a moderate portion of the filecache - especially since, when you're searching for something, you are normally searching for something you know you have access to. Nonetheless, I understand the reasoning and, lacking actual data on which scenario is more common, I'll defer to you, as we don't seem to be able to solve for both scenarios in a way that's elegant from an engineering perspective.
Unfortunately I do not have actual data to back my claims, just some anecdotal evidence of cases where the index workaround gave a significant improvement. I'm not really happy with the current situation, but I also don't believe that just removing the index workaround is the way forward.
Of course - sounds fair. Thanks for engaging. I'd imagine we might need to change the data model somehow to accommodate that, and I'm not as familiar with it. However, if the challenge is with the permissions, would running the query over a subquery that only fetches the files the user has permissions for make it faster? Would an index over permissions change anything here? Just throwing out some ideas that I'm confident you and the team have already gone through, but they could help me think about it from a different perspective. Thanks and have a great weekend!
@pjft I'm wondering whether this issue has improved with the latest NC30. Do you still need to apply your patches?
@XueSheng-GIT thanks for pinging. I am still on 29.0.8 - I haven't updated to NC30 yet, though I suppose I will at some point. I have been applying the changes, though honestly it's become more a part of my upgrade script than anything else. When the next NC29 release comes, I'll test it before applying the changes and see if there's any change in performance. Have you been observing any difference?
Search in general feels a bit snappier on my NC30 testing instance, but I haven't done any specific tests (measurements) for comparison. This impression could also be related to the frontend itself.
@XueSheng-GIT just to say that I recently updated to the latest 29.0.9 and I still need to add that change.
Bug description
When you search for a file in a Nextcloud instance where the user has a lot of them (800k+), performance is atrocious. It might be related to having multiple storages, though I'm not sure.
Unfortunately, this can be traced back to a deliberate decision whose rationale isn't clear to me, though it must have a good reason behind it - hence my wanting to raise the question and make sure that the reason still holds.
I'm showing the results of the current query that Nextcloud runs (and the MySQL explanation), and then the same for the changed query I put together. For the sake of intellectual correctness, I ran all queries after restarting the MySQL service, so that there would be no cache at the time.
Fixed Query: 0.428 secs
Current Query in Nextcloud: 7.401 secs (it adds a +'0' to the order by condition and MySQL doesn't use the mtime index for it).
Explanation for the fixed query: uses mtime index key
Explanation for the current query: uses fs_storage_path_prefix
Also, for context, total number of files in instance (there's only a single user): 862k.
Now, there seems to be a very deliberate effort to force MySQL to use the index with fs_storage_path_prefix instead of mtime. However, for whatever reason, in this specific context it makes the search query very slow. I have multiple storage points (5, I think) and lots of files. The order by clause is what causes the slow-down - removing it works as well, but I imagine the intention (for the sake of usability) is to show the most recent files first. The mtime index helps, but I struggle to understand why we wouldn't want to use it, and in what scenarios the other index performs better.
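The effect of that +'0' trick can be reproduced in miniature. The sketch below uses SQLite rather than MySQL/MariaDB, with a made-up mini filecache table, so it only illustrates the general principle: once the ORDER BY target is an expression rather than a bare indexed column, the ordering index becomes unusable and the database falls back to an explicit sort.

```python
import sqlite3

# Hypothetical mini "filecache" with an index on mtime, analogous to
# the real table's mtime index discussed above.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE filecache (fileid INTEGER PRIMARY KEY, name TEXT, mtime INTEGER)"
)
con.execute("CREATE INDEX fs_mtime ON filecache (mtime)")
con.executemany("INSERT INTO filecache VALUES (?, ?, ?)",
                [(i, f"file{i}", i) for i in range(1, 101)])

def plan(sql):
    """Return SQLite's query plan as a single string."""
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Bare column: the planner walks fs_mtime; no sort step is needed.
plan_indexed = plan("SELECT fileid FROM filecache ORDER BY mtime LIMIT 10")

# Expression (mtime + 0): the index can't provide this ordering,
# so a temporary sort ("TEMP B-TREE") appears in the plan.
plan_expr = plan("SELECT fileid FROM filecache ORDER BY mtime + 0 LIMIT 10")
```

In SQLite the first plan mentions the fs_mtime index while the second shows a TEMP B-TREE sort; the issue reports the mirror image on MySQL, where the +'0' is added deliberately so the mtime index is *not* chosen.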
This change was introduced a year ago in #29300 and then #32074 by @icewind1991. I'd love to learn more so I can understand whether it makes sense for me to submit a pull request to revert it (maybe the underlying context has changed since, and the workaround is no longer necessary?), or whether I should just revert the change on my end alone (which I have done, and it seems to work as intended so far).
This issue seems similar to #35776 and others related to search performance, but there are clearly people on the team who are more knowledgeable about the topic, so I would love to hear their guidance.
Regards, and keep up the great work.
Steps to reproduce
Expected behavior
File search results to return shortly.
Installation method
Community Manual installation with Archive
Nextcloud Server version
25
Operating system
Debian/Ubuntu
PHP engine version
PHP 8.1
Web server
Apache (supported)
Database engine version
MariaDB
Is this bug present after an update or on a fresh install?
Updated to a major version (ex. 22.2.3 to 23.0.1)
Are you using the Nextcloud Server Encryption module?
None
What user-backends are you using?
Configuration report
List of activated Apps
Nextcloud Signing status
Nextcloud Logs
No response
Additional info
No response