[Bug]: File and tag search slow if a lot of shares present #35776
Comments
We have 600k rows in the oc_filecache table and file search is almost unusable, taking more than 30 seconds to show results. So we decided to write our own search form outside of Nextcloud (added to the app menu using the External sites app, as we are not able to make a real Nextcloud app), but using the same database (MariaDB here). We've added some extra search functionalities:
Search is based on the user profile, so only the files shared with, or owned by, the connected user appear in the results. Surprisingly, our search engine is significantly faster than the official Nextcloud one: it takes about 2 seconds to display results. There are certainly many possible optimizations left. Feel free to comment on, use, or adapt this bunch of code, as we are not going to make an app or anything else with it. If it helps someone...
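For readers who want to experiment with the idea, here is a minimal sketch of a search that matches terms in any order and only returns files the user owns or has received as a share. It is not the code from this comment and does not use the real Nextcloud schema: the `files` and `shares` tables, their columns, and the function names are hypothetical, and it runs against an in-memory SQLite database purely for illustration.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE files (id INTEGER PRIMARY KEY, owner TEXT, name TEXT);
        CREATE TABLE shares (file_id INTEGER, share_with TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO files (id, owner, name) VALUES (?, ?, ?)",
        [
            (1, "alice", "Rapport numérique collectivité.pdf"),
            (2, "bob", "collectivité et numérique.odt"),
            (3, "bob", "numérique privé collectivité.txt"),
            (4, "alice", "budget.ods"),
        ],
    )
    # Only file 2 is shared with alice; file 3 stays private to bob.
    conn.execute("INSERT INTO shares (file_id, share_with) VALUES (2, 'alice')")
    return conn


def _like_pattern(term):
    """Wrap a term in % wildcards, escaping LIKE metacharacters in the term."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def search_files(conn, user, query):
    """Return names of files visible to `user` whose name contains every term."""
    terms = query.split()
    if not terms:
        return []
    # One LIKE clause per term, ANDed together, so term order does not matter.
    like_clauses = " AND ".join("f.name LIKE ? ESCAPE '\\'" for _ in terms)
    sql = (
        "SELECT DISTINCT f.name FROM files f "
        "LEFT JOIN shares s ON s.file_id = f.id "
        "WHERE (f.owner = ? OR s.share_with = ?) AND " + like_clauses + " "
        "ORDER BY f.name"
    )
    params = [user, user] + [_like_pattern(t) for t in terms]
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    demo = make_demo_db()
    print(search_files(demo, "alice", "collectivité numérique"))
```

A leading-wildcard `LIKE` cannot use an ordinary index, so this sketch says nothing about performance on a table with 600k rows; it only shows the matching and visibility logic.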
Here is a search example with terms "numérique collectivité" performed in 2 seconds. With nextcloud official unified search, these files are not retrieved, and it takes about a minute to finish... |
Thanks a lot @Mer0me for sharing your investigations and your approach to bypass the issue. Search is really unusable if the users share a lot of files. |
Thank you for your comment. We were able to write a quick and dirty search engine to address our particular needs, but I'm not sure (read: I'm sure of the opposite) that this approach is scalable and good enough to be used as a replacement for Nextcloud unified search. I'm glad to help the Nextcloud community find solutions, but:
But if this contribution can help to find out why the Nextcloud search engine is so slow, and if I can personally help to improve it, I will certainly do so. |
@Mer0me I’m very happy that you have commented on the current search issues and done something about them. For another thread with users asking for similar things, see #29614. I agree with you on various points, such as specifying search terms in any order, searching multiple fields, and near-instant results. I think one should also be able to sort results by relevance, file name, date, etc., and have the search re-sort near instantly. Perhaps you wish to look into MySQL FTS, which offers exactly what you wish for, plus diacritic-insensitive search, so there is no need to remove or otherwise manage accents and diacritics. I think the main reason it has not been added yet is perhaps that the files portion of Nextcloud has been somewhat neglected as more features are added to increase market share and profit, to keep the business open, and likely to respond to competitors' features by adding similar ones. Perhaps, too, since Nextcloud supports multiple databases, it would have to be added for all of them: SQLite, MySQL/MariaDB, and PostgreSQL. All offer full-text search (not to be confused with FTS plugins that search text contents), yet the code for each is different, and perhaps no one on the team has done this before, is interested, or considers it important enough. My impression is that perhaps Nextcloud has no one on their team with decent enough database experience. See ownCloud OCIS for an example of a decent search implementation: it shows up in a full window, etc. It’s as if no one on the Nextcloud team uses file search on their own desktop, can search an entire file system with instant results, and wants the same on Nextcloud. I store 100+ GB of files, and searching now really is, what do I even say; how long is it going to remain so poor? :) https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html |
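To make the boolean-mode suggestion concrete, here is a small sketch of how free-text input could be turned into a MySQL boolean full-text expression in which every term is required (`+`) and treated as a prefix (`*`), so term order does not matter. The function name, the `files` table, and the `name` column are assumptions for illustration; the query would also need a FULLTEXT index on that column, and none of this reflects how Nextcloud builds its queries.

```python
import re

# Characters that act as operators in MySQL boolean full-text mode.
# They are replaced by spaces so user input cannot change the query logic.
_BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def to_boolean_query(user_input):
    """Turn 'foo bar' into '+foo* +bar*': all terms required, prefix match."""
    cleaned = _BOOLEAN_OPERATORS.sub(" ", user_input)
    return " ".join(f"+{term}*" for term in cleaned.split())


# Hypothetical query; %s is the placeholder style used by common MySQL
# drivers for Python, and the expression is passed as a bound parameter.
SEARCH_SQL = (
    "SELECT name FROM files "
    "WHERE MATCH(name) AGAINST (%s IN BOOLEAN MODE)"
)

if __name__ == "__main__":
    print(to_boolean_query("numérique collectivité"))
```

Whether very short terms match depends on the server's minimum token length settings, so results for one- or two-letter terms may differ from what this expression suggests.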
Unified search in recent versions of Nextcloud (since 25.0, maybe) is now very quick. We still need to enter search terms in the same order, but the results are displayed almost instantly, so it makes sense again. |
Perhaps their way of making it seem quick is merely to show the first 5 results and then require the user to select 'Load more results' each time to see another five. Doesn't anyone have tens of thousands of files or more, get dozens, hundreds, or more results for a particular search string, and want to see them all at once? Imagine if Google, Amazon, your own file system, or the contacts list on your computer or phone all acted the same way, showing five results and then requiring you to select more each and every time. Really, doesn't anyone see the madness? |
I'm currently testing #37061. This does really speed up things. I've posted my webdav-search results over there. |
@icewind1991 thanks for your efforts! I did a quick test on one of my instances (updated to NC 27.1.1): I created 1,000 empty text files and shared them via Talk. #40555 was not applied 100% because SearchBuilderTest.php doesn't seem to exist on 27.1.1.
#37061 is finally faster in this scenario, but #40555 is still a major improvement compared to the default installation (42 times faster). UPDATE: |
@XueSheng-GIT thanks for the testing. The resulting query isn't quite what I expected (I would expect to see Can you try applying the latest commit I pushed to the PR and see if that makes a difference |
@icewind1991 I'm only on my mobile right now, so testing is a bit limited. But here you go with the updated PR (I did a new run for all variants):
That seems to be quite a good speedup for the updated PR #40555. Logs to the search times above... Does the query now look as intended? |
Yes, this looks as expected. Thanks again for the testing |
@icewind1991 I did some further testing on one of my production systems (NC 27.1.1), where search was always slow before I applied #37061, and I could not see any search speedup with #40555 (same patch version as in the previous test).
Logs to the search times above... Any idea why there's no speedup in this case? |
@icewind1991 Any idea why search is slow in my latest test #35776 (comment)? |
Unfortunately, the patches provided in #37061 and #40555 are no longer compatible with NC28. The default search of NC28 is still slow if a lot of shares are present. I had been using #37061 in production because #40555 was still slow in some cases (see the comment above, #35776 (comment)). @icewind1991 @starypatyk any plans to update the query optimization for NC28? |
@XueSheng-GIT - I put my PR on hold in favor of #40555. Now I do not know what to do next, as #40555 has not progressed since September. 😞 |
Now that #40555 was merged into master, I did some further testing on NC28.0.3rc2.
Logs to the search times above... @icewind1991 Thanks a lot for taking care of this matter and merging #40555 into master!
CC @starypatyk because it was his approach. Maybe he has some additional idea why #37061 is so much faster than #40555. |
Signed-off-by: Dariusz Olszewski <starypatyk@users.noreply.github.com>
@XueSheng-GIT Thanks for your tests. 👍 Indeed, the query should be optimized as you describe, but for some reason this does not happen. @icewind1991 I created a few simple tests that show the issue; please see my branch https://github.com/nextcloud/server/commits/query-optimizer-search-issue/. I am not sure if I should create a PR from this branch, as it contains failing tests only. Feel free to use these tests if you think they are valuable. One of the problems is shown by a pair of tests: Apparently the code in
Two additional tests mimic the query condition created in the The In the second one |
Thanks for the testing, I'll try to look into things further. |
#43975 fixes those tests |
@icewind1991 Thanks for following up! For the sake of completeness, here are the new results on NC28.0.3.
Logs to the search times above... I would say there is nothing to complain about anymore. It seems we can close this issue once #43975 is merged. A backport to NC28 would be welcome (although I'm already used to the manual patches 😉). Thanks again @starypatyk and @icewind1991 for your great work! |
Closing since it's been merged for a while now:
Bug description
Global/unified search for files and tags is slow if a lot of shares are present. Search results for fulltextsearch, Collectives, Talk, Deck and Mail pop up nearly instantly, but results for files and tags take ages to appear (approx. 30 seconds).
Tested accounts have approx. 600 shares (according to the oc_shares share_with column). Talk is used a lot for sharing photos, which is probably the main reason for the number of shares.
Postgres log shows slow query... see below.
Doing the same on a cloned server instance without these shares (shares removed), the search results for files and tags appear within 2 seconds.
I just want to reference #23835, which really improved search speed. Unfortunately, it seems this issue is not fixed if a lot of shares are present.
Steps to reproduce
Expected behavior
Search result for files and tags should appear as fast as possible, even if a lot of shares are received.
Installation method
Community Manual installation with Archive
Operating system
Debian/Ubuntu
PHP engine version
PHP 8.1
Web server
Apache (supported)
Database engine version
PostgreSQL
Is this bug present after an update or on a fresh install?
Updated to a major version (ex. 22.2.3 to 23.0.1)
Are you using the Nextcloud Server Encryption module?
None
What user-backends are you using?
Configuration report
List of activated Apps
Nextcloud Signing status
Nextcloud Logs
No response
Additional info
Postgres log shows slow search query for files and tags:
slow_search.log