
[Bug]: File Search poor performance with lots of files because of index workaround #38243

pjft opened this issue May 14, 2023 · 22 comments

@pjft
Contributor

pjft commented May 14, 2023

⚠️ This issue respects the following points: ⚠️

  • This is a bug, not a question or a configuration/webserver/proxy issue.
  • This issue is not already reported on Github (I've searched it).
  • Nextcloud Server is up to date. See Maintenance and Release Schedule for supported versions.
  • Nextcloud Server is running on 64bit capable CPU, PHP and OS.
  • I agree to follow Nextcloud's Code of Conduct.

Bug description

When you search for a file in a Nextcloud instance where the user has a lot of files (800k+), performance is atrocious. It might be related to having multiple storages, though I'm not sure.

Alas, this can be traced back to a deliberate decision whose rationale isn't clear to me, though there must be a good reason behind it - hence my wanting to raise the question and check whether that reason still holds.

Below are the results of the modified query I put together, then the current query Nextcloud runs, followed by the MySQL EXPLAIN output for each. To keep the comparison fair, I ran all queries right after restarting the MySQL service, so nothing was cached at the time.

Fixed Query: 0.428 secs

MariaDB [nextcloud]>  SELECT `file`.`fileid`, `storage`, `path`, `path_hash`, `file`.`parent`, `file`.`name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`, `permissions`, `checksum`, `unencrypted_size` FROM `oc_filecache` `file` WHERE (`file`.`name` COLLATE utf8mb4_general_ci LIKE '%mario%') AND (((`storage` = 38) AND ((`path` = 'files') OR (`path` LIKE 'files/%'))) OR (`storage` = 30) OR (`storage` = 31) OR (`storage` = 32) OR (`storage` = 35) OR (`storage` = 36)) ORDER BY `mtime` desc LIMIT 6;

<...>

6 rows in set (0.428 sec)

Current query in Nextcloud: 7.401 secs (it appends +'0' to the ORDER BY expression, so MySQL doesn't use the mtime index).

MariaDB [nextcloud]>  SELECT `file`.`fileid`, `storage`, `path`, `path_hash`, `file`.`parent`, `file`.`name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`, `permissions`, `checksum`, `unencrypted_size` FROM `oc_filecache` `file` WHERE (`file`.`name` COLLATE utf8mb4_general_ci LIKE '%mario%') AND (((`storage` = 38) AND ((`path` = 'files') OR (`path` LIKE 'files/%'))) OR (`storage` = 30) OR (`storage` = 31) OR (`storage` = 32) OR (`storage` = 35) OR (`storage` = 36)) ORDER BY `mtime` +'0' desc LIMIT 6;

<...>

6 rows in set (7.401 sec)

EXPLAIN for the fixed query: uses the fs_mtime index

MariaDB [nextcloud]> explain SELECT `file`.`fileid`, `storage`, `path`, `path_hash`, `file`.`parent`, `file`.`name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`, `permissions`, `checksum`, `unencrypted_size` FROM `oc_filecache` `file` WHERE (`file`.`name` COLLATE utf8mb4_general_ci LIKE '%mario%') AND (((`storage` = 38) AND ((`path` = 'files') OR (`path` LIKE 'files/%'))) OR (`storage` = 30) OR (`storage` = 31) OR (`storage` = 32) OR (`storage` = 35) OR (`storage` = 36)) ORDER BY `mtime` desc LIMIT 6;
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+----------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys                                                                                       | key      | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+----------+---------+------+------+-------------+
|    1 | SIMPLE      | file  | index | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix | fs_mtime | 8       | NULL |    6 | Using where |
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+----------+---------+------+------+-------------+
1 row in set (0.004 sec)

EXPLAIN for the current query: uses the fs_storage_path_prefix index

MariaDB [nextcloud]> explain SELECT `file`.`fileid`, `storage`, `path`, `path_hash`, `file`.`parent`, `file`.`name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`, `permissions`, `checksum`, `unencrypted_size` FROM `oc_filecache` `file` WHERE (`file`.`name` COLLATE utf8mb4_general_ci LIKE '%mario%') AND (((`storage` = 38) AND ((`path` = 'files') OR (`path` LIKE 'files/%'))) OR (`storage` = 30) OR (`storage` = 31) OR (`storage` = 32) OR (`storage` = 35) OR (`storage` = 36)) ORDER BY `mtime` +'0' desc LIMIT 6;
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+------------------------+---------+------+--------+----------------------------------------------------+
| id   | select_type | table | type  | possible_keys                                                                                       | key                    | key_len | ref  | rows   | Extra                                              |
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+------------------------+---------+------+--------+----------------------------------------------------+
|    1 | SIMPLE      | file  | range | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix | fs_storage_path_prefix | 267     | NULL | 814718 | Using index condition; Using where; Using filesort |
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+------------------------+---------+------+--------+----------------------------------------------------+
1 row in set (0.002 sec)

Also, for context, the total number of files in the instance (there's only a single user) is 862k.

MariaDB [nextcloud]> select count(fileid) from oc_filecache;
+---------------+
| count(fileid) |
+---------------+
|        862541 |
+---------------+
1 row in set (0.477 sec)

Now, the thing is that there seems to be a very deliberate effort to force MySQL to use the fs_storage_path_prefix index instead of mtime. However, for whatever reason, in this specific context it just makes the search query very slow. I have multiple storage points (5, I think) and lots of files. The ORDER BY clause is what causes the slow-down - removing it works as well, but I imagine the intention (for the sake of usability) is to show the most recent files first. The mtime index helps, but I struggle to understand why we wouldn't want to use it and in which scenarios the other index performs better.
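For anyone who wants to reproduce a minimal version of this comparison on their own instance, here is a rough sketch (assuming the default oc_ table prefix; the index names are the ones visible in the EXPLAIN output above):

-- confirm that an index on mtime exists (fs_mtime on this install)
SHOW INDEX FROM oc_filecache WHERE Key_name = 'fs_mtime';

-- same query twice; the only difference is the +'0' in the ORDER BY
EXPLAIN SELECT fileid, name, mtime FROM oc_filecache
    WHERE name LIKE '%mario%' ORDER BY mtime DESC LIMIT 6;
EXPLAIN SELECT fileid, name, mtime FROM oc_filecache
    WHERE name LIKE '%mario%' ORDER BY mtime + '0' DESC LIMIT 6;

The first variant can use fs_mtime to produce the order directly; the second cannot, because the ORDER BY is no longer a bare indexed column, so it ends up in a filesort - the same effect the full EXPLAIN outputs above show.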

This change was introduced a year ago in #29300 and then #32074 by @icewind1991. I'd love to learn more so I can understand whether it makes sense to submit a pull request to revert it (the underlying context may have changed since then, making the workaround unnecessary), or whether I should just keep the revert on my end alone (which I have done, and it seems to work as intended so far).

This issue seems similar to #35776 and others related to search performance, but there are clearly people on the team who are far more knowledgeable about the topic, so I'd love to hear their guidance.

Regards, and keep up the great work.

Steps to reproduce

  1. Search for a file in the file search box, on a large instance.
  2. Results take at times over 10 seconds to come back.

Expected behavior

File search results should return quickly.

Installation method

Community Manual installation with Archive

Nextcloud Server version

25

Operating system

Debian/Ubuntu

PHP engine version

PHP 8.1

Web server

Apache (supported)

Database engine version

MariaDB

Is this bug present after an update or on a fresh install?

Updated to a major version (ex. 22.2.3 to 23.0.1)

Are you using the Nextcloud Server Encryption module?

None

What user-backends are you using?

  • Default user-backend (database)
  • LDAP/ Active Directory
  • SSO - SAML
  • Other

Configuration report

{
    "system": {
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "<redacted>"
        ],
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "dbtype": "mysql",
        "version": "25.0.3.2",
        "overwrite.cli.url": "<redacted>",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "default_phone_region": "pt",
        "defaultapp": "files",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "twofactor_enforced": "true",
        "twofactor_enforced_groups": [],
        "twofactor_enforced_excluded_groups": [],
        "mail_smtpmode": "sendmail",
        "mail_smtpauthtype": "LOGIN",
        "mail_sendmailmode": "pipe",
        "mail_smtpsecure": "ssl",
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpauth": 1,
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpport": "587",
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
        "maintenance": false,
        "memcache.local": "\\OC\\Memcache\\Redis",
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "redis": {
            "host": "***REMOVED SENSITIVE VALUE***",
            "port": 6379
        },
        "theme": "",
        "loglevel": 2,
        "updater.release.channel": "stable"
    }
}

List of activated Apps

Enabled:
  - activity: 2.17.0
  - bruteforcesettings: 2.5.0
  - calendar: 4.3.3
  - circles: 25.0.0
  - cloud_federation_api: 1.8.0
  - comments: 1.15.0
  - contacts: 5.2.0
  - contactsinteraction: 1.6.0
  - dashboard: 7.5.0
  - dav: 1.24.0
  - federatedfilesharing: 1.15.0
  - federation: 1.15.0
  - files: 1.20.1
  - files_external: 1.17.0
  - files_pdfviewer: 2.6.0
  - files_rightclick: 1.4.0
  - files_sharing: 1.17.0
  - files_trashbin: 1.15.0
  - files_versions: 1.18.0
  - firstrunwizard: 2.14.0
  - logreader: 2.10.0
  - lookup_server_connector: 1.13.0
  - nextcloud_announcements: 1.14.0
  - notifications: 2.13.1
  - oauth2: 1.13.0
  - onlyoffice: 7.8.0
  - password_policy: 1.15.0
  - photos: 2.0.1
  - previewgenerator: 5.2.4
  - privacy: 1.9.0
  - provisioning_api: 1.15.0
  - related_resources: 1.0.4
  - serverinfo: 1.15.0
  - settings: 1.7.0
  - sharebymail: 1.15.0
  - support: 1.8.0
  - survey_client: 1.13.0
  - systemtags: 1.15.0
  - tasks: 0.14.5
  - text: 3.6.0
  - theming: 2.0.1
  - twofactor_backupcodes: 1.14.0
  - twofactor_nextcloud_notification: 3.6.0
  - twofactor_totp: 7.0.0
  - updatenotification: 1.15.0
  - user_status: 1.5.0
  - viewer: 1.9.0
  - weather_status: 1.5.0
  - workflow_script: 1.10.0
  - workflowengine: 2.7.0
Disabled:
  - admin_audit
  - encryption
  - extract: 1.3.5
  - files_accesscontrol: 1.15.1
  - music: 1.8.3
  - printer: 0.0.3
  - recommendations: 0.6.0
  - suspicious_login
  - twofactor_gateway: 0.20.0
  - user_ldap

Nextcloud Signing status

No errors have been found.

Nextcloud Logs

No response

Additional info

No response

pjft added the "0. Needs triage" and "bug" labels on May 14, 2023
@kesselb
Contributor

kesselb commented May 15, 2023

I'd love to learn more so that I can understand whether it makes sense for me to submit a pull request to revert it (as maybe the underlying context might have changed since and that workaround might no longer be necessary?)

#23835

@pjft
Contributor Author

pjft commented May 15, 2023

Thanks.

To be clear, the only mentions of this specific change there are one comment where @icewind1991 states it "should help mysql to not pick the worst index to use", and another where a user says that ordering by mtime is a bad idea - though his examples show a query taking over 1h without a suitable index and about 3 seconds after creating one.

Could it be that, after #29300, Nextcloud now includes an equivalent index, making this workaround obsolete? It's not quite clear why we wouldn't want the query to use the mtime index alone in the first place, and in the original thread nobody really tested or commented on that specific PR.

To be clear, I do not want to suggest that it didn't help at the time - I'm just observing that in this specific scenario it doesn't. I'm not sure how unique or widespread the scenario is, though - I don't have any specific customization or setup that would suggest my instance differs from most other installations.

Thanks for taking the time.

@XueSheng-GIT

This comment was marked as duplicate.

@pjft

This comment was marked as resolved.

@XueSheng-GIT

I would really love to test it on my instances. Is there any patch which I could use for testing?

@pjft
Contributor Author

pjft commented May 18, 2023

Well, if you're comfortable with tweaking files, this is what I'm doing on my end - assuming you don't want to run this straight on MySQL.

Navigate to the Nextcloud installation folder.
Then back up and edit the file:

sudo cp  lib/private/Files/Cache/SearchBuilder.php  lib/private/Files/Cache/SearchBuilder.php.bak
sudo nano  lib/private/Files/Cache/SearchBuilder.php

Find the comment mentioning mysql, and replace the following code:

if ($field === 'mtime') {
	$field = $query->func()->add($field, $query->createNamedParameter(0));
}

with a false condition ("and false"):

if ($field === 'mtime' and false) {
	$field = $query->func()->add($field, $query->createNamedParameter(0));
}

Try it out.
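If you want to confirm that the patched code actually changes the SQL hitting the database (and not, say, that a different copy of the file is being loaded), one option - assuming you have the privileges and can safely toggle it for a moment - is to enable the MariaDB general query log, run one search in the web UI, and look at the logged ORDER BY clause. A rough sketch (adjust the log path to somewhere the database server can write):

SET GLOBAL general_log_file = '/tmp/mariadb-general.log';
SET GLOBAL general_log = 'ON';
-- run one search in the Files app, then immediately:
SET GLOBAL general_log = 'OFF';

With the patch in place, the logged search query should end in ORDER BY `mtime` DESC rather than ORDER BY `mtime` + '0' DESC.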

You can revert the change by copying the old file over the edited one.

sudo cp lib/private/Files/Cache/SearchBuilder.php.bak lib/private/Files/Cache/SearchBuilder.php

@XueSheng-GIT

@pjft thanks for these detailed steps. I'm just on my mobile right now, so I can only do some basic testing. But after applying this change (adding "and false"), search time definitely increased: before the change, a search took around 45 seconds; after it, I cancelled the search after a couple of minutes. I assume this is related to the number of shares, which makes the search query quite complex.
I'll provide further details (logs) once I get access to my desktop.

@pjft
Contributor Author

pjft commented May 18, 2023

Thank you. In that case, if that ends up being confirmed, it would justify the change that was made to avoid that index. Do let me know how the testing goes; if it does indeed increase the time, I'll close this - though from an engineering perspective it still bothers me that there is a better way to run the query in my particular case that for some reason doesn't generalize. I'll have to explore the reasons for it.

@XueSheng-GIT

XueSheng-GIT commented May 18, 2023

Now I captured some slow query logs (psql). After some more testing, the patched setup (#38243 (comment)) is a bit faster compared to the default setup. Not sure why initial testing on my mobile showed a different picture (maybe some caching in background?).
Default: 74 seconds
Patched: 64 seconds

slow_query_nc26.log
slow_query_nc26patched.log

That said, the difference isn't really noticeable while waiting for the search result. It goes without saying that waiting more than a minute is not acceptable. Unfortunately it keeps getting slower with every picture shared within Talk (half a year ago, search took "only" 30 seconds).

@pjft
Contributor Author

pjft commented May 18, 2023

Sorry to hear that.

Indeed, and unexpectedly, I did not observe a significant performance change in the mobile app either. Analyzing the MySQL logs for a mobile query shows a lot more queries around it, so the extra time might just be latency introduced elsewhere.

Thanks for testing, though, and checking that it's slightly faster.

@XueSheng-GIT

Just to prevent any misunderstanding: while testing on mobile, I was still using the browser (Firefox), not the mobile app.
If there are any further ideas on how to speed up the search query, I'll be happy to test!

@pjft
Contributor Author

pjft commented Jun 28, 2023

For anyone still following this bug, I can confirm that, with nearly 1M files for a single user, performance is still extremely poor with the current code, and with these changes it is still an order of magnitude faster. The (admittedly limited) feedback here suggests that, even though for others the change doesn't improve things as much, it's still better than the current performance.

I can always solve it for myself by applying this patch on every new release, so that's not what I'm commenting about, but if there's interest in this being submitted as a PR I'd be happy to do so. If anyone else can test this in production/live environments, that would also be helpful, because I'm very much aware that different setups and infrastructure might influence these results - it might even be that this change makes performance worse in some, which is what we want to avoid.

Let me know how I can help.

@whiskeytangofoxy

Same issue. #38243 (comment) helped.

@pjft
Contributor Author

pjft commented Jul 11, 2023

Thank you for testing and validating that it helps. If anybody else tests this please share your feedback!

@icewind1991
Member

A large problem with this is that whether #32074 makes things faster or slower depends on the specifics of your instance.
If most of the recently changed files belong to your user (or you at least have access to them), then going through the entire filecache in mtime order using an index scan and "manually" filtering out any items you don't have access to will be pretty fast.
If, however, you only have access to a small portion of the recent files, then you end up walking through a significant portion of the filecache before you collect enough files you have access to.

With #32074 we instead (try to) make MariaDB use an index to do (part of) the "files you have access to" filtering and then sort the resulting set. This is faster if the "files you have access to" set is a small portion of the filecache but slower if it is a large portion.

My current preference goes towards optimizing for the "you only have access to a small part" case, because that seems to me the increasingly likely scenario (a "many user" instance with 100M files seems more likely than a "few user" instance with 100M files).
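One way to see which side of that trade-off a particular instance falls on, without patching any PHP, is to take the search query from earlier in this issue and run it once per index with an explicit hint. A rough sketch (the storage IN (...) list stands in for the original OR chain, and actual cold-cache timings matter more than the EXPLAIN output alone):

-- the plan the current (+'0') code effectively ends up with
EXPLAIN SELECT fileid, name, mtime FROM oc_filecache FORCE INDEX (fs_storage_path_prefix)
    WHERE name LIKE '%mario%' AND storage IN (30, 31, 32, 35, 36, 38)
    ORDER BY mtime DESC LIMIT 6;

-- the plan the plain ORDER BY picks on the reporter's instance
EXPLAIN SELECT fileid, name, mtime FROM oc_filecache FORCE INDEX (fs_mtime)
    WHERE name LIKE '%mario%' AND storage IN (30, 31, 32, 35, 36, 38)
    ORDER BY mtime DESC LIMIT 6;

Whichever hinted variant wins on real runtime indicates which scenario that instance is in.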

@pjft
Contributor Author

pjft commented Sep 21, 2023

Thanks @icewind1991. That all makes sense, and I appreciate the response. I'd imagine you'd have metrics on whether the "files you have access to" set is a small or a large portion of the filecache, based on the breadth of your userbase.

I'd imagine, though, that in collaboration scenarios it is indeed common to have a "many user" instance with 100M files where users still have access to a large portion of common files. I don't have data on this, of course, but apart from businesses that host individual users on top of a Nextcloud instance (where, indeed, users won't be sharing files), for an "open source content collaboration platform for tens of millions of users at thousands of organizations across the globe" (per the Nextcloud site) the scenario would seem closer to several people having access to a moderate portion of the filecache - especially since, when you search for something, you are normally searching for something you know you have access to.

Nonetheless, I understand the reasoning, and lacking actual data on which scenario is more common, I'll defer to you, since we don't seem to be able to solve for both scenarios elegantly from an engineering perspective.

@icewind1991
Member

Unfortunately I do not have actual data to back my claims, just some anecdotal evidence of cases where the index workaround gave a significant improvement.

I'm not really happy with the current situation, but I also don't believe that just removing the index workaround is the way forward.

@pjft
Contributor Author

pjft commented Sep 22, 2023

Of course - sounds fair. Thanks for engaging. I'd imagine we might need to change the data model somehow to accommodate that, and I'm not as familiar with it.

However, if the challenge is with the permissions, would running the query over a subquery that only selects the files the user has permissions for make it faster? Would an index over permissions change anything here? Just throwing out some ideas that I'm confident you and the team have already gone through, but it could help me think about it from a different perspective. A rough sketch of the first idea is below.
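Purely to illustrate the subquery idea (untested, and using the storage IDs from the earlier queries as a stand-in for the real permission filtering), something along these lines - collect the accessible rows in a derived table first, then sort only that set:

SELECT * FROM (
    SELECT fileid, name, mtime
    FROM oc_filecache
    WHERE storage IN (30, 31, 32, 35, 36, 38)
      AND name COLLATE utf8mb4_general_ci LIKE '%mario%'
) accessible
ORDER BY mtime DESC
LIMIT 6;

Whether this helps at all would depend on the optimizer: MariaDB may simply merge the derived table back into the outer query and end up with the same plan as before.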

Thanks and have a great weekend!

@XueSheng-GIT

@pjft I'm wondering whether this issue has improved with the latest NC30. Do you still need to apply your patches?

@pjft
Contributor Author

pjft commented Nov 5, 2024

@XueSheng-GIT thanks for pinging. I am still on 29.0.8 - I haven't updated to NC30 yet, though I suppose I will at some point. I have been applying the changes, though honestly it's become more a part of my upgrade script than anything else. When the next NC29 release comes, I'll try testing it before applying the changes to see if there's any change in performance.

Have you been observing any difference?

@XueSheng-GIT

Have you been observing any difference?

Search in general feels a bit snappier on my NC30 testing instance. But I haven't done any specific tests (measurements) for comparison. This impression could also be related to the frontend itself.
I'm going to upgrade one of my stable NC29 instances to NC30 once 30.0.2 has been released.

@pjft
Contributor Author

pjft commented Nov 18, 2024

@XueSheng-GIT just to say that I recently updated to the latest 29.0.9 and I still need to add that change.
