[Bug]: ScanFiles never performed and slow with big tables #40018

FedericoHeichou opened this issue Aug 23, 2023 · 0 comments · May be fixed by #40019
Labels
0. Needs triage Pending check for reproducibility or if it fits our roadmap feature: filesystem performance 🚀 technical debt

@FedericoHeichou (Contributor):


Bug description

In the background job ScanFiles, if runScanner fails for a particular user, the next getUserToScan call returns that same user, and the loop below stops immediately, so the remaining users are never scanned. This can go on indefinitely, possibly forever.

$lastUser = '';
$user = $this->getUserToScan();
while ($user && $usersScanned < self::USERS_PER_SESSION && $lastUser !== $user) {
    $this->runScanner($user);
    $lastUser = $user;
    $user = $this->getUserToScan();
    $usersScanned += 1;
}
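The failure mode can be reproduced with a few lines. This is a hedged Python model of the same control flow (names mirror the PHP; the USERS_PER_SESSION value is an assumption, not taken from the source). It shows that one persistently failing user halts the whole pass:

```python
# Hedged model of the ScanFiles loop above, not the actual Nextcloud code.
USERS_PER_SESSION = 500  # illustrative value

def scan_session(get_user_to_scan, run_scanner):
    """Stops as soon as getUserToScan returns the same user twice in a
    row, even though other users may still be waiting for a scan."""
    scanned = []
    last_user = ''
    user = get_user_to_scan()
    while user and len(scanned) < USERS_PER_SESSION and last_user != user:
        run_scanner(user)
        last_user = user
        user = get_user_to_scan()
        scanned.append(last_user)
    return scanned

# Simulate the DB: 'alice' has a failing scan, so the user query keeps
# returning her; 'bob' and 'carol' are queued behind her.
queue = iter(['alice', 'alice', 'bob', 'carol'])
scanned = scan_session(lambda: next(queue, ''), lambda u: None)
print(scanned)  # only 'alice' is ever attempted; the rest are starved
```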

Also, in MySQL, if the oc_filecache table is huge (more than ~1 million rows), InnoDB may pick the wrong index, making the query slow (over 10 seconds) when the result set is empty; this is related to #24401.

Steps to reproduce

Scan not performed

To hit the broken code path, insert a broken row into oc_filecache; if you are unlucky enough, the job will forever scan only the user who owns storage 1:

insert into oc_filecache(path, size, storage) values ('/', -1, 1);

Slow queries in MySQL

This is an example with a small table:

mysql> explain analyze SELECT `user_id` FROM `oc_filecache` `f`  INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage` WHERE (`size` < 0) AND (`parent` > -1) LIMIT 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1 row(s)  (cost=1.52 rows=1) (actual time=0.025..0.025 rows=0 loops=1)
    -> Nested loop inner join  (cost=1.52 rows=5) (actual time=0.025..0.025 rows=0 loops=1)
        -> Filter: (f.parent > -1)  (cost=0.71 rows=1) (actual time=0.024..0.024 rows=0 loops=1)
            -> Index range scan on f using fs_size over (size < 0), with index condition: (f.size < 0)  (cost=0.71 rows=1) (actual time=0.023..0.023 rows=0 loops=1)
        -> Covering index lookup on m using mount_user_storage (storage_id=f.`storage`)  (cost=0.82 rows=5) (never executed)

1 row in set (0.00 sec)

It first filters all target files with an Index range scan on f using fs_size over (size < 0), then performs a Covering index lookup on m using mount_user_storage for each returned row (here zero rows, so the lookup is never executed).

This is an example on a large table:

mysql> explain analyze SELECT `user_id` FROM `oc_filecache` `f`  INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage` WHERE (`size` < 0) AND (`parent` > -1) LIMIT 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1 row(s)  (cost=8054 rows=1) (actual time=12982..12982 rows=0 loops=1)
    -> Nested loop inner join  (cost=8054 rows=528) (actual time=12982..12982 rows=0 loops=1)
        -> Covering index scan on m using mount_user_storage  (cost=1.3 rows=3) (actual time=0.0517..0.0937 rows=3 loops=1)
        -> Filter: ((f.size < 0) and (f.parent > -1))  (cost=2381 rows=176) (actual time=4327..4327 rows=0 loops=3)
            -> Index lookup on f using fs_storage_path_hash (storage=m.storage_id)  (cost=2381 rows=3089) (actual time=0.312..4291 rows=346344 loops=3)

1 row in set (12.98 sec)

It first fetches all storages with a Covering index scan on m using mount_user_storage (3 storages), then for each storage performs an Index lookup on f using fs_storage_path_hash.
This happens because the engine estimates that each index lookup returns 3089 rows, while it actually returns 346344 rows per storage, three times over.
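Back-of-the-envelope, using the numbers from the two EXPLAIN outputs above, the optimizer's estimate is off by roughly two orders of magnitude, which is enough to flip the join order:

```python
# Row counts taken from the EXPLAIN ANALYZE output above.
estimated_rows = 3 * 3089    # optimizer: ~3089 rows per storage, 3 storages
actual_rows = 3 * 346344     # reality: 346344 rows per storage, 3 storages
factor = actual_rows // estimated_rows
print(factor)  # the estimate is off by a factor of ~112
```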

I don't think we can completely blame the engine, because if we add a record that this query could return, we get:

mysql> insert into oc_filecache(path, size, storage) values ('/', -1, 1);
Query OK, 1 row affected (0.04 sec)

mysql> explain analyze SELECT storage FROM `oc_filecache` `f`, `oc_mounts` `m` use index(mount_user_storage) WHERE `size` < 0 and parent > -1 and storage_id = storage LIMIT 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1 row(s)  (cost=7095 rows=1) (actual time=0.461..0.461 rows=1 loops=1)
    -> Nested loop inner join  (cost=7095 rows=358) (actual time=0.46..0.46 rows=1 loops=1)
        -> Covering index scan on m using mount_user_storage  (cost=1.3 rows=3) (actual time=0.0201..0.0208 rows=2 loops=1)
        -> Filter: ((f.size < 0) and (f.parent > -1))  (cost=2154 rows=119) (actual time=0.218..0.218 rows=0.5 loops=2)
            -> Index lookup on f using fs_storage_path_prefix (storage=m.storage_id)  (cost=2154 rows=2150) (actual time=0.106..0.212 rows=20 loops=2)

1 row in set (0.00 sec)

It now estimates 2150 rows per lookup, whereas only 20 rows per loop (40 in total) are returned.

Now, if we force the fs_size index on the large table, we get the same plan as on the small table:

mysql> explain analyze SELECT `user_id` FROM `oc_filecache` `f` use index (`fs_size`) INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage` WHERE (`size` < 0) AND (`parent` > -1) LIMIT 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1 row(s)  (cost=119615 rows=1) (actual time=613..613 rows=0 loops=1)
    -> Nested loop inner join  (cost=119615 rows=40607) (actual time=613..613 rows=0 loops=1)
        -> Filter: (f.parent > -1)  (cost=105402 rows=40607) (actual time=60.6..536 rows=62471 loops=1)
            -> Index range scan on f using fs_size over (size < 0), with index condition: (f.size < 0)  (cost=105402 rows=121834) (actual time=60.6..528 rows=62472 loops=1)
        -> Covering index lookup on m using mount_user_storage (storage_id=f.`storage`)  (cost=0.25 rows=1) (actual time=0.00107..0.00107 rows=0 loops=62471)

1 row in set (0.62 sec)

Expected behavior

For the broken scan, it should still try all the other users.
For the slow query, it should either use the right index or run a better-written query.
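One way to satisfy the first expectation (a sketch only; it may differ from the actual approach in PR #40019) is to feed the set of already-attempted users back into the user query, so a user whose scan failed cannot be returned again within the same session:

```python
# Hedged sketch of a possible fix; names are illustrative and not taken
# from the Nextcloud code base.
def scan_session_fixed(pending_users, run_scanner, users_per_session=500):
    attempted = []

    def next_user():
        # Model of getUserToScan() with an exclusion list: return the
        # first user that still has unscanned files and was not tried yet.
        for u in pending_users:
            if u not in attempted:
                return u
        return ''

    user = next_user()
    while user and len(attempted) < users_per_session:
        attempted.append(user)
        run_scanner(user)   # even if this fails and the size<0 rows remain,
        user = next_user()  # the user is excluded, so the session moves on
    return attempted

# 'alice' keeps failing (her rows never clear), yet bob and carol
# still get their scan:
result = scan_session_fixed(['alice', 'bob', 'carol'], lambda u: None)
print(result)
```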

Installation method

None

Nextcloud Server version

26

Operating system

Debian/Ubuntu

PHP engine version

PHP 8.1

Web server

None

Database engine version

Other

Is this bug present after an update or on a fresh install?

None

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

  • Default user-backend (database)
  • LDAP/ Active Directory
  • SSO - SAML
  • Other

Configuration report

No response

List of activated Apps

No response

Nextcloud Signing status

No response

Nextcloud Logs

No response

Additional info

No response

@FedericoHeichou FedericoHeichou added 0. Needs triage Pending check for reproducibility or if it fits our roadmap bug labels Aug 23, 2023
@FedericoHeichou FedericoHeichou linked a pull request Aug 23, 2023 that will close this issue