[Bug]: ScanFiles never performed and slow with big tables #40018

FedericoHeichou opened this issue Aug 23, 2023 · 0 comments · May be fixed by #40019
Labels
0. Needs triage Pending check for reproducibility or if it fits our roadmap feature: filesystem performance 🚀 technical debt

@FedericoHeichou (Contributor):


Bug description

In the background job ScanFiles, if runScanner fails for a particular user, the next getUserToScan call returns that same user, and the loop below stops immediately, so the remaining users are never scanned. This can go on indefinitely, possibly forever.

$lastUser = '';
$user = $this->getUserToScan();
while ($user && $usersScanned < self::USERS_PER_SESSION && $lastUser !== $user) {
    $this->runScanner($user);
    $lastUser = $user;
    $user = $this->getUserToScan();
    $usersScanned += 1;
}
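The failure mode can be reproduced with a few lines. This is a hedged Python model of the same control flow (names mirror the PHP; the USERS_PER_SESSION value is an assumption, not taken from the source). It shows that one persistently failing user halts the whole pass:

```python
# Hedged model of the ScanFiles loop above, not the actual Nextcloud code.
USERS_PER_SESSION = 500  # illustrative value

def scan_session(get_user_to_scan, run_scanner):
    """Stops as soon as getUserToScan returns the same user twice in a
    row, even though other users may still be waiting for a scan."""
    scanned = []
    last_user = ''
    user = get_user_to_scan()
    while user and len(scanned) < USERS_PER_SESSION and last_user != user:
        run_scanner(user)
        last_user = user
        user = get_user_to_scan()
        scanned.append(last_user)
    return scanned

# Simulate the DB: 'alice' has a failing scan, so the user query keeps
# returning her; 'bob' and 'carol' are queued behind her.
queue = iter(['alice', 'alice', 'bob', 'carol'])
scanned = scan_session(lambda: next(queue, ''), lambda u: None)
print(scanned)  # only 'alice' is ever attempted; the rest are starved
```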

Also, in MySQL, if the oc_filecache table is huge (more than ~1 million rows), InnoDB may pick the wrong index, making the query slow (over 10 seconds) when the result set is empty; this is related to #24401.

Steps to reproduce

Scan not performed

To hit the broken code path, insert a broken row into oc_filecache; if you are unlucky enough, the job will forever scan only the user who owns storage 1:

insert into oc_filecache(path, size, storage) values ('/', -1, 1);

Slow queries in MySQL

This is an example with a small table:

mysql> explain analyze SELECT `user_id` FROM `oc_filecache` `f`  INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage` WHERE (`size` < 0) AND (`parent` > -1) LIMIT 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1 row(s)  (cost=1.52 rows=1) (actual time=0.025..0.025 rows=0 loops=1)
    -> Nested loop inner join  (cost=1.52 rows=5) (actual time=0.025..0.025 rows=0 loops=1)
        -> Filter: (f.parent > -1)  (cost=0.71 rows=1) (actual time=0.024..0.024 rows=0 loops=1)
            -> Index range scan on f using fs_size over (size < 0), with index condition: (f.size < 0)  (cost=0.71 rows=1) (actual time=0.023..0.023 rows=0 loops=1)
        -> Covering index lookup on m using mount_user_storage (storage_id=f.`storage`)  (cost=0.82 rows=5) (never executed)

1 row in set (0.00 sec)

It first filters all target files with an Index range scan on f using fs_size over (size < 0), then performs a Covering index lookup on m using mount_user_storage for each returned row (here zero rows, so the lookup is never executed).

This is an example on a large table:

mysql> explain analyze SELECT `user_id` FROM `oc_filecache` `f`  INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage` WHERE (`size` < 0) AND (`parent` > -1) LIMIT 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1 row(s)  (cost=8054 rows=1) (actual time=12982..12982 rows=0 loops=1)
    -> Nested loop inner join  (cost=8054 rows=528) (actual time=12982..12982 rows=0 loops=1)
        -> Covering index scan on m using mount_user_storage  (cost=1.3 rows=3) (actual time=0.0517..0.0937 rows=3 loops=1)
        -> Filter: ((f.size < 0) and (f.parent > -1))  (cost=2381 rows=176) (actual time=4327..4327 rows=0 loops=3)
            -> Index lookup on f using fs_storage_path_hash (storage=m.storage_id)  (cost=2381 rows=3089) (actual time=0.312..4291 rows=346344 loops=3)

1 row in set (12.98 sec)

It first fetches all storages with a Covering index scan on m using mount_user_storage (3 storages), then for each storage performs an Index lookup on f using fs_storage_path_hash.
This happens because the engine estimates that each index lookup returns 3089 rows, while it actually returns 346344 rows per storage, three times over.
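Back-of-the-envelope, using the numbers from the two EXPLAIN outputs above, the optimizer's estimate is off by roughly two orders of magnitude, which is enough to flip the join order:

```python
# Row counts taken from the EXPLAIN ANALYZE output above.
estimated_rows = 3 * 3089    # optimizer: ~3089 rows per storage, 3 storages
actual_rows = 3 * 346344     # reality: 346344 rows per storage, 3 storages
factor = actual_rows // estimated_rows
print(factor)  # the estimate is off by a factor of ~112
```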

I don't think we can completely blame the engine, because if we add a record that this query could return, we get:

mysql> insert into oc_filecache(path, size, storage) values ('/', -1, 1);
Query OK, 1 row affected (0.04 sec)

mysql> explain analyze SELECT storage FROM `oc_filecache` `f`, `oc_mounts` `m` use index(mount_user_storage) WHERE `size` < 0 and parent > -1 and storage_id = storage LIMIT 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1 row(s)  (cost=7095 rows=1) (actual time=0.461..0.461 rows=1 loops=1)
    -> Nested loop inner join  (cost=7095 rows=358) (actual time=0.46..0.46 rows=1 loops=1)
        -> Covering index scan on m using mount_user_storage  (cost=1.3 rows=3) (actual time=0.0201..0.0208 rows=2 loops=1)
        -> Filter: ((f.size < 0) and (f.parent > -1))  (cost=2154 rows=119) (actual time=0.218..0.218 rows=0.5 loops=2)
            -> Index lookup on f using fs_storage_path_prefix (storage=m.storage_id)  (cost=2154 rows=2150) (actual time=0.106..0.212 rows=20 loops=2)

1 row in set (0.00 sec)

It now estimates 2150 rows per lookup, whereas only 20 rows per loop (40 in total) are returned.

Now, if we force the fs_size index on the large table, we get the same plan as on the small table:

mysql> explain analyze SELECT `user_id` FROM `oc_filecache` `f` use index (`fs_size`) INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage` WHERE (`size` < 0) AND (`parent` > -1) LIMIT 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1 row(s)  (cost=119615 rows=1) (actual time=613..613 rows=0 loops=1)
    -> Nested loop inner join  (cost=119615 rows=40607) (actual time=613..613 rows=0 loops=1)
        -> Filter: (f.parent > -1)  (cost=105402 rows=40607) (actual time=60.6..536 rows=62471 loops=1)
            -> Index range scan on f using fs_size over (size < 0), with index condition: (f.size < 0)  (cost=105402 rows=121834) (actual time=60.6..528 rows=62472 loops=1)
        -> Covering index lookup on m using mount_user_storage (storage_id=f.`storage`)  (cost=0.25 rows=1) (actual time=0.00107..0.00107 rows=0 loops=62471)

1 row in set (0.62 sec)

Expected behavior

For the broken scan, it should still try all the other users.
For the slow query, it should either use the right index or run a better-written query.
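One way to satisfy the first expectation (a sketch only; it may differ from the actual approach in PR #40019) is to feed the set of already-attempted users back into the user query, so a user whose scan failed cannot be returned again within the same session:

```python
# Hedged sketch of a possible fix; names are illustrative and not taken
# from the Nextcloud code base.
def scan_session_fixed(pending_users, run_scanner, users_per_session=500):
    attempted = []

    def next_user():
        # Model of getUserToScan() with an exclusion list: return the
        # first user that still has unscanned files and was not tried yet.
        for u in pending_users:
            if u not in attempted:
                return u
        return ''

    user = next_user()
    while user and len(attempted) < users_per_session:
        attempted.append(user)
        run_scanner(user)   # even if this fails and the size<0 rows remain,
        user = next_user()  # the user is excluded, so the session moves on
    return attempted

# 'alice' keeps failing (her rows never clear), yet bob and carol
# still get their scan:
result = scan_session_fixed(['alice', 'bob', 'carol'], lambda u: None)
print(result)
```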

Installation method

None

Nextcloud Server version

26

Operating system

Debian/Ubuntu

PHP engine version

PHP 8.1

Web server

None

Database engine version

Other

Is this bug present after an update or on a fresh install?

None

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

  • Default user-backend (database)
  • LDAP/ Active Directory
  • SSO - SAML
  • Other

Configuration report

No response

List of activated Apps

No response

Nextcloud Signing status

No response

Nextcloud Logs

No response

Additional info

No response

@FedericoHeichou FedericoHeichou added 0. Needs triage Pending check for reproducibility or if it fits our roadmap bug labels Aug 23, 2023
@FedericoHeichou FedericoHeichou linked a pull request Aug 23, 2023 that will close this issue