Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(core): add index on name #44586

Merged
merged 1 commit into from
Apr 28, 2024
Merged

perf(core): add index on name #44586

merged 1 commit into from
Apr 28, 2024

Conversation

pulsejet
Copy link
Member

@pulsejet pulsejet commented Mar 30, 2024

We already have an index on parent+name but this can't be used for filtering just by name in a user's root. Such a query is used by both Photos and Memories for filtering out nomedia files.

BEFORE: 4 rows in set (0.251 sec)
AFTER: 4 rows in set (0.001 sec)

(this is on a smallish filecache with 400k; it gets pretty slow when the filecache can't fit into memory cause it's doing a scan)

$comp = new SearchBinaryOperator(ISearchBinaryOperator::OPERATOR_OR, [
    new SearchComparison(ISearchComparison::COMPARE_EQUAL, 'name', '.nomedia'),
    new SearchComparison(ISearchComparison::COMPARE_EQUAL, 'name', '.nomemories'),
]);
$search = $root->search(new SearchQuery($comp, 0, 0, [], Util::getUser()));

Query looks like this:

SELECT 
  `file`.`fileid`, 
  `storage`, 
  `path`, 
  `path_hash`, 
  `file`.`parent`, 
  `file`.`name`, 
  `mimetype`, 
  `mimepart`, 
  `size`, 
  `mtime`, 
  `storage_mtime`, 
  `encrypted`, 
  `etag`, 
  `file`.`permissions`, 
  `checksum`, 
  `unencrypted_size`, 
  `meta`.`json` AS `meta_json`, 
  `meta`.`sync_token` AS `meta_sync_token` 
FROM 
  `oc_filecache` `file` 
  LEFT JOIN `oc_files_metadata` `meta` ON `file`.`fileid` = `meta`.`file_id` 
WHERE 
  (
    `file`.`name` IN ('.nomedia', '.nomemories')
  ) 
  AND (
    (
      (`storage` = 1) 
      AND (
        (`path` = 'files') 
        OR (`path` LIKE 'files/%')
      )
    ) 
    OR (`storage` = 4) 
    OR (
      (`storage` = 3) 
      AND (
        (`path` = 'files/STA') 
        OR (`path` LIKE 'files/STA/%')
      )
    )
  );

Query plan BEFORE:

+------+-------------+-------+--------+-----------------------------------------------------------------------------------------------------+-------------------+---------+-----------------------+--------+-------------+
| id   | select_type | table | type   | possible_keys                                                                                       | key               | key_len | ref                   | rows   | Extra       |
+------+-------------+-------+--------+-----------------------------------------------------------------------------------------------------+-------------------+---------+-----------------------+--------+-------------+
|    1 | SIMPLE      | file  | ALL    | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix | NULL              | NULL    | NULL                  | 418989 | Using where |
|    1 | SIMPLE      | meta  | eq_ref | files_meta_fileid                                                                                   | files_meta_fileid | 8       | nextcloud.file.fileid | 1      |             |
+------+-------------+-------+--------+-----------------------------------------------------------------------------------------------------+-------------------+---------+-----------------------+--------+-------------+

AFTER:

+------+-------------+-------+--------+------------------------------------------------------------------------------------------------------------------+-------------------+---------+-----------------------+------+------------------------------------+
| id   | select_type | table | type   | possible_keys                                                                                                    | key               | key_len | ref                   | rows | Extra                              |
+------+-------------+-------+--------+------------------------------------------------------------------------------------------------------------------+-------------------+---------+-----------------------+------+------------------------------------+
|    1 | SIMPLE      | file  | range  | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix,fs_name_hash | fs_name_hash      | 1003    | NULL                  | 4    | Using index condition; Using where |
|    1 | SIMPLE      | meta  | eq_ref | files_meta_fileid                                                                                                | files_meta_fileid | 8       | nextcloud.file.fileid | 1    |                                    |
+------+-------------+-------+--------+------------------------------------------------------------------------------------------------------------------+-------------------+---------+-----------------------+------+------------------------------------+

Note: it might possible to create a joint index that's even faster but this index might help the optimizer with other unrelated queries so I think this is better.

@pulsejet pulsejet added 3. to review Waiting for reviews php Pull requests that update Php code performance 🚀 labels Mar 30, 2024
@pulsejet pulsejet added this to the Nextcloud 30 milestone Mar 30, 2024
@pulsejet pulsejet requested review from nickvergessen and a team March 30, 2024 19:20
@pulsejet pulsejet self-assigned this Mar 30, 2024
Signed-off-by: Varun Patil <varunpatil@ucla.edu>
@pulsejet pulsejet changed the title fix(core): add index on name perf(core): add index on name Mar 30, 2024
@susnux susnux requested review from icewind1991 and Altahrim and removed request for a team March 30, 2024 23:25
Copy link
Contributor

@emoral435 emoral435 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - and this is a really useful and impactful contribution too! Thank you so much :)

@skjnldsv skjnldsv requested a review from come-nc April 2, 2024 10:32
@Altahrim
Copy link
Collaborator

Altahrim commented Apr 3, 2024

If the index is only required for this kind of query (with .nosomething), maybe it's enough to add only the first 12 characters in the index.

@pulsejet
Copy link
Member Author

pulsejet commented Apr 3, 2024

I wouldn't say it's the only potential use case for the index but doing a scan beyond 12 characters is not a big deal anyway, so I don't mind.

@susnux susnux merged commit b081d3c into master Apr 28, 2024
167 checks passed
@susnux susnux deleted the pulsejet/name-idx branch April 28, 2024 14:55
Copy link
Contributor

Hello there,
Thank you so much for taking the time and effort to create a pull request to our Nextcloud project.

We hope that the review process is going smooth and is helpful for you. We want to ensure your pull request is reviewed to your satisfaction. If you have a moment, our community management team would very much appreciate your feedback on your experience with this PR review process.

Your feedback is valuable to us as we continuously strive to improve our community developer experience. Please take a moment to complete our short survey by clicking on the following link: https://cloud.nextcloud.com/apps/forms/s/i9Ago4EQRZ7TWxjfmeEpPkf6

Thank you for contributing to Nextcloud and we hope to hear from you soon!

@kesselb
Copy link
Contributor

kesselb commented Apr 29, 2024

Hi,

image

addMissingIndex is enough to show the warning and adding the index via occ db:add-missing-indices.

For fresh installations we should consider altering an existing older migration (e.g. core/Migrations/Version13000Date20170718121200.php) to add the index automatically for fresh installations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3. to review Waiting for reviews feedback-requested performance 🚀 php Pull requests that update Php code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants