optimize query pattern used by storage filter #40555

Merged: 6 commits merged into master from query-optimize-distribute on Feb 16, 2024

icewind1991
Member

@icewind1991 icewind1991 commented Sep 21, 2023

To filter searches based on the storages a user has access to, search queries often include multiple (storage = $1 and path like "$2/%") parts (for folder shares/groupfolders) and multiple (storage = $1 and path = $2) parts (for single file shares).

This attempts to optimize those queries by

  • grouping multiple storage comparisons together: (storage = $1 and path like $2) or (storage = $1 and path like $3) -> storage = $1 and (path like $2 or path like $3)
  • combining multiple = comparisons on the same field into an in: (path = $1 or path = $2 or path = $3) -> path in ($1,$2,$3)
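
The two rewrites above can be sketched outside of the query builder. The following Python is only an illustration of the idea, not the code in this PR: `PathFilter`, `group_by_storage` and `render` are hypothetical names, and the values are inlined into the SQL string purely for readability (real code would use bound parameters).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathFilter:
    """One (storage = X and path <op> Y) part of the search filter."""
    storage: int
    op: str      # "=" for single file shares, "like" for folder shares
    value: str


def group_by_storage(filters):
    """Rewrite 1: collect all path comparisons that share a storage id."""
    groups = {}
    for f in filters:
        group = groups.setdefault(f.storage, {"eq": [], "like": []})
        group["eq" if f.op == "=" else "like"].append(f.value)
    return groups


def render(groups):
    """Rewrite 2: merge multiple '=' comparisons into a single IN."""
    parts = []
    for storage, group in groups.items():
        path_parts = []
        if len(group["eq"]) == 1:
            path_parts.append(f"path = '{group['eq'][0]}'")
        elif group["eq"]:
            quoted = ", ".join(f"'{v}'" for v in group["eq"])
            path_parts.append(f"path IN ({quoted})")
        path_parts += [f"path LIKE '{v}'" for v in group["like"]]
        parts.append(f"(storage = {storage} AND ({' OR '.join(path_parts)}))")
    return " OR ".join(parts)


filters = [
    PathFilter(3, "=", "files/a.md"),
    PathFilter(3, "like", "files/dir/%"),
    PathFilter(4, "=", "files/cv.md"),
    PathFilter(3, "=", "files/b.md"),
]
optimized_sql = render(group_by_storage(filters))
```

Each storage id now appears once, and its equality comparisons collapse into one IN list next to the remaining LIKE patterns.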

todo:

  • chunk large in statements
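
A rough sketch of what chunking could look like, assuming the goal is to keep each IN list below a per-statement item limit (Oracle, for example, rejects lists of more than 1000 expressions). The function name, the `?` placeholder style and the default chunk size are assumptions made for illustration and are not taken from this PR.

```python
def chunk_in_clause(column, values, chunk_size=1000):
    """Build `col IN (...) OR col IN (...)` with at most chunk_size items each.

    Returns the SQL fragment and the flat list of parameters to bind.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not values:
        # An empty IN list is invalid SQL; this condition matches no rows.
        return "1 = 0", []

    clauses = []
    params = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(chunk)
    return "(" + " OR ".join(clauses) + ")", params


# Small chunk size so the split is visible.
sql, params = chunk_in_clause("path", ["a", "b", "c"], chunk_size=2)
```

The OR of the chunks matches the same rows as a single large IN, so the result set does not change.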

@icewind1991 icewind1991 force-pushed the query-optimize-distribute branch 2 times, most recently from e4a4a9f to 28d41e3 Compare September 21, 2023 14:54
@icewind1991
Member Author

An example search query with a number of single file and folder shares from 2 users and a few groupfolders changes as follows.

From

SELECT `file`.`fileid`,
       `storage`,
       `path`,
       `path_hash`,
       `file`.`parent`,
       `file`.`name`,
       `mimetype`,
       `mimepart`,
       `size`,
       `mtime`,
       `storage_mtime`,
       `encrypted`,
       `etag`,
       `permissions`,
       `checksum`,
       `unencrypted_size`
FROM `*PREFIX*filecache` `file`
WHERE (`file`.`name` ILIKE :"%tesa%")
  AND (((`storage` = :1) AND
        ((`path` = :"files") OR (`path` LIKE :"files\/%"))) OR
       ((`storage` = :2) AND ((`path` = :"__groupfolders\/1") OR
                              (`path` LIKE :"\\_\\_groupfolders\/1\/%"))) OR
       ((`storage` = :2) AND ((`path` = :"__groupfolders\/2") OR
                              (`path` LIKE :"\\_\\_groupfolders\/2\/%"))) OR
       ((`storage` = :4) AND (`path` = :"files\/cv.md")) OR
       ((`storage` = :3) AND (`path` = :"files\/boilerplate.md")) OR
       ((`storage` = :3) AND ((`path` = :"files\/testfolder") OR
                              (`path` LIKE :"files\/testfolder\/%"))) OR
       ((`storage` = :3) AND (`path` = :"files\/oc-react.md")) OR
       ((`storage` = :3) AND (`path` = :"files\/Patcher.md")) OR
       ((`storage` = :3) AND (`path` = :"files\/storage.md")) OR
       ((`storage` = :3) AND (`path` = :"files\/welcome.txt")) OR
       ((`storage` = :3) AND (`path` = :"files\/oc-news-feed.md")) OR
       ((`storage` = :4) AND (`path` = :"files\/rssrun.md")) OR
       ((`storage` = :4) AND (`path` = :"files\/Replay.md")) OR
       ((`storage` = :4) AND (`path` = :"files\/regen.md")) OR
       ((`storage` = :3) AND ((`path` = :"files\/sharefolder2") OR
                              (`path` LIKE :"files\/sharefolder2\/%"))) OR
       ((`storage` = :3) AND ((`path` = :"files\/sharefolder3") OR
                              (`path` LIKE :"files\/sharefolder3\/%"))))
ORDER BY `mtime` + :0 desc LIMIT 5

to

SELECT `file`.`fileid`,
       `storage`,
       `path`,
       `path_hash`,
       `file`.`parent`,
       `file`.`name`,
       `mimetype`,
       `mimepart`,
       `size`,
       `mtime`,
       `storage_mtime`,
       `encrypted`,
       `etag`,
       `permissions`,
       `checksum`,
       `unencrypted_size`
FROM `*PREFIX*filecache` `file`
WHERE (`file`.`name` ILIKE :"%tes%")
  AND (((`storage` = :1) AND
        ((`path` = :"files") OR (`path` LIKE :"files\/%"))) OR
       ((`storage` = :2) AND
        ((`path` IN (:["__groupfolders\/1", "__groupfolders\/2"])) OR
         (`path` LIKE :"\\_\\_groupfolders\/1\/%") OR
         (`path` LIKE :"\\_\\_groupfolders\/2\/%"))) OR ((`storage` = :4) AND
                                                         (`path` IN
                                                          (:["files\/cv.md",
                                                           "files\/rssrun.md",
                                                           "files\/Replay.md",
                                                           "files\/regen.md"]))) OR
       ((`storage` = :3) AND ((`path` IN
                               (:["files\/boilerplate.md", "files\/testfolder",
                                "files\/oc-react.md", "files\/Patcher.md",
                                "files\/storage.md", "files\/welcome.txt",
                                "files\/oc-news-feed.md", "files\/sharefolder2",
                                "files\/sharefolder3"])) OR
                              (`path` LIKE :"files\/testfolder\/%") OR
                              (`path` LIKE :"files\/sharefolder2\/%") OR
                              (`path` LIKE :"files\/sharefolder3\/%"))))
ORDER BY `mtime` + :0 desc LIMIT 5
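
One way to sanity-check that a rewrite of this kind returns the same rows is to run both forms against a small table. The sketch below does this with SQLite through Python's `sqlite3` module, using a cut-down `filecache` table and invented rows. The table contents and the `compare_results` helper are assumptions for illustration and say nothing about how the PR itself is tested.

```python
import sqlite3


def compare_results():
    """Run the ungrouped and grouped filter against the same toy data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE filecache ("
        "fileid INTEGER PRIMARY KEY, storage INTEGER, path TEXT)"
    )
    conn.executemany(
        "INSERT INTO filecache (storage, path) VALUES (?, ?)",
        [
            (3, "files/boilerplate.md"),
            (3, "files/testfolder"),
            (3, "files/testfolder/a.txt"),
            (3, "files/private.md"),      # not shared, must not match
            (4, "files/cv.md"),
            (4, "files/regen.md"),
            (4, "files/other.md"),        # not shared, must not match
        ],
    )

    # One (storage AND path) pair per share, as in the "From" query.
    before = """
        SELECT path FROM filecache WHERE
            (storage = 4 AND path = 'files/cv.md') OR
            (storage = 3 AND path = 'files/boilerplate.md') OR
            (storage = 3 AND (path = 'files/testfolder' OR
                              path LIKE 'files/testfolder/%')) OR
            (storage = 4 AND path = 'files/regen.md')
        ORDER BY fileid
    """
    # Grouped by storage with equalities merged into IN, as in the "to" query.
    after = """
        SELECT path FROM filecache WHERE
            (storage = 4 AND path IN ('files/cv.md', 'files/regen.md')) OR
            (storage = 3 AND (path IN ('files/boilerplate.md', 'files/testfolder') OR
                              path LIKE 'files/testfolder/%'))
        ORDER BY fileid
    """
    rows_before = conn.execute(before).fetchall()
    rows_after = conn.execute(after).fetchall()
    conn.close()
    return rows_before, rows_after


rows_before, rows_after = compare_results()
```

Both queries should select the same five shared paths and skip the two rows that are not covered by any share.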

@icewind1991 icewind1991 force-pushed the query-optimize-distribute branch 2 times, most recently from 63c7d16 to 1e79a5f Compare September 22, 2023 11:15
@solracsf solracsf added this to the Nextcloud 28 milestone Oct 27, 2023
@skjnldsv skjnldsv mentioned this pull request Nov 1, 2023
This was referenced Nov 6, 2023
This was referenced Nov 14, 2023
@blizzz blizzz modified the milestones: Nextcloud 28, Nextcloud 29 Nov 23, 2023
@XueSheng-GIT

@icewind1991 Any plans for an updated pull for NC28? Search for files is back to more than 60 seconds on my instance (thus, search is practically unusable).

@icewind1991 icewind1991 force-pushed the query-optimize-distribute branch from 1e79a5f to 03234a3 Compare February 5, 2024 09:17
@icewind1991 icewind1991 force-pushed the query-optimize-distribute branch 5 times, most recently from aba15a1 to 89ea173 Compare February 5, 2024 17:56
@icewind1991 icewind1991 marked this pull request as ready for review February 6, 2024 13:03
@icewind1991 icewind1991 requested review from a team and removed request for a team February 6, 2024 13:04
@icewind1991 icewind1991 force-pushed the query-optimize-distribute branch 2 times, most recently from 14b1ca1 to 18b5963 Compare February 7, 2024 09:17
Signed-off-by: Robin Appelman <robin@icewind.nl>
…ents

@icewind1991 icewind1991 force-pushed the query-optimize-distribute branch 2 times, most recently from 5cc8bb5 to fcafb9e Compare February 15, 2024 18:26
@icewind1991 icewind1991 force-pushed the query-optimize-distribute branch from fcafb9e to 3890aa5 Compare February 16, 2024 09:59
@icewind1991 icewind1991 merged commit bb87232 into master Feb 16, 2024
159 checks passed
@icewind1991 icewind1991 deleted the query-optimize-distribute branch February 16, 2024 10:24
@blizzz blizzz mentioned this pull request Mar 5, 2024