Path deduplication in oc_filecache #42182
Labels
0. Needs triage
enhancement
feature: filesystem
performance 🚀
Is your feature request related to a problem?
My small/medium Nextcloud instance (~80 GB) has a database of 300 MB, mainly caused (>60%) by the table `oc_filecache`. I think there is potential to reduce the database size, which might also improve performance for large instances.
Describe the solution you'd like
While working on #41321, I noticed that `oc_filecache` contains the full (internal) path for each file and, additionally, the file name. Deduplicating paths in the database could decrease the table size considerably.
This could be achieved by creating a table `oc_directories` (or `oc_paths`) containing directories and mapping each of them to an id. This id can then be used in `oc_filecache` instead of the raw string, and the entire path can be re-created by joining `oc_filecache` and `oc_directories` and combining the directory path with the file name.
Then, the full path to the directory will only be in the `oc_directories` table, and a directory with lots and lots of files wouldn't increase the table size by that much.
Describe alternatives you've considered
Only store the file name (`basename`) and populate `parent` with the `file_id` of the parent directory. Resolving that recursively until `parent = file_id` could already replace the `path` column. However, I'm not sure how that would affect performance for very deep directory nestings, and I feel like the solution mentioned above might be a compromise.
Additional context
I can help with implementing this, but would appreciate a few pointers if there is something to consider.
I'm also not sure whether 3rd-party apps use the filecache. If so, this change would have to be rolled out in a major release, since it could break these apps.
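To make the two ideas above more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the real Nextcloud schema: the tables `filecache_dedup` and `filecache_tree`, the column `dir_id`, and both helper functions are hypothetical names made up for illustration, and the root entry is assumed to have `parent` equal to its own id, matching the stop condition described above.

```python
import sqlite3
from typing import Optional


def build_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Proposed layout: each directory path is stored exactly once.
        CREATE TABLE oc_directories (
            dir_id INTEGER PRIMARY KEY,
            path   TEXT NOT NULL UNIQUE
        );
        -- Stand-in for oc_filecache without a full path column.
        CREATE TABLE filecache_dedup (
            fileid INTEGER PRIMARY KEY,
            dir_id INTEGER NOT NULL REFERENCES oc_directories (dir_id),
            name   TEXT NOT NULL
        );
        -- Alternative layout: only name and parent; root has parent = fileid.
        CREATE TABLE filecache_tree (
            fileid INTEGER PRIMARY KEY,
            parent INTEGER NOT NULL,
            name   TEXT NOT NULL
        );

        INSERT INTO oc_directories VALUES (1, 'files/photos/2023');
        INSERT INTO filecache_dedup VALUES (10, 1, 'a.jpg'), (11, 1, 'b.jpg');

        INSERT INTO filecache_tree VALUES
            (1, 1, 'files'), (2, 1, 'photos'), (3, 2, '2023'), (10, 3, 'a.jpg');
        """
    )
    return conn


def full_path_dedup(conn: sqlite3.Connection, fileid: int) -> Optional[str]:
    """Rebuild the path with a single join against the directory table."""
    row = conn.execute(
        """
        SELECT d.path || '/' || f.name
        FROM filecache_dedup AS f
        JOIN oc_directories AS d ON d.dir_id = f.dir_id
        WHERE f.fileid = ?
        """,
        (fileid,),
    ).fetchone()
    return row[0] if row else None


def full_path_tree(conn: sqlite3.Connection, fileid: int) -> Optional[str]:
    """Rebuild the path by walking parent links until parent = fileid."""
    row = conn.execute(
        """
        WITH RECURSIVE chain (fileid, parent, path) AS (
            SELECT fileid, parent, name FROM filecache_tree WHERE fileid = ?
            UNION ALL
            -- Prepend the parent's name, one nesting level per step.
            SELECT p.fileid, p.parent, p.name || '/' || c.path
            FROM chain AS c
            JOIN filecache_tree AS p ON p.fileid = c.parent
            WHERE c.parent != c.fileid
        )
        SELECT path FROM chain WHERE parent = fileid
        """,
        (fileid,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_demo_db()
    print(full_path_dedup(conn, 10))
    print(full_path_tree(conn, 10))
```

In this sketch the join variant needs one lookup per file regardless of depth, while the recursive variant visits one row per nesting level, which is the reason for my concern about very deep directory trees.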