cloud_storage: Rehash segment name prefixes #18762
lgtm other than some minor nits/questions
src/v/config/configuration.cc
Outdated
*this,
"cloud_storage_cache_num_buckets",
"Divide cloud storage cache across specified number of buckets. This "
"only works for objects with randomized prefix.",
- "only works for objects with randomized prefix.",
+ "only works for objects with randomized prefixes.",
please see minor doc suggestion!
The configuration option enables bucketing of the segment and manifest files using different prefixes.
Names that start with a randomized prefix are now subject to a rehashing procedure. The prefix is replaced with another prefix that is computed from the hash. This new prefix has lower cardinality. During lookup we probe two paths, the rehashed one and the original one, because the cache might still store objects from before this feature was enabled.
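The rehashing and two-path lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names (`has_randomized_prefix`, `rehash_key`, `lookup_candidates`) are hypothetical, and reducing the prefix value modulo the bucket count is an assumed stand-in for whatever hash the actual implementation uses.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: true if the first path component is an
// 8-character hexadecimal string, i.e. a randomized prefix.
inline bool has_randomized_prefix(std::string_view key) {
    constexpr std::size_t prefix_len = 8;
    if (key.size() <= prefix_len || key[prefix_len] != '/') {
        return false;
    }
    return std::all_of(
      key.begin(), key.begin() + prefix_len, [](unsigned char c) {
          return std::isxdigit(c) != 0;
      });
}

// Hypothetical rehash: replace the 8-hex-digit prefix with a bucket
// number in [0, num_buckets). Keys without a randomized prefix are
// returned unchanged, as is every key when num_buckets == 0.
inline std::string rehash_key(std::string_view key, std::uint32_t num_buckets) {
    if (num_buckets == 0 || !has_randomized_prefix(key)) {
        return std::string(key);
    }
    // Assumption: the prefix is parsed as a number and reduced modulo
    // the bucket count; the real implementation may hash differently.
    const unsigned long prefix
      = std::stoul(std::string(key.substr(0, 8)), nullptr, 16);
    const auto bucket = static_cast<std::uint32_t>(prefix % num_buckets);
    return std::to_string(bucket) + std::string(key.substr(8));
}

// Paths to probe on lookup: the rehashed path first, then the original
// path, since the cache may hold objects written before bucketing was
// enabled.
inline std::vector<std::string>
lookup_candidates(std::string_view key, std::uint32_t num_buckets) {
    std::vector<std::string> paths;
    std::string rehashed = rehash_key(key, num_buckets);
    if (rehashed != key) {
        paths.push_back(std::move(rehashed));
    }
    paths.emplace_back(key);
    return paths;
}
```

With four buckets, a key such as `0000000a/seg.log` would map to `2/seg.log` in this sketch, and a lookup would probe `2/seg.log` before falling back to `0000000a/seg.log`. Keys without a randomized prefix yield only the original path.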
0fa4326 to 46609dc
/backport v24.1.x
/backport v23.3.x
Failed to create a backport PR to v23.3.x branch. I tried:
Currently, the cache uses segment object names as file paths. This means that every file is stored under a series of nested directories, the first of which is a randomized prefix. This creates a very large number of meaningless directories in the root of the cache.
This PR adds a new configuration option, cloud_storage_cache_num_buckets. When this property is set, the cache checks whether the object name starts with a randomized prefix (an 8-character hex number). If it does, the cache replaces that prefix with a different one that has much smaller cardinality. In effect, all object names are distributed across cloud_storage_cache_num_buckets buckets.
Backports Required
Release Notes
Features
cloud_storage_cache_num_buckets configuration parameter.