
cloud_storage: Rehash segment name prefixes #18762

Merged 3 commits, Jun 5, 2024

Conversation

@Lazin (Contributor) commented Jun 3, 2024

Currently, the cache uses segment object names as file paths. This means we create a series of nested directories to store every file, and the first directory is a randomized prefix. The result is a very large number of meaningless directories in the root of the cache.

This PR adds a new configuration option, cloud_storage_cache_num_buckets. When this property is set, the cache checks whether the object name starts with a randomized prefix (an 8-character hex number). If so, it replaces it with a different prefix that has much smaller cardinality. In effect, it distributes all object names across cloud_storage_cache_num_buckets buckets.
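
The bucketing idea can be sketched as follows. This is a hedged illustration, not Redpanda's actual code: the function names (has_randomized_prefix, rehash_prefix) and the use of std::hash are assumptions made for the sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Sketch only: true if the path starts with an 8-character hex prefix
// followed by '/'.
bool has_randomized_prefix(const std::string& name) {
    if (name.size() < 9 || name[8] != '/') return false;
    for (int i = 0; i < 8; ++i) {
        if (!std::isxdigit(static_cast<unsigned char>(name[i]))) return false;
    }
    return true;
}

// Replace the high-cardinality randomized prefix with "<bucket>/", where
// bucket = hash(remainder) % num_buckets. The hash choice is an assumption;
// any stable hash works for the idea. Names without a randomized prefix are
// left untouched.
std::string rehash_prefix(const std::string& name, std::size_t num_buckets) {
    if (!has_randomized_prefix(name)) return name;
    std::string rest = name.substr(9);
    std::size_t bucket = std::hash<std::string>{}(rest) % num_buckets;
    return std::to_string(bucket) + "/" + rest;
}
```

Because the bucket is derived from the name's remainder rather than the original random prefix, two objects that differ only in their randomized prefix land in the same bucket, which is what collapses the directory count at the cache root.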

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Features

  • Split cache into buckets using cloud_storage_cache_num_buckets configuration parameter.

@Lazin Lazin requested a review from a team as a code owner June 3, 2024 16:02
@Lazin Lazin requested review from abhijat and jcipar June 3, 2024 16:03
@abhijat (Contributor) previously approved these changes Jun 4, 2024 and left a comment:

lgtm other than some minor nits/questions

*this,
"cloud_storage_cache_num_buckets",
"Divide cloud storage cache across specified number of buckets. This "
"only works for objects with randomized prefix.",


Suggested change
"only works for objects with randomized prefix.",
"only works for objects with randomized prefixes.",

@micheleRP previously approved these changes Jun 4, 2024 and left a comment:

please see minor doc suggestion!

Lazin added 3 commits June 4, 2024 13:05
The configuration option enables bucketing of the segment and manifest files using different prefixes. Names that start with a randomized prefix are now subject to a rehashing procedure: the prefix is replaced with another prefix computed from the hash, and this new prefix has much lower cardinality.

During lookup we probe two paths, the rehashed one and the original, because the cache might contain objects stored before this feature was enabled.
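
The two-path probe described in the commit message can be sketched like this. It is a self-contained illustration under assumed names: bucket_path stands in for the real rehashing step, and cache_lookup for the cache's lookup path; neither is Redpanda's actual API.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <set>
#include <string>

// Stand-in for the rehashing step: map a name to a bucketed path.
// (The real code would rehash only names with a randomized prefix.)
std::string bucket_path(const std::string& name, std::size_t num_buckets) {
    return std::to_string(std::hash<std::string>{}(name) % num_buckets)
        + "/" + name;
}

// Probe the rehashed path first, then fall back to the original path,
// because the cache may still hold objects written before the feature
// was enabled. cached_paths models the set of files present in the cache.
std::optional<std::string> cache_lookup(
    const std::string& name,
    std::size_t num_buckets,
    const std::set<std::string>& cached_paths) {
    std::string rehashed = bucket_path(name, num_buckets);
    if (cached_paths.count(rehashed)) return rehashed;
    if (cached_paths.count(name)) return name;
    return std::nullopt;
}
```

The fallback probe is what makes the feature safe to enable on an existing cache: old entries remain reachable under their original paths until they are evicted.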
@Lazin Lazin dismissed stale reviews from micheleRP and abhijat via 46609dc June 4, 2024 17:06
@Lazin Lazin force-pushed the fix/slow-trim-problem branch from 0fa4326 to 46609dc Compare June 4, 2024 17:06
@Lazin Lazin requested review from micheleRP and abhijat June 4, 2024 17:06
@Lazin Lazin merged commit bebd391 into redpanda-data:dev Jun 5, 2024
18 checks passed
@vbotbuildovich
Collaborator

/backport v24.1.x

@vbotbuildovich
Collaborator

/backport v23.3.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-18762-v23.3.x-876 remotes/upstream/v23.3.x
git cherry-pick -x 42935131d75d9c0d9e05b68a0f823fb5b9d0c469 be39e81945c7169560a29dc61d04b289738437e3 46609dccb69d315f132fe6d6a99c70b96fe4936a

Workflow run logs.
