storage: add greedy log storage target size control loop monitor #11702

Merged
merged 7 commits into from
Jun 28, 2023

Conversation

dotnwat
Member

@dotnwat dotnwat commented Jun 27, 2023

Adds a log storage monitor / control loop whose goal is to enforce the target usage size of log storage.

This PR adds a greedy approach to reclaiming cloud-uploaded data by violating local retention policies in an effort to meet the target storage limit.
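
As a rough illustration of what a greedy reclaim pass could look like, the sketch below walks partitions in order and takes as much cloud-uploaded data from each as is needed to cover the excess over the target size. The types `partition_usage` and `reclaim_request` and the function `plan_greedy_reclaim` are hypothetical names made up for this example; they are not the types or functions added by this PR.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-partition summary (illustration only, not a Redpanda type).
struct partition_usage {
    std::string name;
    uint64_t local_bytes;          // bytes currently held on local disk
    uint64_t cloud_uploaded_bytes; // portion of local_bytes already uploaded
};

// Hypothetical description of how much to reclaim from one partition.
struct reclaim_request {
    std::string name;
    uint64_t bytes;
};

// Greedy plan: visit partitions in the given order and take as much
// uploaded data as needed from each until the excess over target_bytes
// is covered. Data that has not been uploaded is never selected.
std::vector<reclaim_request> plan_greedy_reclaim(
  const std::vector<partition_usage>& parts, uint64_t target_bytes) {
    uint64_t total = 0;
    for (const auto& p : parts) {
        total += p.local_bytes;
    }

    std::vector<reclaim_request> plan;
    if (total <= target_bytes) {
        return plan; // already at or below the target, nothing to do
    }

    uint64_t excess = total - target_bytes;
    for (const auto& p : parts) {
        if (excess == 0) {
            break;
        }
        const uint64_t reclaimable = std::min(
          p.cloud_uploaded_bytes, p.local_bytes);
        const uint64_t take = std::min(excess, reclaimable);
        if (take == 0) {
            continue;
        }
        plan.push_back({p.name, take});
        excess -= take;
    }
    return plan;
}
```

If the uploaded data across all partitions is smaller than the excess, the plan reclaims what it can and the target is simply not met in that pass.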

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@dotnwat dotnwat force-pushed the space-management-v2 branch from 6466ce0 to ffa743c Compare June 27, 2023 04:29
@dotnwat dotnwat changed the title Space management v2 storage: add basic log storage target size control loop monitor Jun 27, 2023
Contributor

@jcsp jcsp left a comment

This looks sensible - I guess this is kind of the framework that the actual local log truncation will drop into

Review comment on src/v/config/configuration.cc (outdated, resolved)
dotnwat added 6 commits June 27, 2023 23:16
This is the target size for raft storage. It is approximate in that the
calculation is both delayed and not expected to always be precise. We
do not use this value to block writes (as is done when free space on
the underlying device is actually low).

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
This is used by the space management control loop. The general usage is to:

1. configure partitions, for example by doing nothing or changing
   retention configuration.
2. trigger housekeeping with GC prioritized

After housekeeping (HK) runs, the GC prioritization flag is cleared, but
callers can reprioritize at any time in the future.
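
The set-then-clear behavior of the flag can be sketched as follows. The class `housekeeping_sketch` and its methods are hypothetical, and the exact effect of prioritization (here, simply running GC before compaction) is an assumption for illustration; the commit only states that GC is prioritized and that the flag is cleared after housekeeping runs.

```cpp
#include <string>
#include <vector>

// Hypothetical model of a housekeeping loop with a one-shot
// "prioritize GC" flag (illustration only, not the Redpanda API).
class housekeeping_sketch {
public:
    // Ask the next housekeeping pass to prioritize GC.
    void trigger_gc() { _gc_prioritized = true; }

    bool gc_prioritized() const { return _gc_prioritized; }

    // Run one pass and return the order in which phases ran.
    // Assumption: "prioritized" means GC runs before compaction.
    std::vector<std::string> run_once() {
        const bool prioritized = _gc_prioritized;
        // The flag is one-shot: it is cleared once the pass has run.
        _gc_prioritized = false;
        if (prioritized) {
            return {"gc", "compaction"};
        }
        return {"compaction", "gc"};
    }

private:
    bool _gc_prioritized = false;
};
```

Because the flag is cleared on every pass, a caller that still needs space after housekeeping has to call `trigger_gc()` again.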

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
When accounting for the amount of data reclaimable by GC we count bytes
that are subject to reclaim under the current local retention policy.
For cloud-enabled topics we also want to estimate how much data in
total is reclaimable, which is effectively how much data is present up
to the max collectible offset.

Prior to this commit we were also counting this second bucket for
non-cloud-enabled topics, where the max collectible offset is roughly
the end of the log in most cases. This led to misleading / incorrect
stats about how much data was available for reclaiming in the GC
process.
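
The two buckets described above can be sketched like this. The names `segment_info`, `reclaimable_sizes`, and `estimate_reclaimable`, and the rule that a segment counts only when it lies entirely below the relevant offset, are assumptions made for this example rather than the accounting code in the commit.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical segment summary (illustration only).
struct segment_info {
    int64_t base_offset;
    int64_t last_offset;
    uint64_t size_bytes;
};

struct reclaimable_sizes {
    uint64_t retention = 0; // reclaimable under the local retention policy
    uint64_t available = 0; // total reclaimable, cloud-enabled topics only
};

// retention_offset: first offset the local retention policy keeps.
// max_collectible_offset: highest offset GC is allowed to remove.
reclaimable_sizes estimate_reclaimable(
  const std::vector<segment_info>& segments,
  int64_t retention_offset,
  int64_t max_collectible_offset,
  bool cloud_enabled) {
    reclaimable_sizes result;
    for (const auto& s : segments) {
        const bool collectible = s.last_offset <= max_collectible_offset;
        if (collectible && s.last_offset < retention_offset) {
            result.retention += s.size_bytes;
        }
        // The second bucket is only meaningful when data is also in the
        // cloud; for other topics it would cover nearly the whole log.
        if (cloud_enabled && collectible) {
            result.available += s.size_bytes;
        }
    }
    return result;
}
```

With `cloud_enabled` set to false the `available` bucket stays at zero, which matches the fix described in this commit.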

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Space management control requires a few things:

1. Access to segment information for each partition that allows it to
   reason about how much of a partition should (and can) be truncated to
   meet a target reclaim size.

2. A mechanism for overriding local retention rules for cloud-enabled
   topics. Data that has been uploaded to the cloud can be safely
   removed from disk, independent of a topic's retention settings. But
   choose a policy wisely.

This commit adds (1) by exposing raw segments from the log, so the
caller should be careful. The flip side is that the caller is only
expected to look at size information and offsets.

For (2) the normal GC path is hijacked when a specific offset is
requested. In this case the normal size/time based retention is ignored
and the requested offset is used. The requested offset is cleared after
GC runs. This makes sense because, after the data has been reclaimed,
reads may rehydrate storage, at which point space management may
reconsider the data.
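
A minimal sketch of the one-shot override described for (2), assuming a hypothetical `gc_sketch` class whose `run_gc` receives the offset computed by normal retention; none of these names come from the commit itself.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of GC with a one-shot offset override
// (illustration only, not the Redpanda log implementation).
class gc_sketch {
public:
    // Ask the next GC run to collect up to this offset, ignoring
    // the normal size/time based retention.
    void request_reclaim_up_to(int64_t offset) { _override = offset; }

    bool has_override() const { return _override.has_value(); }

    // Returns the offset this GC run collects up to. retention_offset is
    // what normal size/time based retention would have chosen.
    int64_t run_gc(int64_t retention_offset) {
        const int64_t target = _override.value_or(retention_offset);
        // The request is cleared after GC runs, so later runs fall back
        // to normal retention unless a new offset is requested.
        _override.reset();
        return target;
    }

private:
    std::optional<int64_t> _override;
};
```

For example, after `request_reclaim_up_to(500)` the next `run_gc(100)` collects up to 500, and the run after that goes back to 100.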

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
This placeholder code made subsequent commits pretty messy.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
@dotnwat dotnwat force-pushed the space-management-v2 branch from ffa743c to 44d9d86 Compare June 28, 2023 06:52
@dotnwat dotnwat requested review from jcsp and mmaslankaprv June 28, 2023 06:54
@dotnwat dotnwat marked this pull request as ready for review June 28, 2023 06:54
@dotnwat dotnwat changed the title storage: add basic log storage target size control loop monitor storage: add greedy log storage target size control loop monitor Jun 28, 2023
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
@dotnwat dotnwat force-pushed the space-management-v2 branch from 44d9d86 to 0be1c0d Compare June 28, 2023 06:55
@dotnwat
Member Author

dotnwat commented Jun 28, 2023

Force-push:

  • Added greedy approach to reclaiming data from cloud uploaded topics


3 participants