Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: compaction enhancements #23380

Merged
merged 5 commits into from
Sep 24, 2024

Conversation

WillemKauf
Copy link
Contributor

@WillemKauf WillemKauf commented Sep 19, 2024

PR Summary

Minor tombstone additions/tweaks:

  1. Replace internal::get_tombstone_delete_horizon() with internal::is_past_tombstone_delete_horizon() for clean-up of logic and reduced comparisons.
  2. The addition of the segment::may_have_tombstone_records() bitflag.
  3. The addition of the may_have_removable_tombstones() function.

A number of optimizations to compaction, both in general and for tombstones:

  1. Per the comment here, the behaviour of sliding window compaction is changed, such that newly added/closed segments are ignored until the current "round" of sliding window compaction cleanly compacts all segments down to the start of the log. This allows for sliding window compaction to continually produce clean segments, and avoid a situation where clean segments are not being produced due to a high ingress rate or key cardinality (this situation would prevent timely tombstone removal). _last_compaction_window_offset must reach the base offset of the first segment in the log before new segments will be considered in the compaction window.
  2. Per the comment here, segments that are cleanly compacted or have been through a round of window compaction are considered in the sliding window range. However, it would be a no-op to actually perform window compaction over these segments. Self-compaction is performed to remove tombstones on segments that may contain them, and all cleanly compacted segments are removed before sliding window compaction occurs. This allows for timely tombstone removal by avoiding the situation in which a partition which is no longer being produced to can still trigger tombstone removal.
  3. Cleanly compacted segments do not need to have their keys added to the compaction offset map, since the deduplication process considers unindexed keys to be valid records to keep. By not indexing these segments, the compaction process can use less memory and cleanly compact down to the start of the log faster.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Improvements

  • A number of optimizations to local storage compaction.

Replace `internal::get_tombstone_delete_horizon()` with
`internal::is_past_tombstone_delete_horizon()`.

This change makes the code a little bit cleaner, and potentially reduces
the number of operations in `should_keep()` related functions, as the
current timestamp only needs to be evaluated once for every tombstone record
in a segment with a clean compaction timestamp.

Also remove `should_remove_tombstone_record()`, since its logic is now trivial.
@WillemKauf
Copy link
Contributor Author

Force push to:

  • format with clang-format-18. I really need to update my default clang-format version.

src/v/storage/segment.h Outdated Show resolved Hide resolved
src/v/storage/disk_log_impl.cc Outdated Show resolved Hide resolved
src/v/storage/disk_log_impl.cc Outdated Show resolved Hide resolved
if (s->finished_self_compaction() || !s->has_compactible_offsets(cfg)) {
if (
(s->finished_self_compaction() || !s->has_compactible_offsets(cfg))
&& !may_have_removable_tombstones(s, cfg)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very important for allowing self compaction to remove tombstones, even if the segment has already been through a round of self compaction.

if (seg->finished_self_compaction()) {
continue;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removed code here was a superfluous check that we perform in self_compact_segment() anyways. move continue logic after call by checking for result.did_compact() == false.

@@ -197,6 +197,11 @@ class disk_log_impl final : public log {
_last_compaction_window_start_offset = o;
}

const std::optional<model::offset>&
get_last_compaction_window_start_offset() const {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing.

src/v/storage/segment_deduplication_utils.cc Outdated Show resolved Hide resolved
src/v/storage/segment.h Outdated Show resolved Hide resolved
src/v/storage/segment_utils.cc Outdated Show resolved Hide resolved
src/v/storage/disk_log_impl.cc Outdated Show resolved Hide resolved
Comment on lines 545 to 547
for (const auto& seg : segs) {
if (internal::may_have_removable_tombstones(seg, cfg)) {
filtered_buf.emplace_back(seg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of how this means the returned range is no longer contiguous segments. Can we put the onus for checking this on callers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed behaviour of find_sliding_range() to once again return a contiguous range of segments- however, this range may include segments already marked as being cleanly compacted/having finished window compaction, and the onus is on the caller to filter them as they see fit.

Comment on lines +510 to +537
if (
_last_compaction_window_start_offset.has_value()
&& (seg->offsets().get_base_offset()
>= _last_compaction_window_start_offset.value())) {
// Force clean segment production by compacting down to the
// start of the log before considering new segments in the
// compaction window.
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems possible that segments could have been removed such that the first segment falls above _last_compaction_window_start_offset. In that case, does this always return empty and we get stuck never compacting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should reset _last_compaction_window_start_offset at the top of this method. Then we wouldn't need to worry about resetting it at compaction time

Copy link
Contributor Author

@WillemKauf WillemKauf Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout. Added a new sanity check to the top of find_sliding_range() to reset the _last_compaction_window_start_offset if it is <= _segs.front()->offsets().get_base_offset().

However, the code which resets it in sliding_window_compact is still required- we want to compare idx_start_offset to the base offset of the first segment in our sliding window range (which is not necessarily the front segment in the log) as an indicator of whether or not we can reset the start offset and allow new segments into the range.

src/v/storage/disk_log_impl.cc Outdated Show resolved Hide resolved
src/v/storage/segment_deduplication_utils.cc Outdated Show resolved Hide resolved
@@ -116,6 +116,25 @@ ss::future<model::offset> build_offset_map(
cfg.asrc->check();
}
auto seg = *iter;
if (seg->index().has_clean_compact_timestamp()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just making sure, is there anywhere in code comments that describes what it means to be clean compacted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1, 2

For future changes for sliding window compaction scheduling, it is helpful
to have a flag that indicates whether a segment may contain tombstone records
or not.

Add a new `bool` field, `may_have_tombstone_records` to the `index_state`, and
mark/unmark its value during segment deduplication and data copying.

This field is `true` by default, which can lead to false-positives. A segment
is only considered to not have tombstone records after proven by de-duplication/
segment data copying in the compaction process.
Adds a helper function that indicates whether or not a segment may
have tombstones eligible for deletion.

This can return false-positives, since any segment that has not yet
been through the compaction process is assumed to potentially have
tombstones until proven otherwise.
@WillemKauf WillemKauf force-pushed the compaction_enhancements branch from 2b3857e to 7249b56 Compare September 20, 2024 00:29
//
// If there are new segments that have not been compacted, we can't make
// this claim, and compact everything again.
if (
Copy link
Contributor Author

@WillemKauf WillemKauf Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may look a bit scary to remove this chunk of code, but this is functionally performing the same "clean-compacted" check we now perform here (since, logically, is_clean_compacted is defined this way here.)

We want to leave these segments in the compaction range so that they can be self-compacted in the case that they contain tombstones, and then filter them out afterwards before proceeding to sliding window compaction.

@WillemKauf WillemKauf changed the title storage: compaction enhancements storage: compaction enhancements Sep 20, 2024
@WillemKauf WillemKauf force-pushed the compaction_enhancements branch from 7249b56 to 3ac5b64 Compare September 20, 2024 03:52
@WillemKauf WillemKauf requested a review from andrwng September 20, 2024 16:35
andrwng
andrwng previously approved these changes Sep 23, 2024
This commit does two things, which seemed easier to combine into one commit:

1. The behavior of sliding window compaction is changed, such that newly
   added/closed segments are ignored until the current "round" of sliding
   window compaction cleanly compacts all segments down to the start of the
   range. This allows for sliding window compaction to avoid a situation where
   clean segments are not being produced due to a high ingress rate or key
   cardinality (this situation would prevent timely tombstone removal).
   `_last_compaction_window_offset` must reach the base offset of the
   first segment in the currently active window before new segments will be
   considered for compaction.

2. Segments that are cleanly compacted or have been through a round of
   window compaction are considered in the sliding window range. However,
   it would be a no-op to actually perform window compaction over these
   segments. Self-compaction is performed to remove tombstones on segments
   that may contain them, and all cleanly compacted segments are removed
   before sliding window compaction occurs. This allows for timely tombstone
   removal by avoiding the situation in which a partition which is no longer
   being produced to can still trigger tombstone removal.

Both of these changes improve the rate at which tombstone removal occurs, and help
prevent clean segment/tombstone removal "starvation".
Cleanly compacted segments do not need to have their keys added to the
compaction offset map, since the deduplication process considers
unindexed keys to be valid records to keep. By not indexing these segments,
the compaction process can use less memory and cleanly compact down to the
start of the log faster.
@WillemKauf
Copy link
Contributor Author

WillemKauf commented Sep 24, 2024

Force push to:

  • Fix logic bug in self_compact_segment(), which erroneously ignored a segment's has_compactible_offsets() result if it had removable tombstones when considering its eligibility for self compaction.

https://github.com/redpanda-data/redpanda/compare/3ac5b645768b4fc74ede70977e5f7b64d744a49c..262600310ec7dde3f8735f9055a33e2a8280a390#diff-c79a67494746e1dac357cbfc721f6718275cc322e24c34dfc8f8f6cdcae546d9R742-R743

@WillemKauf WillemKauf requested a review from andrwng September 24, 2024 16:58
@WillemKauf WillemKauf merged commit d57dc7c into redpanda-data:dev Sep 24, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants