Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage: cache efficiency improvements #10855

Merged

Conversation

VladLazar
Copy link
Contributor

This PR makes a couple of efficiency improvements to the cloud storage cache trimming process:

  • Stop tracking segment indices and transaction manifests in the access time tracker
  • Do not include segment indices and transaction manifests in the report generated by the cache walk.
    Note that they're still included in the reported total size for the cache,

Both hinge on the fact that segment indices and transaction manifests are cleaned up with their
respective segments or chunks, so we don't need to track them as neatly.

Fixes #10719

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

This commit changes the `access_time_tracker` used by the cloud storage
cache to ignore transaction manifest and index files. Since these file
types are removed along with their corresponding segments or chunks,
tracking them has no value.

Note that removal of these file types from the tracker is still allowed.
The tracker does not contain the paths (only hashes), so they cannot be
removed upon deserialisation.
@VladLazar VladLazar changed the title Cache efficieny improvements 10719 cloud_storage: cache efficieny improvements May 18, 2023
@@ -155,9 +163,14 @@ ss::future<> access_time_tracker::read(ss::input_stream<char>& in) {

void access_time_tracker::add_timestamp(
std::string_view key, std::chrono::system_clock::time_point ts) {
if (!should_track(key)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check in the remove_timestamp path, to avoid doing a O(lnN) search for the things we know we never added to begin with? or does the cache just never call remove_timestamp for non-segment files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I allowed removal of tx and index keys on purpose. Since we only serialise the hash, we cannot drop all of them when de-serialising. When the tracker is hydrated, it's going to contain such keys which are only ever removed by calls to access_time_tracker::trim. The idea is that on the start-up clean we do not apply the filtering and trim away all of these entries we don't need. For subsequent clean-ups, the filtering is applied.

@VladLazar VladLazar changed the title cloud_storage: cache efficieny improvements cloud_storage: cache efficiency improvements May 18, 2023
This commit updates the code that walks the cloud storage cache to
ignore the segment indices and transaction manifests as they are ignored
by the subsequent code that performs the deletions. Note that their
sizes are still included in the running total.
@VladLazar VladLazar force-pushed the cache-efficieny-improvements-10719 branch from 9f9147e to 5526081 Compare May 19, 2023 13:26
@VladLazar VladLazar requested a review from jcsp May 19, 2023 13:27
@VladLazar VladLazar marked this pull request as ready for review May 19, 2023 16:10
@VladLazar VladLazar requested a review from Lazin May 23, 2023 08:41
@VladLazar
Copy link
Contributor Author

/ci-repeat

@@ -72,16 +79,20 @@ struct walk_accumulator {
const access_time_tracker& tracker;
bool seen_dentries{false};
std::deque<ss::sstring> dirlist;
std::optional<recursive_directory_walker::filter_type> filter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe a default value for filter like [](auto&&){ return true; } could simplify the code. it could be a default parameter for the constructor

@VladLazar
Copy link
Contributor Author

/ci-repeat

@VladLazar VladLazar requested a review from abhijat June 6, 2023 17:05
@VladLazar
Copy link
Contributor Author

/ci-repeat

/// We do not wish to track index files and transaction manifests
/// as they are just an appendage to segment/chunk files and are
/// purged along with them.
bool should_track(std::string_view key) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could be static

Comment on lines +237 to +242
auto [walked_cache_size, filtered_out_files, candidates_for_deletion, _]
= co_await _walker.walk(
_cache_dir.native(), _access_time_tracker, [](std::string_view path) {
return !(
std::string_view(path).ends_with(".tx")
|| std::string_view(path).ends_with(".index"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we're not explicitly trimming these, we should take care to make our behavior across a crash copes well. I think right now if we delete the log file and crash before we get to the ancillary files, they'll be stuck around forever. We should probably reverse the order to delete tx and index first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The start-up trim does not apply the filtering precisely for this reason. If we crash in between, we'll still attempt to clean up index and tx files on start-up.

@andijcr andijcr self-requested a review June 14, 2023 19:57
@piyushredpanda piyushredpanda merged commit f806734 into redpanda-data:dev Jun 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cloud_storage: cache efficiency low hanging fruit
5 participants