Integrate seastar memory profiling into RP #10562
Conversation
Force-pushed from 24e8d0d to bce1314
Looks awesome. Just a few minor comments.
@BenPope Have updated the fmt usage. Hopefully it's more idiomatic now.
Decided yesterday with the team that we only want to log once at certain points of memory usage. Hence, I have now added a change to log only once when we reach 20% and 10% of available memory left (per shard).
@ballard26 I have now made the response a bit more JSON-esque, as discussed.
} catch (const boost::bad_lexical_cast&) {
    throw ss::httpd::bad_param_exception(
      fmt::format("Invalid parameter 'shard_id' value {{{}}}", e));
}
Ugh, so much boilerplate but yeah I guess this is how we roll elsewhere in the file 🤷 .
Made it to the end, some comments & suggested changes.
if (_memory_sampling_service.local_is_initialized()) {
    _memory_sampling_service.local().notify_of_reclaim();
}
Not a blocker at all for this PR, but just wanted to mention:
The batch cache is the only subsystem that has a registered reclaimer right now. Ideally we'd register a pseudo-reclaimer at the top level (e.g. in resource_mgmt or some other convenient location) and have that call out to sub-systems that are capable of reclaiming, such as the batch cache, and in your case, receiving an upcall. That would reduce coupling a lot (e.g. not having to thread the sampling service down into the storage sub-system).
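For illustration, a rough sketch of such a decoupled setup with entirely hypothetical names (none of this is the actual resource_mgmt API): reclaimers and listeners both register with one hub, so the batch cache never has to know about the sampling service.

```
#include <seastar/util/noncopyable_function.hh>

#include <vector>

// Hypothetical reclaim hub: the top-level (pseudo-)reclaimer notifies it
// after a reclaim pass, and it fans the notification out to whoever
// subscribed (e.g. the memory sampling service), so individual reclaimers
// like the batch cache don't need to know about any listener.
class reclaim_listener_registry {
public:
    using listener = seastar::noncopyable_function<void()>;

    void register_listener(listener l) { _listeners.push_back(std::move(l)); }

    // Called once per reclaim pass.
    void notify_of_reclaim() {
        for (auto& l : _listeners) {
            l();
        }
    }

private:
    std::vector<listener> _listeners;
};
```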
_feature_table.stop().get();
}

model::offset _base_offset{0};
ss::logger _test_logger{"foreign-test-logger"};
Why is the memory sampling service showing up in these apparently unrelated tests?
Does the fixture somehow require it?
Yes, exactly: all these tests use storage::api or the log_manager, which own the batch cache.
It's a bit unfortunate: additional complexity for all these additional services and the tests that use them. I guess this was because of my suggestion to have the reclaim process trigger the LWM logger, right?
Probably something like @dotnwat's suggestion of a reclaim API, which could decouple these two, would be a good way forward: you could register a reclaim listener which will be notified when reclaim happens, without every reclaimer having to know about every listener.
I don't think we need to do this in this series though.
> I guess this was because of my suggestion to have the reclaim process trigger the LWM logger, right?

Yeah, one of the reasons I did the timer-based approach before was to avoid all these changes. Though that's really just being a bit lazy, and the notify-based approach certainly seems better. The reclaim API suggestion would certainly help.
The only remaining issue is regarding the size/count fields we expose.
ping @StephanDollberg, looks like some merge conflicts have cropped up
Thanks, yeah, I have resolved those locally (it's only small stuff). I will do a rebase plus squash once Travis has given the go-ahead.
Thanks for all the changes. This looks good to go.
@StephanDollberg looks like some merge conflicts here
Force-pushed from 20498db to a4fbd8c
Squashed and rebased.
Force-pushed from 0e55d36 to a025936
The above are just minor test fixups. The RP + seastar changes build is green now (minus some flaky ducktape tests and the public build, which will need the sha update): https://buildkite.com/redpanda/vtools/builds/8044#_
LGTM, though the seastar-side changes need to go in first, presumably.
Force-pushed from a025936 to 6b4fd65
Rebased and updated the OSS seastar pointer.
Adds a new memory_sampling service. The service enables seastar's sampled memory profiling. It hooks into the batch cache, which calls back into the service whenever a reclaim is run. The service then prints the top-n allocation sites once we cross the thresholds of 20% and 10% of available memory left.
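As a purely illustrative sketch of the once-per-watermark behaviour described above (the flag and helper names are made up, not the PR's actual code):

```
#include <seastar/core/memory.hh>

// Hypothetical helper standing in for whatever prints the sampled top-n sites.
void log_top_n_allocation_sites();

// Sketch: called from the batch cache's reclaim path; logs once when less
// than 20% of shard memory is left, and once more below 10%.
void notify_of_reclaim_sketch() {
    static thread_local bool logged_20pct = false; // per shard
    static thread_local bool logged_10pct = false;

    auto stats = seastar::memory::stats();
    auto total = stats.total_memory();
    auto avail = stats.free_memory();

    if (!logged_20pct && avail < total / 5) { // below 20% available
        logged_20pct = true;
        log_top_n_allocation_sites();
    }
    if (!logged_10pct && avail < total / 10) { // below 10% available
        logged_10pct = true;
        log_top_n_allocation_sites();
    }
}
```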
Prints the top-10 allocation sites on OOM. Uses the seastar on-OOM hook to add additional output when we fail to allocate. The output looks similar to what we print when reaching the low watermarks.
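Sketch only, assuming the hook in question is seastar::memory::set_additional_diagnostics_producer (the exact seastar API the PR relies on may differ, and the report helper here is hypothetical):

```
#include <seastar/core/memory.hh>
#include <seastar/core/sstring.hh>

// Hypothetical helper that renders the top-10 sampled allocation sites in
// the same shape as the low-watermark output.
seastar::sstring top_allocation_sites_report();

void install_oom_report() {
    // Assumption: seastar invokes this producer while writing its memory
    // diagnostics report on allocation failure.
    seastar::memory::set_additional_diagnostics_producer([](auto writer) {
        writer(top_allocation_sites_report());
    });
}
```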
Adds an endpoint that collects the current sampled memory profile from all shards and returns them to the caller. For now the stack is just stringified, but we could make this a proper JSON structure all the way down. Keeping it simple for now. The following (or some form thereof) can be used to get a flamegraph:

```
curl localhost:9644/v1/debug/sampled_memory_profile?shard=3 \
  | jq -r .[0].profile \
  | ./symbolize_mem_dump.py /home/stephan/build/redpanda/vbuild/release/clang/bin/redpanda \
  | flamegraph.pl - > flamegraph.svg
```
Adds a config flag to enable/disable the memory sampling. The flag can be changed without a restart. However, turning it on after the fact is not that useful because we will be missing lots of live allocations. It's still useful if profiling needs to be turned off for whatever reason.
Match vtools seastar pointer and pull in the memory sampling changes.
Force-pushed from 6b4fd65 to ec26919
Re-approved after rebase to fix conflicts.
RP side of integrating the seastar memory sampling. This will allow us to get a sampled view of the live memory set consumed by RP.
Issue https://github.com/redpanda-data/core-internal/issues/470 / RFC
Seastar PR is here redpanda-data/seastar#46 (with numbers & flamegraphs)
Individual commits should stand on their own. I haven't added the cmake commit to the updated seastar yet so the build will obviously fail.
There are a couple of open questions from my side:
Backports Required
Release Notes