
Integrate seastar memory profiling into RP #10562

Merged

Conversation

StephanDollberg (Member)

RP side of integrating the seastar memory sampling. This will allow us to get a sampled view of the live memory set as consumed by RP.

Issue https://github.com/redpanda-data/core-internal/issues/470 / RFC

Seastar PR is here redpanda-data/seastar#46 (with numbers & flamegraphs)

Individual commits should stand on their own. I haven't added the cmake commit to the updated seastar yet so the build will obviously fail.

There are a couple of open questions from my side:

  • Do we really want the periodic printing? Especially with high shard counts it gets quite noisy.
  • If yes, do we want a separate logger?
  • Do we want the sampling rate to be configurable? IMO either we choose one that works or we just turn it off. I can't imagine many scenarios where X would be fine for one case and Y for the other.
  • Do we want it default off/on at this point?

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • Adds an internal Redpanda memory profiler to help find the root cause of OOM situations

@dotnwat (Member) left a comment

looks awesome. just a few minor comments

Review threads (resolved):

  • src/v/redpanda/memory_sampling.cc
  • src/v/redpanda/memory_sampling.h
  • src/v/redpanda/admin_server.cc
  • tests/rptest/tests/memory_sampling_test.py
@StephanDollberg (Member Author)

@BenPope Have updated the fmt usage. Hopefully more idiomatic now.

@StephanDollberg (Member Author)

Decided yesterday with the team that we only want to log at certain points of memory usage. Hence, I have now added a change to log once when we have 20% of available memory left and once when we have 10% left (per shard).
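The log-once-per-watermark behaviour can be sketched roughly like this; `low_watermark_notifier` and all of its member names are hypothetical, not Redpanda's actual implementation:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch of "log once per watermark": fire at most once as
// available memory first drops below 20% and then below 10% of the
// shard's total. Only the 20%/10% thresholds come from the PR itself.
struct low_watermark_notifier {
    std::size_t total_memory;
    std::array<bool, 2> fired{{false, false}}; // one flag per watermark

    // Returns true iff this call crossed a not-yet-reported watermark.
    bool check(std::size_t available) {
        constexpr double marks[] = {0.20, 0.10};
        bool crossed = false;
        for (std::size_t i = 0; i < 2; ++i) {
            if (!fired[i] && available < total_memory * marks[i]) {
                fired[i] = true; // would log top-n allocation sites here
                crossed = true;
            }
        }
        return crossed;
    }
};
```

Repeated checks at the same level stay silent, which is the point: no periodic spam, just one report per threshold.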

@StephanDollberg (Member Author)

@ballard26 I have now made the response a bit more JSON-like, as discussed:

```
} catch (const boost::bad_lexical_cast&) {
    throw ss::httpd::bad_param_exception(
      fmt::format("Invalid parameter 'shard_id' value {{{}}}", e));
}
```
Member

Ugh, so much boilerplate but yeah I guess this is how we roll elsewhere in the file 🤷 .
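For what it's worth, a small helper could fold that repetition; this is a dependency-free sketch using `std::from_chars` in place of `boost::lexical_cast`, and `parse_shard_id` is a made-up name:

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>

// Hypothetical helper to shrink the parse-and-throw boilerplate.
// Returns nullopt on any malformed input; the caller would map that to
// ss::httpd::bad_param_exception in one place instead of per handler.
std::optional<unsigned> parse_shard_id(std::string_view s) {
    unsigned v{};
    auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), v);
    if (ec != std::errc{} || ptr != s.data() + s.size()) {
        return std::nullopt; // not a number, or trailing junk
    }
    return v;
}
```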

@travisdowns (Member) left a comment

Made it to the end, some comments & suggested changes.

Review threads (resolved):

  • src/v/resource_mgmt/memory_sampling.h
  • src/v/resource_mgmt/memory_sampling.cc
  • tests/rptest/tests/memory_sampling_test.py
Comment on lines +310 to +311
```
if (_memory_sampling_service.local_is_initialized()) {
    _memory_sampling_service.local().notify_of_reclaim();
}
```
Member

Not a blocker at all for this PR, but just wanted to mention:

The batch cache is the only subsystem that has a registered reclaimer right now. Ideally we could register a pseudo-reclaimer at the top level (e.g. in resource_mgmt or some other convenient location) and have that call out to sub-systems that are capable of reclaiming, such as the batch cache, and in your case, receiving an upcall. That would reduce coupling a lot (e.g. not having to thread the sampling service down into the storage sub-system).
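The suggested decoupling could look roughly like this; `reclaim_registry` and both method names are illustrative only, not an API that exists in the tree:

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch of the pseudo-reclaimer idea: one top-level
// registry that the real reclaimer notifies, fanning out to every
// interested sub-system, so e.g. the batch cache never needs to know
// about the sampling service directly.
class reclaim_registry {
public:
    void register_listener(std::function<void()> cb) {
        _listeners.push_back(std::move(cb));
    }
    // Called by the one real reclaimer; fans out to every listener.
    void notify_reclaim() {
        for (auto& cb : _listeners) {
            cb();
        }
    }
private:
    std::vector<std::function<void()>> _listeners;
};
```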

Review thread (resolved): src/v/resource_mgmt/memory_sampling.cc
```
    _feature_table.stop().get();
}
model::offset _base_offset{0};
ss::logger _test_logger{"foreign-test-logger"};
```
Member

Why is the memory sampling service showing up in these apparently unrelated tests?

Does the fixture somehow require it?

Member Author

Yes, exactly: all these tests use storage::api or the log_manager, which own the batch cache.

Member

It's a bit unfortunate: additional complexity for all these additional services and the tests that use them. I guess this was because of my suggestion to have the reclaim process trigger the LWM logger, right?

Probably something like @dotnwat's suggestion of a reclaim API, which could decouple these two, would be a good way forward: you could register a reclaim listener that is notified when reclaim happens, without every reclaimer having to know about every listener.

I don't think we need to do this in this series though.

Member Author

I guess this was because of my suggestion to have the reclaim process trigger the LWM logger, right?

Yeah, one of the reasons I did the timer-based approach before was that I wanted to avoid all these changes. Though that's really just being a bit lazy, and the notify-based approach certainly seems better. The reclaim API suggestion would certainly help.

@travisdowns (Member) left a comment

The only remaining issue is regarding the size/count fields we expose.


dotnwat commented May 30, 2023

ping @StephanDollberg looks like some merge conflicts have cropped up

@StephanDollberg (Member Author)

ping @StephanDollberg looks like some merge conflicts have cropped up

Thanks yeah, I have resolved those locally (it's only small stuff). I will do a rebase plus squash once Travis has given the go-ahead.

travisdowns previously approved these changes May 31, 2023
@travisdowns (Member) left a comment

Thanks for all the changes. This looks good to go.


dotnwat commented Jun 5, 2023

@StephanDollberg looks like some merge conflicts here

@StephanDollberg (Member Author)

Squashed and rebased

@StephanDollberg StephanDollberg force-pushed the support-mem-profiling branch 2 times, most recently from 0e55d36 to a025936 Compare June 10, 2023 08:57

StephanDollberg commented Jun 10, 2023

The above are just minor test fixups. The RP + seastar changes build is green now (minus some flaky ducktape tests and the public build, which will need the SHA update): https://buildkite.com/redpanda/vtools/builds/8044#_

travisdowns previously approved these changes Jun 12, 2023
@travisdowns (Member) left a comment

LGTM, though the seastar-side changes need to go in first, presumably.

@StephanDollberg StephanDollberg force-pushed the support-mem-profiling branch from a025936 to 6b4fd65 Compare June 14, 2023 09:30
@StephanDollberg StephanDollberg requested a review from a team as a code owner June 14, 2023 09:30
@StephanDollberg StephanDollberg requested review from gousteris and removed request for a team June 14, 2023 09:30
@StephanDollberg (Member Author)

Rebased and updated the OSS seastar pointer.

Adds a new memory_sampling service.

The service enables seastar sampled memory profiling.

It hooks into the batch cache, which calls back into the service whenever
a reclaim is run. The service then prints the top-n allocation sites once
we reach the thresholds of 20% and 10% of available memory left.
Print top-10 allocation sites on OOM.

Uses the seastar on-OOM hook to add additional output when we fail to
allocate.

The output looks similar to what we print when reaching the low
watermarks.
Adds an endpoint that collects the current sampled memory profile
from all shards and returns it to the caller.

For now the stack is just stringified but we could make this a proper
json structure all the way down. Keeping it simple for now.

The following (or some form thereof) can be used to get a flamegraph:

```
curl localhost:9644/v1/debug/sampled_memory_profile?shard=3 \
    | jq -r .[0].profile \
    | ./symbolize_mem_dump.py /home/stephan/build/redpanda/vbuild/release/clang/bin/redpanda \
    | flamegraph.pl - > flamegraph.svg
```
Adds a config flag to enable/disable the memory sampling.

Flag can be changed without a restart. However, turning it on after the
fact is not that useful because we will be missing lots of live
allocations.

It's still useful if profiling needs to be turned off for whatever
reason.
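A restart-free toggle of this kind amounts to binding a boolean property to a start/stop action. A minimal sketch, where `toggle_binding` and its members are illustrative and not Redpanda's actual config API:

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch: react to a runtime-changeable boolean flag by
// firing a callback only on actual transitions (the callback would
// enable or disable sampling in the real system).
struct toggle_binding {
    bool enabled = false;
    std::function<void(bool)> on_change;

    void set(bool v) {
        if (v == enabled) {
            return; // ignore redundant updates from repeated config pushes
        }
        enabled = v;
        if (on_change) {
            on_change(v); // e.g. start or stop sampling here
        }
    }
};
```

Firing only on transitions keeps repeated writes of the same value from restarting the sampler and losing the live-allocation history mentioned above.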
Match vtools seastar pointer and pull in the memory sampling changes.
@StephanDollberg StephanDollberg force-pushed the support-mem-profiling branch from 6b4fd65 to ec26919 Compare June 14, 2023 21:54
@travisdowns (Member) left a comment

Re-approved after rebase to fix conflicts.

@piyushredpanda piyushredpanda merged commit 01ac116 into redpanda-data:dev Jun 15, 2023