Integrate seastar memory profiling into RP #10562

Merged

Changes from all commits
2 changes: 1 addition & 1 deletion cmake/oss.cmake.in
@@ -179,7 +179,7 @@ ExternalProject_Add(fmt

ExternalProject_Add(seastar
GIT_REPOSITORY https://github.com/redpanda-data/seastar.git
- GIT_TAG 777ad7c4c1e280c63877b80036a5b15fd0a6388a
+ GIT_TAG 6e869a2068ab27ca84ffbe0fe7f7f172fdcde01c
INSTALL_DIR @REDPANDA_DEPS_INSTALL_DIR@
CMAKE_COMMAND ${CMAKE_COMMAND} -E env ${cmake_build_env} ${CMAKE_COMMAND}
LIST_SEPARATOR |
2 changes: 2 additions & 0 deletions src/v/cluster/tests/local_monitor_fixture.h
@@ -11,9 +11,11 @@

#pragma once
#include "cluster/node/local_monitor.h"
#include "resource_mgmt/memory_sampling.h"
#include "storage/api.h"

#include <seastar/core/sstring.hh>
#include <seastar/util/log.hh>

#include <string_view>

13 changes: 13 additions & 0 deletions src/v/config/configuration.cc
@@ -1881,6 +1881,19 @@ configuration::configuration()
"exception is thrown instead.",
{.needs_restart = needs_restart::no, .visibility = visibility::tunable},
true)
, sampled_memory_profile(
*this,
"memory_enable_memory_sampling",
"If true, memory allocations will be sampled and tracked. A sampled live "
"set of allocations can then be retrieved from the Admin API. "
"Additionally, we will periodically log the top-n allocation sites",
{// Enabling/disabling this dynamically doesn't make much sense, as for the
// memory profile to be meaningful you'll want to have this on from the
// beginning. However, we still provide the option to be able to disable
// it dynamically in case something goes wrong
.needs_restart = needs_restart::no,
.visibility = visibility::tunable},
true)
, enable_metrics_reporter(
*this,
"enable_metrics_reporter",
1 change: 1 addition & 0 deletions src/v/config/configuration.h
@@ -375,6 +375,7 @@ struct configuration final : public config_store {

// memory related settings
property<bool> memory_abort_on_alloc_failure;
property<bool> sampled_memory_profile;

// metrics reporter
property<bool> enable_metrics_reporter;
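For orientation, a minimal sketch of how such a property is typically consumed. This follows Redpanda's usual config accessor/binding pattern and is not part of the diff:

```cpp
#include "config/configuration.h"

// Illustrative sketch: reading the new flag and reacting to runtime
// changes, which is the reason the property is needs_restart::no.
void observe_sampling_flag() {
    // current value; the property defaults to true
    bool enabled = config::shard_local_cfg().sampled_memory_profile();

    // a binding tracks dynamic updates: the watch callback fires whenever
    // an operator changes the value at runtime
    static auto binding
      = config::shard_local_cfg().sampled_memory_profile.bind();
    binding.watch([] {
        // e.g. stop sampling when the flag is turned off
    });
    (void)enabled;
}
```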
10 changes: 9 additions & 1 deletion src/v/raft/tests/bootstrap_configuration_test.cc
@@ -16,6 +16,7 @@
#include "raft/consensus_utils.h"
#include "random/generators.h"
#include "resource_mgmt/io_priority.h"
#include "resource_mgmt/memory_sampling.h"
#include "storage/api.h"
#include "storage/log.h"
#include "storage/log_manager.h"
@@ -49,12 +50,16 @@ struct bootstrap_fixture : raft::simple_record_fixture {
storage::with_cache::no,
storage::make_sanitized_file_config());
},
- _feature_table) {
+ _feature_table,
+ _memory_sampling_service) {
_feature_table.start().get();
_feature_table
.invoke_on_all(
[](features::feature_table& f) { f.testing_activate_all(); })
.get();
_memory_sampling_service
.start(std::ref(_test_logger), config::mock_binding<bool>(false))
.get();
_storage.start().get();
// ignore the get_log()
(void)_storage.log_mgr()
@@ -81,10 +86,13 @@ struct bootstrap_fixture : raft::simple_record_fixture {

~bootstrap_fixture() {
_storage.stop().get();
_memory_sampling_service.stop().get();
_feature_table.stop().get();
}

seastar::logger _test_logger{"bootstrap-test-logger"};
ss::sharded<features::feature_table> _feature_table;
ss::sharded<memory_sampling> _memory_sampling_service;
storage::api _storage;
ss::abort_source _as;
};
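Note that the fixtures start the service with config::mock_binding<bool>(false), i.e. with sampling disabled for the tests. Outside the tests one would expect the live cluster-config binding instead; a rough sketch of that wiring (assumed, not shown in this diff):

```cpp
#include "config/configuration.h"
#include "resource_mgmt/memory_sampling.h"
#include "seastarx.h"

#include <seastar/core/sharded.hh>
#include <seastar/util/log.hh>

#include <functional>

// Illustrative only: start memory_sampling with the live binding so that
// flipping memory_enable_memory_sampling at runtime takes effect.
void start_memory_sampling_service(
  ss::sharded<memory_sampling>& service, ss::logger& logger) {
    service
      .start(
        std::ref(logger),
        config::shard_local_cfg().sampled_memory_profile.bind())
      .get(); // assumes a seastar thread context, as in the fixtures
}
```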
13 changes: 11 additions & 2 deletions src/v/raft/tests/configuration_manager_test.cc
@@ -16,6 +16,7 @@
#include "raft/logger.h"
#include "raft/types.h"
#include "random/generators.h"
#include "resource_mgmt/memory_sampling.h"
#include "storage/api.h"
#include "storage/kvstore.h"
#include "storage/log_manager.h"
@@ -25,6 +26,7 @@
#include "units.h"

#include <seastar/core/abort_source.hh>
#include <seastar/util/log.hh>

#include <boost/test/tools/old/interface.hpp>

@@ -51,7 +53,8 @@ struct config_manager_fixture {
ss::default_priority_class(),
storage::make_sanitized_file_config());
},
- _feature_table))
+ _feature_table,
+ _memory_sampling_service))
, _logger(
raft::group_id(1),
model::ntp(model::ns("t"), model::topic("t"), model::partition_id(0)))
@@ -66,19 +69,25 @@
.invoke_on_all(
[](features::feature_table& f) { f.testing_activate_all(); })
.get();
_memory_sampling_service
.start(std::ref(_test_logger), config::mock_binding<bool>(false))
.get();
_storage.start().get0();
}

ss::sstring base_dir = "test_cfg_manager_"
+ random_generators::gen_alphanum_string(6);
ss::logger _test_logger{"config-mgmr-test-logger"};
ss::sharded<features::feature_table> _feature_table;
ss::sharded<memory_sampling> _memory_sampling_service;
storage::api _storage;
raft::ctx_log _logger;
raft::configuration_manager _cfg_mgr;

~config_manager_fixture() {
- _feature_table.stop().get();
  _storage.stop().get0();
+ _memory_sampling_service.stop().get();
+ _feature_table.stop().get();
}

raft::group_configuration random_configuration() {
10 changes: 9 additions & 1 deletion src/v/raft/tests/foreign_entry_test.cc
@@ -21,6 +21,7 @@
#include "raft/types.h"
#include "random/generators.h"
#include "resource_mgmt/io_priority.h"
#include "resource_mgmt/memory_sampling.h"
#include "storage/api.h"
#include "storage/log.h"
#include "storage/log_manager.h"
@@ -61,12 +62,16 @@ struct foreign_entry_fixture {
ss::default_priority_class(),
storage::make_sanitized_file_config());
},
- _feature_table) {
+ _feature_table,
+ _memory_sampling_service) {
_feature_table.start().get();
_feature_table
.invoke_on_all(
[](features::feature_table& f) { f.testing_activate_all(); })
.get();
_memory_sampling_service
.start(std::ref(_test_logger), config::mock_binding<bool>(false))
.get();
_storage.start().get();
(void)_storage.log_mgr()
.manage(storage::ntp_config(_ntp, "test.dir"))
@@ -136,10 +141,13 @@
}
~foreign_entry_fixture() {
_storage.stop().get();
_memory_sampling_service.stop().get();
_feature_table.stop().get();
}
model::offset _base_offset{0};
ss::logger _test_logger{"foreign-test-logger"};
Member:
Why is the memory sampling service showing up in these apparently unrelated tests?

Does the fixture somehow require it?

Member Author:

Yes, exactly: all these tests use storage::api or the log_manager, which own the batch cache.

Member:

It's a bit unfortunate: extra complexity from all these additional services and the tests that use them. I guess this was because of my suggestion to have the reclaim process trigger the LWM logger, right?

Probably something like @dotnwat's suggestion of a reclaim API, which could decouple these two, would be a good way forward: you could register a reclaim listener that is notified when reclaim happens, without every reclaimer having to know about every listener.

I don't think we need to do this in this series though.

Member Author:

> I guess this was because of my suggestion to have the reclaim process trigger the LWM logger, right?

Yeah, one of the reasons why I did the timer-based approach before was that I wanted to avoid all these changes. Though that's really just being a bit lazy, and the notify-based one certainly seems better. The reclaim API suggestion would certainly help.
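A reclaim-listener API along the lines discussed here might look roughly like the following (a purely hypothetical sketch; the names are invented and nothing like this is part of this PR):

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical: reclaimers publish "memory was reclaimed" events to a
// central notifier instead of holding references to every interested
// service.
class reclaim_notifier {
public:
    using listener = std::function<void()>;

    void register_listener(listener l) {
        _listeners.push_back(std::move(l));
    }

    // called by any reclaimer (e.g. the batch cache) after freeing
    // memory under pressure
    void notify_reclaim() {
        for (auto& l : _listeners) {
            l();
        }
    }

private:
    std::vector<listener> _listeners;
};
```

memory_sampling could then register its low-water-mark logging as a listener, and test fixtures would only need to wire up the services they actually exercise.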

ss::sharded<features::feature_table> _feature_table;
ss::sharded<memory_sampling> _memory_sampling_service;
storage::api _storage;
storage::log get_log() { return _storage.log_mgr().get(_ntp).value(); }
model::ntp _ntp{
17 changes: 13 additions & 4 deletions src/v/raft/tests/mux_state_machine_fixture.h
@@ -19,6 +19,7 @@
#include "raft/mux_state_machine.h"
#include "raft/types.h"
#include "random/generators.h"
#include "resource_mgmt/memory_sampling.h"
#include "rpc/connection_cache.h"
#include "storage/api.h"
#include "storage/kvstore.h"
@@ -57,7 +58,8 @@ struct mux_state_machine_fixture {
.start(
[kv_conf]() { return kv_conf; },
[this]() { return default_log_cfg(); },
- std::ref(_feature_table))
+ std::ref(_feature_table),
+ std::ref(_memory_sampling_service))
.get0();
_storage.invoke_on_all(&storage::api::start).get0();
_as.start().get();
@@ -74,6 +76,10 @@
[](features::feature_table& f) { f.testing_activate_all(); })
.get();

_memory_sampling_service
.start(std::ref(_test_logger), config::mock_binding<bool>(false))
.get();

_group_mgr
.start(
_self,
@@ -139,9 +145,10 @@ struct mux_state_machine_fixture {
if (_raft) {
_raft.release();
}
- _connections.stop().get0();
- _feature_table.stop().get0();
- _storage.stop().get0();
+ _connections.stop().get();
+ _storage.stop().get();
+ _memory_sampling_service.stop().get();
+ _feature_table.stop().get();
_as.stop().get();
}
}
@@ -189,11 +196,13 @@ struct mux_state_machine_fixture {
model::ntp _ntp = model::ntp(
model::ns("default"), model::topic("test"), model::partition_id(0));

ss::logger _test_logger{"mux-test-logger"};
ss::sstring _data_dir;
cluster::consensus_ptr _raft;
ss::sharded<ss::abort_source> _as;
ss::sharded<rpc::connection_cache> _connections;
ss::sharded<storage::api> _storage;
ss::sharded<memory_sampling> _memory_sampling_service;
ss::sharded<features::feature_table> _feature_table;
ss::sharded<raft::group_manager> _group_mgr;
ss::sharded<raft::coordinated_recovery_throttle> _recovery_throttle;
10 changes: 9 additions & 1 deletion src/v/raft/tests/offset_translator_tests.cc
@@ -10,6 +10,7 @@
#include "model/fundamental.h"
#include "raft/offset_translator.h"
#include "random/generators.h"
#include "resource_mgmt/memory_sampling.h"
#include "storage/api.h"
#include "storage/fwd.h"
#include "storage/kvstore.h"
@@ -50,11 +51,15 @@ struct base_fixture {
.invoke_on_all(
[](features::feature_table& f) { f.testing_activate_all(); })
.get();
_memory_sampling_service
.start(std::ref(_test_logger), config::mock_binding<bool>(false))
.get();
_api
.start(
[this]() { return make_kv_cfg(); },
[this]() { return make_log_cfg(); },
- std::ref(_feature_table))
+ std::ref(_feature_table),
+ std::ref(_memory_sampling_service))
.get();
_api.invoke_on_all(&storage::api::start).get();
}
@@ -87,11 +92,14 @@ struct base_fixture {
model::ntp test_ntp = model::ntp(
model::ns("test"), model::topic("tp"), model::partition_id(0));
ss::sstring _test_dir;
ss::logger _test_logger{"offset-test-logger"};
ss::sharded<features::feature_table> _feature_table;
ss::sharded<memory_sampling> _memory_sampling_service;
ss::sharded<storage::api> _api;

~base_fixture() {
_api.stop().get();
_memory_sampling_service.stop().get();
_feature_table.stop().get();
}
};
5 changes: 4 additions & 1 deletion src/v/raft/tests/raft_group_fixture.h
@@ -26,6 +26,7 @@
#include "raft/rpc_client_protocol.h"
#include "raft/service.h"
#include "random/generators.h"
#include "resource_mgmt/memory_sampling.h"
#include "rpc/backoff_policy.h"
#include "rpc/connection_cache.h"
#include "rpc/rpc_server.h"
@@ -130,7 +131,8 @@ struct raft_node {
ss::default_priority_class(),
storage::make_sanitized_file_config());
},
- std::ref(feature_table))
+ std::ref(feature_table),
+ std::ref(memory_sampling_service))
.get();
storage.invoke_on_all(&storage::api::start).get();

@@ -362,6 +364,7 @@ struct raft_node {
consensus_ptr consensus;
std::unique_ptr<raft::log_eviction_stm> _nop_stm;
ss::sharded<features::feature_table> feature_table;
ss::sharded<memory_sampling> memory_sampling_service;
ss::abort_source _as;
};

60 changes: 60 additions & 0 deletions src/v/redpanda/admin/api-doc/debug.json
@@ -205,6 +205,32 @@
}
]
},
{
"path": "/v1/debug/sampled_memory_profile",
"operations": [
{
"method": "GET",
"summary": "Get the currently sampled live memory set for the specified or all shards",
"nickname": "sampled_memory_profile",
"produces": [
"application/json"
],
"type": "array",
"items": {
"type": "memory_profile"
},
"parameters": [
{
"name": "shard",
"in": "query",
"required": false,
"allowMultiple": false,
"type": "long"
}
]
}
]
},
{
"path": "/v1/debug/refresh_disk_health_info",
"operations": [
@@ -497,6 +523,40 @@
}
}
},
"memory_profile": {
"id": "memory_profile",
"description": "Sampled memory profile of a shard",
"properties": {
"shard": {
"type": "long",
"description": "Id of the shard the profile is from"
},
"allocation_sites": {
"type": "array",
"items": {
"type": "allocation_site"
}
}
}
},
"allocation_site": {
"id": "allocation_site",
"description": "A single allocation site with backtrace, size and count",
"properties": {
"size": {
"type": "long",
"description": "Current bytes allocated at this allocation site (note this is the upscaled size and not the sampled one)"
},
"count": {
"type": "long",
"description": "Live allocations at this site"
Member:

I think it is confusing that "size" and "count" have a totally different basis: one upscaled and one not. I would not expect this, and I'd expect size / count to give the average (often the "only") allocation size, but it won't.

If we want to provide upscaled values (I think it is quite nice to), I suggest we provide them for both.

size is also perhaps a bit unclear?

Maybe bytes_sampled, bytes_upscaled, count_sampled, count_upscaled?

Member Author:

I see your point. However, I don't really see how one could upscale the count? We are not sampling based on it.

Overall this is a bit of an artefact of the seastar implementation. I guess the proper names would be bytes_upscaled and samples_taken. I haven't made any use of the count value yet, so we could probably just drop it completely from the debug API.

Member:

I think the count is really useful because the average size of the allocation is useful info.

I had assumed "count" gets scaled up in a similar way to size: by the inverse of the probability that any given allocation is sampled. But I see the problem: it's that different allocation sizes may be mixed at the same site?

Still doesn't upscaled_size / sampled_size * sampled_count give a meaningful number?

Member Author:

> it's that different allocation sizes may be mixed at the same site

Yeah, exactly.

> Still doesn't upscaled_size / sampled_size * sampled_count give a meaningful number?

It breaks if the allocations are very diverse, for example:

Sample rate: 16k (let's assume we sample exactly every 16k for simplicity)
Actual allocations: 16 * 1k (one of which gets sampled), 1 * 16k (gets sampled)

The math would then work out as:

Upscaled size: 32k
Sampled size: 17k
Sampled count: 2

=> Upscaled count: 32k / 17k * 2 ≈ 3.8, vs. 17 actual allocations

Real avg. allocation size: 32k / 17 ≈ 1.9k

I will think a bit more over the weekend, but I can't really come up with anything meaningful.
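In symbols, under the example's assumptions (one sample per $R = 16\mathrm{k}$ bytes allocated, each sample upscaled to represent $R$ bytes):

$$\hat{S} = n_s R = 2 \cdot 16\mathrm{k} = 32\mathrm{k}, \qquad S_{\text{sampled}} = 1\mathrm{k} + 16\mathrm{k} = 17\mathrm{k}, \qquad \hat{C} = \frac{\hat{S}}{S_{\text{sampled}}}\, n_s = \frac{32}{17} \cdot 2 \approx 3.8$$

against a true live count of 17: the estimator degrades whenever allocations of very different sizes share one site.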

Member Author:

I was thinking about this a bit more but can't come up with something that breaks with a basic example.

Member:

You are right, I wasn't thinking this through carefully. I feel like we should track the min & max size for sampled allocations, which would help cover this hole, but that can wait for another change.

},
"backtrace": {
"type": "string",
"description": "Backtrace of this allocation site"
}
}
},
"controller_status": {
"id": "controller_status",
"description": "Controller status",
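For reference, a hypothetical response to GET /v1/debug/sampled_memory_profile, following the memory_profile and allocation_site models above (all values invented):

```json
[
  {
    "shard": 0,
    "allocation_sites": [
      {
        "size": 4194304,
        "count": 128,
        "backtrace": "0x51f2a3 0x4c88d1 0x4b3a90 0x10fe2c"
      },
      {
        "size": 1048576,
        "count": 16,
        "backtrace": "0x5201bb 0x4c88d1 0x10fe2c"
      }
    ]
  }
]
```

Omitting the optional shard query parameter yields one memory_profile entry per shard.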