Change information stored in _topic_node_index to avoid oversized alloc #17350
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46641#018e685e-435f-491b-bfd0-b1d51343add3
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48832#018f5921-de88-487b-a388-a72304535ec4
}

return std::nullopt;
vassert(false, "not implemented");
Can you include a clear comment here about why this won't be called? Is it called on other types implementing this interface?
Will do
Done
@ballard26 this seems to have introduced a CI failure on v24.1.x: https://buildkite.com/redpanda/redpanda/builds/51764#0190c949-1a0f-42ba-a040-4e33bc85fc38
ERROR 2024-07-19 07:45:03,854 [shard 0:main] assert - Assert failure: (/var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-0269b6a11fdecc85f-1/redpanda/redpanda/src/v/cluster/scheduling/leader_balancer_constraints.cc:59) 'false' not implemented
thanks @BenPope looks like something changed, or the assumptions weren't correct initially.
I searched a little but didn't see a ticket for this. is there one @BenPope?
I like all the code removal, but can you elaborate on why you decided to take this route over just changing the std::vector to chunked_vector?
Sure, initially I did just blindly switch it over. Then I realized I implemented So with that in mind
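For readers unfamiliar with the trade-off being discussed: a std::vector keeps all elements in one contiguous allocation, so a large index can trigger a single oversized alloc, while a chunked container spreads elements over fixed-size chunks. The sketch below is only a toy illustration of that idea. `toy_chunked_vector` and its `ChunkElems` parameter are hypothetical names and this is not the chunked_vector used in the codebase.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy illustration only: elements live in fixed-capacity chunks, so no single
// element allocation grows beyond ChunkElems * sizeof(T) bytes.
// The outer vector holds one pointer per chunk and stays comparatively small.
template <typename T, std::size_t ChunkElems = 1024>
class toy_chunked_vector {
public:
    void push_back(T value) {
        // Start a new chunk when there is none yet or the last one is full.
        if (_chunks.empty() || _chunks.back()->size() == ChunkElems) {
            auto chunk = std::make_unique<std::vector<T>>();
            chunk->reserve(ChunkElems);
            _chunks.push_back(std::move(chunk));
        }
        _chunks.back()->push_back(std::move(value));
        ++_size;
    }

    // Index i maps to chunk i / ChunkElems, slot i % ChunkElems.
    T& operator[](std::size_t i) {
        return (*_chunks[i / ChunkElems])[i % ChunkElems];
    }

    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }

private:
    std::vector<std::unique_ptr<std::vector<T>>> _chunks;
    std::size_t _size = 0;
};
```

With `ChunkElems = 4`, pushing ten elements produces three chunks rather than one ten-element buffer.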
ae11e01 to b031a46
LGTM overall, although I wonder what the plan for removing legacy code is. Are we waiting for a specific release when we'll be able to deprecate the greedy strategy?
_topic_node_index[topic_id][r.to.node_id].emplace_back(
  moved_group_info.group_id, r.to, std::move(moved_group_info.replicas));
_topic_node_index.at(topic_id).at(r.from.node_id) -= 1;
_topic_node_index.at(topic_id).at(r.to.node_id) += 1;
I wonder why this second at is supposed to be always successful. Isn't it possible that some nodes didn't have a leader of any partition of that topic previously?
It should always be. even_topic_distributon_constraint::update_index is only used internally and the reassignment is validated by the callers before it's called.
Got it, we initialize it on (new) line 79. But maybe it's easier for the reader to just use operator[]
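To make the at versus operator[] question concrete, here is a minimal sketch using std::unordered_map as a stand-in (the actual container type behind _topic_node_index is not shown in this thread, so that choice is an assumption, as are the helper names `bump_with_at` and `bump_with_subscript`). For the standard maps, at throws std::out_of_range on a missing key, while operator[] inserts a value-initialized entry first.

```cpp
#include <cstddef>
#include <stdexcept>
#include <unordered_map>

// Stand-in for a per-node leader count map (node id -> count).
using counts_t = std::unordered_map<int, std::size_t>;

// Increment via at(): fails loudly if the node was never initialized.
// Returns false instead of propagating the exception, for demonstration.
inline bool bump_with_at(counts_t& counts, int node) {
    try {
        counts.at(node) += 1;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Increment via operator[]: a missing node is silently created with count 0
// and then incremented.
inline void bump_with_subscript(counts_t& counts, int node) {
    counts[node] += 1;
}
```

Using at surfaces a missing initialization as an error, whereas operator[] tolerates it; which behaviour is preferable depends on whether the "always initialized" invariant should be enforced or relaxed.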
b031a46 to 516a874
516a874 to a5be6a6
/ci-repeat
new failures in https://buildkite.com/redpanda/redpanda/builds/47424#018eac40-c414-4630-a9ad-be99d0604a0e:
new failures in https://buildkite.com/redpanda/redpanda/builds/48832#018f5921-de84-48a7-a7d9-1f0fb99dc2f2:
new failures in https://buildkite.com/redpanda/redpanda/builds/48832#018f5912-cc81-4f75-9f78-52a8fcdd2b12:
new failures in https://buildkite.com/redpanda/redpanda/builds/48832#018f5921-de86-4508-bf5b-6d70c6a25e71:
new failures in https://buildkite.com/redpanda/redpanda/builds/48832#018f5912-cc7f-4db1-b622-860beaa08233:
/ci-repeat
Failures should be fixed by #18302
/backport v23.3.x
/backport v24.1.x
_topic_node_index originally stored a bunch of metadata about every group a node led. However, this information was only ever used in even_topic_distributon_constraint::recommended_reassignment, a method which isn't used anywhere outside of unit tests.

This PR changes the information stored in _topic_node_index to be just the count of leaders on a given node, and removes the implementation of even_topic_distributon_constraint::recommended_reassignment.

Fixes #17349
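A rough sketch of the count-only index described above, under stated assumptions: the ID aliases, the std::unordered_map containers, and the helper names `init_topic`, `add_leader`, and `move_leader` are all hypothetical and chosen for illustration. Only the shape (topic, then node, then a leader count) and the `-= 1` / `+= 1` update come from the diff in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the real ID types.
using topic_id_t = std::int64_t;
using node_id_t = std::int32_t;

// topic -> node -> number of partition leaders of that topic on the node.
using topic_node_index_t
  = std::unordered_map<topic_id_t, std::unordered_map<node_id_t, std::size_t>>;

// Create a zero count for every node up front, so later at() lookups succeed.
inline void init_topic(
  topic_node_index_t& index, topic_id_t topic, const std::vector<node_id_t>& nodes) {
    auto& per_node = index[topic];
    for (node_id_t n : nodes) {
        per_node.try_emplace(n, 0);
    }
}

// Record one more leader of `topic` on `node`.
inline void add_leader(topic_node_index_t& index, topic_id_t topic, node_id_t node) {
    index.at(topic).at(node) += 1;
}

// Apply a leadership move. Assumes the caller has validated the reassignment,
// i.e. `from` currently leads at least one partition of `topic`.
inline void move_leader(
  topic_node_index_t& index, topic_id_t topic, node_id_t from, node_id_t to) {
    auto& per_node = index.at(topic);
    per_node.at(from) -= 1;
    per_node.at(to) += 1;
}
```

Each node now costs one counter per topic instead of a growing list of per-group metadata, which is the reduction in stored information that the description refers to.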
Backports Required
Release Notes