
storage/index_state: use chunked_vector #22962

Merged (1 commit) on Aug 20, 2024

Conversation

rockwotj
Contributor

In cases with lots of small indexes, the overhead of many of these
fragmented_vectors can be quite high. Reduce the overhead by using
chunked_vector so the first chunk isn't full length.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Improvements

  • Reduce the memory overhead of many small segments.

@rockwotj rockwotj marked this pull request as ready for review August 20, 2024 15:28
@WillemKauf
Contributor

Nice.

There are other places where we use fragmented_vector in ways that (ostensibly) scale with the number of segments, notably in disk_log_impl (example here, but feel free to search the file for fragmented_vector to see the rest).

Are these sites also good candidates for a swap to chunked_vector?

Follow-up question: is there clear (internal or not) messaging anywhere to Redpanda devs about how to decide between chunked_vector and fragmented_vector?

@rockwotj
Contributor Author

There are other places in which we have fragmented_vector use that (ostensibly) scales with the number of segments

This specific change is less about how any one vector scales and more about the number of vectors. The vectors you linked scale with the number of log_impl instances (which is the number of partitions, right?). The difference between fragmented_vector and chunked_vector is that chunked_vector grows like a normal array for its first chunk, then grows by full chunks, whereas fragmented_vector always grows by full chunks, so small fragmented vectors carry a lot of overhead. There is more discussion here:

/**
* A vector that does not allocate large contiguous chunks. Instead the
* allocations are broken up across many different individual vectors, but the
* exposed view is of a single container.
*
* Additionally the allocation strategy is like a "normal" vector up to our max
* recommended allocation size, at which we will then only allocate new chunks
* and previous chunk elements will not be moved.
*/
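To make the difference concrete, here is a minimal sketch of the two allocation strategies. The 128 KiB chunk size, the footprint functions, and the doubling start point are all illustrative assumptions, not the actual Redpanda implementation:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model (not the real Redpanda code) of backing-storage cost.
// fragmented_vector always allocates whole chunks, so even a tiny vector
// pays for a full chunk. chunked_vector grows its first chunk by doubling,
// like a normal vector, and only switches to whole chunks afterwards.

constexpr std::size_t chunk_bytes = 128 * 1024; // assumed max-allocation size

// Bytes of backing storage for n elements of elem_size bytes each.
std::size_t fragmented_footprint(std::size_t n, std::size_t elem_size) {
    std::size_t bytes = n * elem_size;
    std::size_t chunks = (bytes + chunk_bytes - 1) / chunk_bytes;
    return (chunks == 0 ? 1 : chunks) * chunk_bytes; // always >= one chunk
}

std::size_t chunked_footprint(std::size_t n, std::size_t elem_size) {
    std::size_t bytes = n * elem_size;
    if (bytes >= chunk_bytes) {
        // past the first chunk: whole chunks, same as fragmented_vector
        return ((bytes + chunk_bytes - 1) / chunk_bytes) * chunk_bytes;
    }
    // first chunk: power-of-two doubling, like a normal vector
    std::size_t cap = elem_size;
    while (cap < bytes) cap *= 2;
    return cap;
}
```

Under this model, ten 8-byte entries cost a full 128 KiB chunk in fragmented_vector but only 128 bytes in chunked_vector, while large vectors converge to the same footprint in both.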

Are these sites also good candidates for a swap to chunked_vector?

Generally I recommend chunked_vector as a better default data structure than fragmented_vector, because you don't have to trade off overhead for small vectors against larger chunks for performance at scale.

clear (internal or not) messaging to Redpanda devs anywhere about how the use of chunked_vector versus fragmented_vector should be decided?

Again, I recommend chunked_vector everywhere as the default vector type in Redpanda. Plain vector is only safe if we have a hard limit (with validation) guaranteeing the length will not grow to our oversized allocation limit (and even then you need to make sure the vector's doubling allocation strategy doesn't bite you). We have internal documentation from the perf team here: https://redpandadata.atlassian.net/wiki/spaces/CORE/pages/318275653/Memory+Management+in+Redpanda
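The caveat about the doubling allocation strategy can be sketched as follows. The 128 KiB oversized-allocation threshold is an assumed figure, and doubled_capacity is a hypothetical helper, not Redpanda code:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the "doubling bites you" caveat, assuming a 128 KiB
// oversized-allocation threshold (an assumption for illustration).
// Even if validation caps the payload below the threshold, power-of-two
// growth can land the backing allocation right at (or past) it.
std::size_t doubled_capacity(std::size_t bytes_needed) {
    std::size_t cap = 1;
    while (cap < bytes_needed) cap *= 2; // std::vector-style geometric growth
    return cap;
}
```

For example, a vector validated to never hold more than 70 KiB of data can still end up with a 128 KiB backing allocation, already at the assumed threshold, and one byte past 128 KiB jumps the allocation to 256 KiB.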

@piyushredpanda piyushredpanda merged commit fa32a58 into redpanda-data:dev Aug 20, 2024
17 checks passed
@piyushredpanda
Contributor

Thank you, @rockwotj!

@vbotbuildovich
Collaborator

/backport v24.2.x

@vbotbuildovich
Collaborator

/backport v24.1.x
