
storage/index_state: use chunked_vector #22962

Merged (1 commit) on Aug 20, 2024

Conversation

rockwotj
Contributor

In cases with lots of small indexes, the overhead of many of these
fragmented_vectors can be quite high. Reduce the overhead by using
chunked_vector so the first chunk isn't full length.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Improvements

  • Reduce the memory overhead of many small segments.

@rockwotj rockwotj marked this pull request as ready for review August 20, 2024 15:28
@WillemKauf
Contributor

Nice.

There are other places where we use fragmented_vector in ways that (ostensibly) scale with the number of segments, notably in disk_log_impl (example here, but feel free to search the file for fragmented_vector to see the rest).

Are these sites also good candidates for a swap to chunked_vector?

Follow-up question: is there clear (internal or not) messaging anywhere to Redpanda devs about how to decide between chunked_vector and fragmented_vector?

@rockwotj
Contributor Author

There are other places in which we have fragmented_vector use that (ostensibly) scales with the number of segments

This specific change is less about how any one vector scales and more about the number of vectors. The vectors you linked scale with the number of log_impl instances (which is the number of partitions, right?). The difference between fragmented_vector and chunked_vector is that chunked_vector grows like a normal array for its first chunk, then grows by full chunks, whereas fragmented_vector always grows by full chunks, so small fragmented vectors carry a lot of overhead. There is more discussion here:

/**
* A vector that does not allocate large contiguous chunks. Instead the
* allocations are broken up across many different individual vectors, but the
* exposed view is of a single container.
*
* Additionally the allocation strategy is like a "normal" vector up to our max
* recommended allocation size, at which we will then only allocate new chunks
* and previous chunk elements will not be moved.
*/
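To make the difference concrete, here is a minimal sketch of the two allocation strategies. The 128 KiB chunk size, the footprint functions, and the doubling start point are all illustrative assumptions, not the actual Redpanda implementation:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model (not the real Redpanda code) of backing-storage cost.
// fragmented_vector always allocates whole chunks, so even a tiny vector
// pays for a full chunk. chunked_vector grows its first chunk by doubling,
// like a normal vector, and only switches to whole chunks afterwards.

constexpr std::size_t chunk_bytes = 128 * 1024; // assumed max-allocation size

// Bytes of backing storage for n elements of elem_size bytes each.
std::size_t fragmented_footprint(std::size_t n, std::size_t elem_size) {
    std::size_t bytes = n * elem_size;
    std::size_t chunks = (bytes + chunk_bytes - 1) / chunk_bytes;
    return (chunks == 0 ? 1 : chunks) * chunk_bytes; // always >= one chunk
}

std::size_t chunked_footprint(std::size_t n, std::size_t elem_size) {
    std::size_t bytes = n * elem_size;
    if (bytes >= chunk_bytes) {
        // past the first chunk: whole chunks, same as fragmented_vector
        return ((bytes + chunk_bytes - 1) / chunk_bytes) * chunk_bytes;
    }
    // first chunk: power-of-two doubling, like a normal vector
    std::size_t cap = elem_size;
    while (cap < bytes) cap *= 2;
    return cap;
}
```

Under this model, ten 8-byte entries cost a full 128 KiB chunk in fragmented_vector but only 128 bytes in chunked_vector, while large vectors converge to the same footprint in both.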

Are these sites also good candidates for a swap to chunked_vector?

Generally I recommend chunked_vector as a better default data structure than fragmented_vector, because you don't have to trade off overhead for small vectors against larger chunks for performance at scale.

clear (internal or not) messaging to Redpanda devs anywhere about how the use of chunked_vector versus fragmented_vector should be decided?

Again, I recommend chunked_vector everywhere as the default vector type in Redpanda. Plain vector is only safe if we have a hard limit (with validation) guaranteeing the length will not grow to our oversized allocation limit (and even then you need to make sure the vector's doubling allocation strategy doesn't bite you). We have internal documentation from the perf team here: https://redpandadata.atlassian.net/wiki/spaces/CORE/pages/318275653/Memory+Management+in+Redpanda
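The caveat about the doubling allocation strategy can be sketched as follows. The 128 KiB oversized-allocation threshold is an assumed figure, and doubled_capacity is a hypothetical helper, not Redpanda code:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the "doubling bites you" caveat, assuming a 128 KiB
// oversized-allocation threshold (an assumption for illustration).
// Even if validation caps the payload below the threshold, power-of-two
// growth can land the backing allocation right at (or past) it.
std::size_t doubled_capacity(std::size_t bytes_needed) {
    std::size_t cap = 1;
    while (cap < bytes_needed) cap *= 2; // std::vector-style geometric growth
    return cap;
}
```

For example, a vector validated to never hold more than 70 KiB of data can still end up with a 128 KiB backing allocation, already at the assumed threshold, and one byte past 128 KiB jumps the allocation to 256 KiB.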

@piyushredpanda piyushredpanda merged commit fa32a58 into redpanda-data:dev Aug 20, 2024
17 checks passed
@piyushredpanda
Contributor

Thank you, @rockwotj!

@vbotbuildovich
Collaborator

/backport v24.2.x

@vbotbuildovich
Collaborator

/backport v24.1.x
