This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[SHUFFLE] manually split of Variable length buffer (String likely) #856

Closed
FelixYBW opened this issue Apr 16, 2022 · 0 comments
Labels
enhancement New feature or request feature

Comments

@FelixYBW
Collaborator

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently we still use Arrow's BufferBuilder to build the binary buffers, which is extremely inefficient. Each BinaryBuilder holds three memory references, as shown below, and one such structure has to be cached in memory for every binary column of every reducer.
BinaryBuilder
    offsets_builder_
        BufferBuilder
            data()
    value_data_builder_
        BufferBuilder
            data()
    null_bitmap_builder_
        BufferBuilder
            data()

Describe the solution you'd like

The solution is to use the data() pointers directly when splitting the record batch into per-reducer buffers, just as is already done for fixed-width columns.
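
To make the idea concrete, here is a minimal standalone sketch of splitting one variable-length column through raw pointers. It does not use Arrow at all: `PartitionBinary` and `SplitBinary` are hypothetical names invented for illustration, the input is assumed to follow the Arrow binary layout (an `int32` offsets buffer of `rows + 1` entries plus one contiguous value buffer), and the null bitmap is left out for brevity. It is not the shuffle code from this repository.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-reducer output for one binary column (not an Arrow type).
struct PartitionBinary {
  std::vector<int32_t> offsets;  // Arrow-style offsets, length = rows + 1
  std::vector<uint8_t> values;   // concatenated value bytes
};

// Split one binary column, given as raw offsets/value pointers, by partition id.
// pids[i] is the target partition (reducer) of row i.
std::vector<PartitionBinary> SplitBinary(const int32_t* offsets,
                                         const uint8_t* values,
                                         const std::vector<uint32_t>& pids,
                                         uint32_t num_partitions) {
  const size_t num_rows = pids.size();

  // Pass 1: count rows and value bytes per partition so that every
  // destination buffer can be sized exactly once.
  std::vector<size_t> rows(num_partitions, 0);
  std::vector<size_t> bytes(num_partitions, 0);
  for (size_t i = 0; i < num_rows; ++i) {
    assert(pids[i] < num_partitions);
    rows[pids[i]] += 1;
    bytes[pids[i]] += static_cast<size_t>(offsets[i + 1] - offsets[i]);
  }

  std::vector<PartitionBinary> out(num_partitions);
  for (uint32_t p = 0; p < num_partitions; ++p) {
    out[p].offsets.assign(rows[p] + 1, 0);
    out[p].values.resize(bytes[p]);
  }

  // Pass 2: copy each value straight into its partition's value buffer
  // and record the running end offset.
  std::vector<size_t> row_pos(num_partitions, 0);
  std::vector<int32_t> byte_pos(num_partitions, 0);
  for (size_t i = 0; i < num_rows; ++i) {
    const uint32_t p = pids[i];
    const int32_t len = offsets[i + 1] - offsets[i];
    if (len > 0) {
      std::memcpy(out[p].values.data() + byte_pos[p], values + offsets[i],
                  static_cast<size_t>(len));
    }
    byte_pos[p] += len;
    out[p].offsets[++row_pos[p]] = byte_pos[p];
  }
  return out;
}
```

The two-pass shape is one possible design choice: counting first lets each destination buffer be allocated once, so the copy loop is a plain `memcpy` per row with no builder bookkeeping or reallocation. A real implementation would also need to handle the validity bitmap and buffers that persist across record batches.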
