chunked insertion (by size of blocks) #3147

Closed
Tracked by #1780
dantengsky opened this issue Nov 29, 2021 · 8 comments
Labels
A-query Area: databend query A-storage Area: databend storage

Comments

@dantengsky
Member

dantengsky commented Nov 29, 2021

Summary

Original:

While taking in data blocks during Table::append, chunk several small data blocks into a larger one if possible.

Updated (2022-03-30):

Although this issue was fixed by PR #3122, a strategy of merging small blocks into bigger ones by number of rows was adopted later.
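For illustration, here is a minimal sketch of merging by row count alone. It assumes a simplified, hypothetical Block type that records only a row count and the uncompressed byte size of each column (it is not databend's actual block type), and it models merging by summing sizes rather than concatenating real columns. The point it shows is that bounding rows alone does not bound the memory of a merged block.

```rust
/// Hypothetical, simplified stand-in for a data block (not databend's type):
/// a row count plus the uncompressed byte size of each column.
#[derive(Debug, Clone)]
pub struct Block {
    pub rows: usize,
    pub column_bytes: Vec<usize>,
}

impl Block {
    /// Total uncompressed size of the block in bytes.
    pub fn bytes(&self) -> usize {
        self.column_bytes.iter().sum()
    }
}

/// Models concatenating blocks of the same schema by summing row counts and
/// per-column sizes. Callers must pass at least one block.
fn concat(blocks: &[Block]) -> Block {
    let mut column_bytes = vec![0; blocks[0].column_bytes.len()];
    let mut rows = 0;
    for b in blocks {
        rows += b.rows;
        for (acc, c) in column_bytes.iter_mut().zip(&b.column_bytes) {
            *acc += *c;
        }
    }
    Block { rows, column_bytes }
}

/// Merge small blocks by row count only: buffer incoming blocks and emit a
/// merged block as soon as the buffered row count reaches `max_rows`.
pub fn merge_by_rows(input: Vec<Block>, max_rows: usize) -> Vec<Block> {
    let mut out = Vec::new();
    let mut buf: Vec<Block> = Vec::new();
    let mut buffered_rows = 0;
    for block in input {
        buffered_rows += block.rows;
        buf.push(block);
        if buffered_rows >= max_rows {
            out.push(concat(&buf));
            buf.clear();
            buffered_rows = 0;
        }
    }
    // Flush whatever is left as a final (possibly undersized) block.
    if !buf.is_empty() {
        out.push(concat(&buf));
    }
    out
}
```

With 10 input blocks of 100 rows, 800 bytes per column, and max_rows = 500, both a 2-column and a 200-column table produce two merged blocks of 500 rows, but each merged block is 8,000 bytes for the narrow table and 800,000 bytes for the wide one.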

It would be better if both block_size (in bytes, before compression, IMO) and number_rows were taken into account when merging small blocks into larger ones, so that the memory used to construct a block no longer grows in proportion to the number of columns in the table.

Current implementation:

https://github.com/datafuselabs/databend/blob/50226158b19380f06d36edb759f98482027c51db/query/src/storages/fuse/io/write/block_stream_writer.rs#L278-L304
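A minimal sketch of the proposed dual-threshold merge, under simplifying assumptions: Block and Thresholds are hypothetical types, a block's size is its uncompressed byte count, and merging is modeled by summing sizes. The buffer is flushed before any block that would push it past either max_rows or max_bytes, so a merged block never exceeds either limit unless a single input block already does (this sketch does not split oversized blocks).

```rust
/// Hypothetical, simplified block: row count and uncompressed size in bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Block {
    pub rows: usize,
    pub bytes: usize,
}

/// Hypothetical limits; a merged block must stay within BOTH of them.
pub struct Thresholds {
    pub max_rows: usize,
    pub max_bytes: usize,
}

/// Merge small blocks, bounded by both row count and byte size.
pub fn merge_by_rows_and_bytes(input: &[Block], t: &Thresholds) -> Vec<Block> {
    let mut out = Vec::new();
    let mut acc = Block { rows: 0, bytes: 0 };
    for b in input {
        // Flush first if adding `b` would push the buffer past either limit.
        if acc.rows > 0
            && (acc.rows + b.rows > t.max_rows || acc.bytes + b.bytes > t.max_bytes)
        {
            out.push(acc);
            acc = Block { rows: 0, bytes: 0 };
        }
        acc.rows += b.rows;
        acc.bytes += b.bytes;
    }
    // Emit the remaining (possibly undersized) buffer.
    if acc.rows > 0 {
        out.push(acc);
    }
    out
}
```

With max_rows = 500 and max_bytes = 400,000, ten 100-row blocks of 1,600 bytes each are merged into two 500-row blocks of 8,000 bytes, while ten 100-row blocks of 160,000 bytes each (a wide table) are merged into five 200-row blocks of 320,000 bytes, so memory per block stays bounded regardless of column count.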

@dantengsky
Member Author

Addressed in PR #3122.

@dantengsky dantengsky reopened this Mar 30, 2022
@BohuTANG BohuTANG added the A-storage Area: databend storage label Mar 30, 2022
@BohuTANG
Member

Hello @dantengsky, is it related to async insert #4577?

@dantengsky
Member Author

dantengsky commented Mar 30, 2022


If I understand correctly, in issue #4577, expressions (the values of batched insert statements) will be compacted into blocks of a proper size where applicable, and then inserted into a table by calling Table::append.

  • Apparently, the config/setting for the block size could be shared.
  • It may also be worth introducing another parameter to Table::append, as a hint indicating that the blocks being appended are already well-sized.
  • There seems to be something in common between
    • compacting expressions into blocks of the proper size
    • compacting blocks into blocks of the proper size
    but frankly speaking, I do not know how to extract the common part. @sundy-li, please help.
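One possible way to share the logic, sketched under assumptions: a hypothetical Measured trait lets a single compact function group either individual VALUES rows or small blocks by row count and byte size, and a hypothetical AppendHint enum shows the kind of hint Table::append could take. None of these names exist in databend; append_blocks is a toy stand-in, not the real Table::append signature, and its limits are arbitrary.

```rust
/// Hypothetical trait: anything whose size can be measured in rows and bytes,
/// e.g. a parsed VALUES row or a small data block.
pub trait Measured {
    fn rows(&self) -> usize;
    fn bytes(&self) -> usize;
}

/// Group items into chunks that stay within both limits
/// (a single oversized item still forms its own chunk).
pub fn compact<T: Measured>(items: Vec<T>, max_rows: usize, max_bytes: usize) -> Vec<Vec<T>> {
    let mut chunks = Vec::new();
    let mut cur: Vec<T> = Vec::new();
    let (mut rows, mut bytes) = (0, 0);
    for item in items {
        let (r, b) = (item.rows(), item.bytes());
        // Close the current chunk if this item would overflow either limit.
        if !cur.is_empty() && (rows + r > max_rows || bytes + b > max_bytes) {
            chunks.push(std::mem::take(&mut cur));
            rows = 0;
            bytes = 0;
        }
        rows += r;
        bytes += b;
        cur.push(item);
    }
    if !cur.is_empty() {
        chunks.push(cur);
    }
    chunks
}

/// Hypothetical item type: one row from a batched INSERT ... VALUES.
pub struct ValuesRow {
    pub encoded_len: usize,
}
impl Measured for ValuesRow {
    fn rows(&self) -> usize { 1 }
    fn bytes(&self) -> usize { self.encoded_len }
}

/// Hypothetical item type: a small block arriving at the table.
pub struct SmallBlock {
    pub rows: usize,
    pub bytes: usize,
}
impl Measured for SmallBlock {
    fn rows(&self) -> usize { self.rows }
    fn bytes(&self) -> usize { self.bytes }
}

/// Hypothetical hint a caller could pass along with the blocks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AppendHint {
    /// Blocks may be small; compact them before writing.
    MayNeedCompaction,
    /// Blocks were already compacted upstream; skip re-compaction.
    WellSized,
}

/// Toy append: only re-compacts when the caller does not vouch for sizing.
pub fn append_blocks(blocks: Vec<SmallBlock>, hint: AppendHint) -> Vec<Vec<SmallBlock>> {
    match hint {
        AppendHint::WellSized => blocks.into_iter().map(|b| vec![b]).collect(),
        AppendHint::MayNeedCompaction => compact(blocks, 1_000, 1 << 20),
    }
}
```

Keeping the size measurement behind a trait means the thresholds (and the config/setting for them) can be shared, while each caller decides what counts as one "row" and how its bytes are estimated.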

@sundy-li
Member

Async insert is used to aggregate small inserts from multiple clients.

I think it is quite different from this issue.

@BohuTANG
Member

Got it, thank you man!

@sundy-li
Member

sundy-li commented May 7, 2022

@dantengsky We already have TransformBlockCompact, which has the settings max_row_per_block and min_row_per_block. Maybe you can introduce more settings into this struct.
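A sketch of what extending those settings might look like. max_row_per_block and min_row_per_block are the setting names mentioned above, but the struct here is a hypothetical mirror, not TransformBlockCompact's actual definition, and max_bytes_per_block and min_bytes_per_block are hypothetical additions. In this sketch a block is merged only if it is small by both measures, so a wide block with few rows that is already large in bytes is left alone.

```rust
/// Hypothetical mirror of the compaction settings (not databend's struct).
#[derive(Debug, Clone, Copy)]
pub struct CompactSettings {
    pub max_row_per_block: usize,
    pub min_row_per_block: usize,
    /// Hypothetical new setting: upper bound on uncompressed bytes per block.
    pub max_bytes_per_block: usize,
    /// Hypothetical new setting: blocks at least this large are not merged.
    pub min_bytes_per_block: usize,
}

/// What the compactor would do with an incoming block.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// Small by both rows and bytes: buffer it and merge with later blocks.
    Merge,
    /// Within bounds: pass it through unchanged.
    Keep,
    /// Over either upper limit: a candidate for splitting.
    Split,
}

impl CompactSettings {
    pub fn classify(&self, rows: usize, bytes: usize) -> Verdict {
        if rows > self.max_row_per_block || bytes > self.max_bytes_per_block {
            Verdict::Split
        } else if rows < self.min_row_per_block && bytes < self.min_bytes_per_block {
            Verdict::Merge
        } else {
            Verdict::Keep
        }
    }
}
```

For example, with max_row_per_block = 1,000, min_row_per_block = 800, max_bytes_per_block = 1 MiB and min_bytes_per_block = 512 KiB, a 100-row block of 600,000 bytes is kept as is rather than merged, because it is already large in bytes.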

@Xuanwo Xuanwo moved this to 📋 Backlog in Databend Storage Layer May 20, 2022
@Xuanwo Xuanwo added A-storage Area: databend storage and removed A-storage Area: databend storage labels May 20, 2022
@dantengsky
Member Author

Stale.

@github-project-automation github-project-automation bot moved this from 📋 Backlog to 📦 Done in Databend Storage Layer Nov 3, 2023
Projects
Status: 📦 Done
Development

No branches or pull requests

4 participants