feat: change max threads to max io requests for fuse read data #8270

Closed
wants to merge 18 commits into from


BohuTANG
Member

@BohuTANG BohuTANG commented Oct 18, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

For object storage, issuing more concurrent read requests increases throughput until the network bandwidth is saturated. We have already applied this approach to reading snapshot and segment files (see #8153).

This PR tries to speed up reading block files by adding more FuseEngineSource source pipes to the pipeline:
change the number of pipes from max_threads to max_storage_io_requests
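
The idea of sizing the reader count by an I/O request budget rather than by CPU threads can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Databend's pipeline code: `read_block` and `read_all` are hypothetical helpers, the sleep stands in for object-storage latency, and the values 8 and 64 are illustrative rather than the project's actual settings.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_block(block_id: int, latency_s: float = 0.01) -> int:
    """Hypothetical stand-in for one object-storage read; sleeps to mimic latency."""
    time.sleep(latency_s)
    return block_id


def read_all(block_ids, max_in_flight: int):
    """Read every block with at most `max_in_flight` requests running at once."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(read_block, block_ids))


blocks = list(range(64))
# Illustrative values only: a CPU-sized limit versus a larger I/O-sized limit.
for label, limit in [("max_threads", 8), ("max_storage_io_requests", 64)]:
    start = time.perf_counter()
    result = read_all(blocks, limit)
    elapsed = time.perf_counter() - start
    print(f"{label}={limit}: read {len(result)} blocks in {elapsed:.3f}s")
```

Because each simulated read spends its time waiting rather than computing, the larger limit should finish sooner in this toy setup. Real gains depend on network bandwidth and the storage service.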

Performance test:

Table

mysql> select * from fuse_snapshot('db7861', 't7861') limit 1\G;

*************************** 1. row ***************************
         snapshot_id: 074b5e9a528b4543ada3697d5abb0b44
   snapshot_location: 8/8074/_ss/074b5e9a528b4543ada3697d5abb0b44_v1.json
      format_version: 1
previous_snapshot_id: 8884c4cdb4904dcea0ff66dce61a664a
       segment_count: 1545888
         block_count: 1583761
           row_count: 15335500000
  bytes_uncompressed: 19065197766050
    bytes_compressed: 6001882777921
          index_size: 11814144308
           timestamp: 2022-10-17 03:47:31.377732
1 row in set (13.73 sec)
Read 1 rows, 229.00 B in 13.725 sec., 0.07285775055708306 rows/sec., 16.68 B/sec.

select sum(c8) from t7861

main branch: 1 hour 15 minutes, 26 MiB/sec
this PR: 4 minutes, 491 MiB/sec
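
As a rough consistency check on these two measurements, the sketch below multiplies each reported throughput by its duration and compares the speedups. It assumes the throughput figures are averages over the whole run and that the durations are rounded as written; `bytes_read` is a hypothetical helper.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def bytes_read(throughput_mib_per_s: float, seconds: float) -> float:
    """Total bytes transferred at a constant average throughput."""
    return throughput_mib_per_s * MIB * seconds


main_seconds = 75 * 60  # 1 hour 15 minutes
pr_seconds = 4 * 60     # 4 minutes

main_gib = bytes_read(26, main_seconds) / GIB
pr_gib = bytes_read(491, pr_seconds) / GIB

time_speedup = main_seconds / pr_seconds
throughput_speedup = 491 / 26

print(f"main branch: about {main_gib:.1f} GiB read")
print(f"this PR:     about {pr_gib:.1f} GiB read")
print(f"speedup by time: {time_speedup:.2f}x, by throughput: {throughput_speedup:.2f}x")
```

Both runs work out to roughly 115 GiB read, and both ratios come to about 19x, so the reported times and throughputs agree with each other. The total is far below the table's compressed size, presumably because the query reads only column c8; that is an inference, not something the PR states.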

Fixes #8263

@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label Oct 18, 2022
@BohuTANG BohuTANG force-pushed the dev-fast-fuse-read-block branch from 8ba4bba to cc274fe Compare October 18, 2022 01:36
@BohuTANG
Member Author

The logic tests are waiting on #8283.

@BohuTANG
Member Author

@mergify update

@mergify
Contributor

mergify bot commented Oct 18, 2022

update

✅ Branch has been successfully updated

@BohuTANG BohuTANG marked this pull request as ready for review October 18, 2022 08:35
@BohuTANG BohuTANG requested a review from dantengsky October 18, 2022 08:57
@BohuTANG
Member Author

The sqllogic test failure is caused by the Docker max open files limit not being set as intended.

@BohuTANG
Member Author

BohuTANG commented Oct 18, 2022

Because of a pipeline resize issue, I am converting this PR to a draft.

@dantengsky
Member

I don't know why this stateless test failed.

suite_name: base/02_function/02_0005_function_compare,

https://github.com/datafuselabs/databend/actions/runs/3274136199/jobs/5387717926#step:4:1879

@BohuTANG
Member Author

@mergify update

@mergify
Contributor

mergify bot commented Oct 19, 2022

update

✅ Branch has been successfully updated

@BohuTANG
Member Author

Moved to #8321.

@BohuTANG BohuTANG closed this Oct 19, 2022
Development

Successfully merging this pull request may close these issues.

performance: try to fast IO read for FuseTableSource
4 participants