refactor: optimize read of small row group in parquet #13530

Merged: 51 commits into databendlabs:main on Dec 7, 2023

Conversation

Contributor

@zenus zenus commented Nov 1, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Normally, reading a Parquet file on S3 is broken down into several steps:

1. Read the file metadata in read_partitions.
2. Read the column chunks that are needed.

When a row group is small, it is more efficient to merge the column chunk reads.
This is nothing new; I just reuse the idea from the block_reader of the fuse engine.
I did a simple test and it is indeed profitable.
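The merge idea described above can be sketched as byte-range coalescing: column chunks whose gap in the file is small are fetched in one request instead of several. This is a minimal illustration only, not the code in this PR; the function name `merge_ranges` is hypothetical, and treating `min_bytes_for_seek` as the largest gap worth reading through is an assumption about how a setting like `storage_io_min_bytes_for_seek` might be applied.

```rust
use std::ops::Range;

/// Coalesce byte ranges whose gap is at most `min_bytes_for_seek`.
/// Hypothetical helper for illustration; returns ranges sorted by start.
fn merge_ranges(mut ranges: Vec<Range<u64>>, min_bytes_for_seek: u64) -> Vec<Range<u64>> {
    // Sort by start offset so only neighbouring ranges need comparing.
    ranges.sort_by_key(|r| r.start);

    let mut merged: Vec<Range<u64>> = Vec::with_capacity(ranges.len());
    for r in ranges {
        match merged.last_mut() {
            // The next range starts within the allowed gap of the previous
            // one: extend the previous range instead of adding a request.
            Some(last) if r.start <= last.end.saturating_add(min_bytes_for_seek) => {
                last.end = last.end.max(r.end);
            }
            // Too far away (or first range): keep it as a separate request.
            _ => merged.push(r),
        }
    }
    merged
}
```

In this sketch, a threshold of 0 merges only ranges that touch or overlap, while a larger threshold trades a few wasted bytes for fewer round trips to object storage.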

mysql> SELECT * FROM INSPECT_PARQUET('@data/ontime_200.parquet');
+----------------------------------------------+-------------+----------+----------------+-----------------+--------------------------------+----------------------------------+
| created_by                                   | num_columns | num_rows | num_row_groups | serialized_size | max_row_groups_size_compressed | max_row_groups_size_uncompressed |
+----------------------------------------------+-------------+----------+----------------+-----------------+--------------------------------+----------------------------------+
| Arrow2 - Native Rust implementation of Arrow |         109 |      199 |              1 |           28087 |                          15197 |                           107581 |
+----------------------------------------------+-------------+----------+----------------+-----------------+--------------------------------+----------------------------------+
1 row in set (0.03 sec)
Read 1 rows, 448.00 B in 0.013 sec., 76.17 rows/sec., 33.33 KiB/sec.

mysql> set global storage_io_min_bytes_for_seek=0;
Query OK, 0 rows affected (0.02 sec)

mysql> select * from @data/ontime_200.parquet limit 1\G;
*************************** 1. row ***************************
                           year: 2020
                        quarter: 4
                          month: 12
               .....
1 row in set (0.06 sec)
Read 199 rows, 141.85 KiB in 0.029 sec., 6.93 thousand rows/sec., 4.83 MiB/sec.

ERROR:
No query specified

mysql> set global storage_io_min_bytes_for_seek=100;
Query OK, 0 rows affected (0.02 sec)

mysql> select * from @data/ontime_200.parquet limit 1\G;
*************************** 1. row ***************************
                           year: 2020
                        quarter: 4
                          month: 12
     ....
1 row in set (0.06 sec)
Read 199 rows, 141.85 KiB in 0.025 sec., 7.94 thousand rows/sec., 5.53 MiB/sec.

mysql> set global use_parquet2=0;
Query OK, 0 rows affected (0.02 sec)

mysql> set global storage_io_min_bytes_for_seek=0;
Query OK, 0 rows affected (0.03 sec)

mysql> select * from @data/ontime_200.parquet limit 1\G;
*************************** 1. row ***************************
                           year: 2020
                        quarter: 4
                          month: 12
                     dayofmonth: 1
       .......
]
1 row in set (0.06 sec)
Read 199 rows, 141.85 KiB in 0.030 sec., 6.53 thousand rows/sec., 4.55 MiB/sec.

mysql> set global storage_io_min_bytes_for_seek=100;
Query OK, 0 rows affected (0.02 sec)

mysql> select * from @data/ontime_200.parquet limit 1\G;
*************************** 1. row ***************************
                           year: 2020
                        quarter: 4
                          month: 12
               ........
1 row in set (0.06 sec)
Read 199 rows, 141.85 KiB in 0.022 sec., 8.99 thousand rows/sec., 6.26 MiB/sec.
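To put the timings above in relative terms, the elapsed times reported by the client can be compared pairwise. The helper below is a hypothetical sketch for that arithmetic; the inputs are the single-run figures from the transcript, so the results are indicative rather than a benchmark.

```rust
/// Fraction of elapsed time saved relative to the baseline run.
/// Hypothetical helper, used only to compare the figures in the transcript.
fn time_saved(baseline_secs: f64, merged_secs: f64) -> f64 {
    (baseline_secs - merged_secs) / baseline_secs
}

/// Elapsed times copied from the transcript, as
/// (storage_io_min_bytes_for_seek = 0, storage_io_min_bytes_for_seek = 100).
fn transcript_pairs() -> [(f64, f64); 2] {
    [
        (0.029, 0.025), // before use_parquet2 is set to 0
        (0.030, 0.022), // after use_parquet2 is set to 0
    ]
}
```

Applied to these pairs, the saving is roughly 14% for the first pair and roughly 27% for the second.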

@zenus zenus changed the title from "inited" to "refactor: optimize read of small row group in parquet" Nov 1, 2023
@github-actions github-actions bot added the pr-refactor label (this PR changes the code base without new features or bugfix) Nov 1, 2023
@zenus zenus marked this pull request as ready for review November 3, 2023 12:43
@BohuTANG BohuTANG requested a review from RinChanNOWWW November 3, 2023 13:03
Member

BohuTANG commented Nov 3, 2023

Can we do a performance bench for this PR?

Contributor Author

zenus commented Nov 3, 2023

@BohuTANG Let's not do a performance bench yet; @youngsofun, please review it first.

@sundy-li sundy-li requested a review from youngsofun November 3, 2023 14:55
Contributor

@RinChanNOWWW RinChanNOWWW left a comment

I think the implementation is wrong. We should read all the small row groups at once. Parquet2Groups processing in this PR is the same as the original Parquet2RowGroup actually.

Contributor Author

zenus commented Nov 6, 2023

@RinChanNOWWW ok, let me fix.

Member

sundy-li commented Nov 6, 2023

> We should read all the small row groups at once

Yes, but I think it's better to unify this under read_columns_data_by_merge_io; it's a reading strategy.
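One way to picture "read all the small row groups at once" is to pack consecutive small row groups into batches, each of which is then fetched with a single merged read. The sketch below is an assumption-laden illustration, not the PR's implementation: `RowGroupInfo`, `batch_small_row_groups`, `small_threshold`, and `batch_target` are all hypothetical names and parameters.

```rust
/// Hypothetical, simplified description of one row group.
#[derive(Debug, Clone, PartialEq)]
struct RowGroupInfo {
    index: usize,
    compressed_size: u64,
}

/// Pack row groups into read batches, preserving file order.
/// Row groups of at least `small_threshold` bytes are read on their own;
/// smaller ones are accumulated until adding the next would exceed
/// `batch_target` bytes. Returns the row group indices in each batch.
fn batch_small_row_groups(
    groups: &[RowGroupInfo],
    small_threshold: u64,
    batch_target: u64,
) -> Vec<Vec<usize>> {
    let mut batches = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut current_size = 0u64;

    for g in groups {
        if g.compressed_size >= small_threshold {
            // Large row group: flush pending small ones, then read it alone.
            if !current.is_empty() {
                batches.push(std::mem::take(&mut current));
                current_size = 0;
            }
            batches.push(vec![g.index]);
            continue;
        }
        // Start a new batch if this small row group would overflow the target.
        if !current.is_empty() && current_size + g.compressed_size > batch_target {
            batches.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current.push(g.index);
        current_size += g.compressed_size;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Whether batching belongs in partition planning or inside a shared merge-IO reader is exactly the design question raised in the comments above; this sketch takes no position on it.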

Comment on lines 265 to 271
let mut groups = HashMap::with_capacity(parts.len());

for (gid, p) in parts.into_iter().enumerate() {
    max_compression_ratio = max_compression_ratio
        .max(p.uncompressed_size() as f64 / p.compressed_size() as f64);
    max_compressed_size = max_compressed_size.max(p.compressed_size());
    partitions.push(Arc::new(
        Box::new(ParquetPart::Parquet2RowGroup(p)) as Box<dyn PartInfo>
    ));
    groups.insert(gid, p);
Member

Sorry for the late review.
Do you mean to treat each file as a part? That doesn't make sense to me.

Member

Maybe you can explain more in comments and/or the PR summary, so we can understand your code better.

Contributor Author

I am fixing it; the code will be pushed tonight.

Contributor Author

@youngsofun please help check

@sundy-li sundy-li enabled auto-merge December 7, 2023 05:01
Member

sundy-li commented Dec 7, 2023

@zenus Thank you for your persistence and contribution

@BohuTANG BohuTANG disabled auto-merge December 7, 2023 07:35
@BohuTANG BohuTANG merged commit 526f904 into databendlabs:main Dec 7, 2023
68 checks passed
Development

Successfully merging this pull request may close these issues.

Feature: optimize read of small row group in parquet
5 participants