feat(storage): refactor recluster #16070

Merged: 8 commits, merged Jul 24, 2024


@zhyass (Member) commented Jul 17, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR introduces several enhancements and optimizations related to the reclustering process:

  1. Addition of RelOperator::Recluster:

    • Refactored how the physical plan for reclustering is generated: the plan is now built by PhysicalPlanBuilder.
  2. Optimize Compact Execution:

    • Changed optimize compact so that, when the target table is a clustered table, it performs reclustering directly. This removes the separate compact step followed by reclustering, reducing overall execution time and complexity.
  3. Segment-level Sorting:

    • Introduced segment-level sorting. Even when block-level reclustering is not necessary, overlapping segments are sorted and rewritten as new segments. This improves data organization and lets range pruning skip more segments, which improves query performance.
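A minimal sketch of what item 1 means structurally, under a drastically simplified model: recluster becomes one more logical operator variant, and the same builder that lowers other operators also lowers it into a physical plan. Apart from the names RelOperator::Recluster and PhysicalPlanBuilder taken from the summary, every type, field and variant below is hypothetical and does not reflect Databend's actual definitions or signatures.

```rust
// Hypothetical sketch: simplified stand-ins, not Databend's real types.
// Recluster is modeled as a logical operator, so the plan builder lowers it
// through the same `match` as every other operator.

#[derive(Debug, Clone, PartialEq)]
pub enum RelOperator {
    Scan { table: String },
    Recluster { table: String, cluster_keys: Vec<String> },
}

#[derive(Debug, Clone, PartialEq)]
pub enum PhysicalPlan {
    TableScan { table: String },
    // Re-sorts the selected blocks by the cluster keys and rewrites them.
    Recluster { table: String, sort_by: Vec<String> },
}

pub struct PhysicalPlanBuilder;

impl PhysicalPlanBuilder {
    /// Lowers one logical operator into its physical counterpart.
    pub fn build(&self, op: &RelOperator) -> PhysicalPlan {
        match op {
            RelOperator::Scan { table } => PhysicalPlan::TableScan { table: table.clone() },
            RelOperator::Recluster { table, cluster_keys } => PhysicalPlan::Recluster {
                table: table.clone(),
                sort_by: cluster_keys.clone(),
            },
        }
    }
}
```

For example, building RelOperator::Recluster for table t with cluster key a yields PhysicalPlan::Recluster with sort_by set to ["a"]. The point of the design is that recluster no longer needs a dedicated plan-construction path.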
mysql> create table t(a int) cluster by(a) block_per_segment=2 row_per_block=3;
Query OK, 0 rows affected (0.06 sec)

mysql> insert into t values(0),(1),(2);
Query OK, 3 rows affected (0.08 sec)

mysql> insert into t values(6),(7),(8);
Query OK, 3 rows affected (0.07 sec)

mysql> optimize table t compact;
Query OK, 0 rows affected (0.24 sec)

mysql> insert into t values(3),(4),(5);
Query OK, 3 rows affected (0.07 sec)

mysql> insert into t values(9),(10),(11);
Query OK, 3 rows affected (0.08 sec)

mysql> optimize table t compact segment;
Query OK, 0 rows affected (0.21 sec)

mysql> select * from clustering_information('default','t');
+-------------+----------------------------+-------------------+----------------------+------------------+---------------+-----------------------+
| cluster_key | timestamp                  | total_block_count | constant_block_count | average_overlaps | average_depth | block_depth_histogram |
+-------------+----------------------------+-------------------+----------------------+------------------+---------------+-----------------------+
| (a)         | 2024-07-18 10:35:48.426716 |                 4 |                    0 |              0.0 |           1.0 | {"00001":4}           |
+-------------+----------------------------+-------------------+----------------------+------------------+---------------+-----------------------+
1 row in set (0.05 sec)
Read 1 rows, 448.00 B in 0.025 sec., 40.34 rows/sec., 17.65 KiB/sec.

mysql> explain select * from t where a=6;
+--------------------------------------------------------------------------------------------------------------------------+
| explain                                                                                                                  |
+--------------------------------------------------------------------------------------------------------------------------+
| Filter                                                                                                                   |
| ├── output columns: [t.a (#0)]                                                                                           |
| ├── filters: [is_true(t.a (#0) = 6)]                                                                                     |
| ├── estimated rows: 1.00                                                                                                 |
| └── TableScan                                                                                                            |
|     ├── table: default.default.t                                                                                         |
|     ├── output columns: [a (#0)]                                                                                         |
|     ├── read rows: 3                                                                                                     |
|     ├── read size: < 1 KiB                                                                                               |
|     ├── partitions total: 4                                                                                              |
|     ├── partitions scanned: 1                                                                                            |
|     ├── pruning stats: [segments: <range pruning: 2 to 2>, blocks: <range pruning: 4 to 1, bloom pruning: 1 to 1>]       |
|     ├── push downs: [filters: [is_true(t.a (#0) = 6)], limit: NONE]                                                      |
|     └── estimated rows: 12.00                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------+
14 rows in set (0.08 sec)
Read 0 rows, 0.00 B in 0.019 sec., 0 rows/sec., 0.00 B/sec.

mysql> optimize table t compact;
Query OK, 0 rows affected (0.21 sec)

mysql> select * from clustering_information('default','t');
+-------------+----------------------------+-------------------+----------------------+------------------+---------------+-----------------------+
| cluster_key | timestamp                  | total_block_count | constant_block_count | average_overlaps | average_depth | block_depth_histogram |
+-------------+----------------------------+-------------------+----------------------+------------------+---------------+-----------------------+
| (a)         | 2024-07-18 10:36:10.959885 |                 4 |                    0 |              0.0 |           1.0 | {"00001":4}           |
+-------------+----------------------------+-------------------+----------------------+------------------+---------------+-----------------------+
1 row in set (0.05 sec)
Read 1 rows, 448.00 B in 0.024 sec., 41.35 rows/sec., 18.09 KiB/sec.

mysql> explain select * from t where a=6;
+--------------------------------------------------------------------------------------------------------------------------+
| explain                                                                                                                  |
+--------------------------------------------------------------------------------------------------------------------------+
| Filter                                                                                                                   |
| ├── output columns: [t.a (#0)]                                                                                           |
| ├── filters: [is_true(t.a (#0) = 6)]                                                                                     |
| ├── estimated rows: 1.00                                                                                                 |
| └── TableScan                                                                                                            |
|     ├── table: default.default.t                                                                                         |
|     ├── output columns: [a (#0)]                                                                                         |
|     ├── read rows: 3                                                                                                     |
|     ├── read size: < 1 KiB                                                                                               |
|     ├── partitions total: 4                                                                                              |
|     ├── partitions scanned: 1                                                                                            |
|     ├── pruning stats: [segments: <range pruning: 2 to 1>, blocks: <range pruning: 2 to 1, bloom pruning: 1 to 1>]       |
|     ├── push downs: [filters: [is_true(t.a (#0) = 6)], limit: NONE]                                                      |
|     └── estimated rows: 12.00                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------+
14 rows in set (0.05 sec)
Read 0 rows, 0.00 B in 0.008 sec., 0 rows/sec., 0.00 B/sec.

mysql> select segment_count, block_count, row_count from fuse_snapshot('default', 't');
+---------------+-------------+-----------+
| segment_count | block_count | row_count |
+---------------+-------------+-----------+
|             2 |           4 |        12 |
|             2 |           4 |        12 |
|             3 |           4 |        12 |
|             2 |           3 |         9 |
|             1 |           2 |         6 |
|             2 |           2 |         6 |
|             1 |           1 |         3 |
+---------------+-------------+-----------+
7 rows in set (0.05 sec)
Read 7 rows, 2.77 KiB in 0.026 sec., 538 rows/sec., 106.50 KiB/sec.
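The session above illustrates the segment-level sorting from item 3. Judging from fuse_snapshot, after compact segment the table most likely holds two segments with blocks covering a = 0..2 and 6..8, and a = 3..5 and 9..11. Both segment ranges contain 6, so range pruning keeps both ("2 to 2"); after the second optimize table t compact it keeps one ("2 to 1"). The sketch below reproduces those counts with a hypothetical min/max model of blocks and segments. It assumes the blocks no longer overlap (clustering_information reports average_depth 1.0) and simply regroups them in key order; Databend's actual selection and rewrite of segments may differ.

```rust
// Hypothetical model: each block carries the min/max of its cluster key and
// a segment's range spans its blocks. Not Databend's real metadata types.

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Block {
    pub min: i64,
    pub max: i64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub blocks: Vec<Block>,
}

impl Segment {
    /// (min, max) of the cluster key over all blocks in the segment.
    pub fn range(&self) -> (i64, i64) {
        let min = self.blocks.iter().map(|b| b.min).min().expect("empty segment");
        let max = self.blocks.iter().map(|b| b.max).max().expect("empty segment");
        (min, max)
    }
}

/// True if any two segments have intersecting cluster-key ranges.
pub fn has_overlapping_segments(segments: &[Segment]) -> bool {
    for (i, a) in segments.iter().enumerate() {
        for b in &segments[i + 1..] {
            let (a_min, a_max) = a.range();
            let (b_min, b_max) = b.range();
            if a_min <= b_max && b_min <= a_max {
                return true;
            }
        }
    }
    false
}

/// Segment-level sort. Assumes blocks are already non-overlapping, so it only
/// orders them by key and regroups them into new segments of
/// `blocks_per_segment` blocks (must be > 0). Leaves input untouched if no
/// segments overlap.
pub fn sort_segments(segments: &[Segment], blocks_per_segment: usize) -> Vec<Segment> {
    if !has_overlapping_segments(segments) {
        return segments.to_vec();
    }
    let mut blocks: Vec<Block> = segments.iter().flat_map(|s| s.blocks.iter().copied()).collect();
    blocks.sort_by_key(|b| (b.min, b.max));
    blocks
        .chunks(blocks_per_segment)
        .map(|chunk| Segment { blocks: chunk.to_vec() })
        .collect()
}

/// Number of segments that range pruning must keep for `key = value`.
pub fn segments_scanned(segments: &[Segment], value: i64) -> usize {
    segments
        .iter()
        .filter(|s| {
            let (min, max) = s.range();
            min <= value && value <= max
        })
        .count()
}
```

With the two segments described above and blocks_per_segment = 2 (as in the create table statement), segments_scanned for 6 goes from 2 before sort_segments to 1 after it, matching the pruning stats in the two explain outputs.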

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Jul 17, 2024
@zhyass zhyass marked this pull request as draft July 17, 2024 17:54
@zhyass zhyass force-pushed the feature_cluster_table branch 2 times, most recently from 9ef127d to da8fcef July 18, 2024 08:09
@zhyass zhyass added the ci-cloud Build docker image for cloud test label Jul 18, 2024


@zhyass zhyass marked this pull request as ready for review July 18, 2024 10:45
@BohuTANG (Member) commented:

How much faster is the new recluster than the old?

@zhyass zhyass marked this pull request as draft July 18, 2024 12:50
@zhyass zhyass added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Jul 18, 2024


@zhyass zhyass added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Jul 18, 2024
@zhyass (Member, Author) commented Jul 18, 2024

> How much faster is the new recluster than the old?

The point of this PR is not to increase execution speed.

Instead, it changes parts of the compact and recluster logic.

For example, recluster now implicitly performs compaction, and compact on a clustered table directly executes recluster instead of compact followed by recluster.
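The dispatch described in this reply, combined with the segment-level sorting from the summary, can be sketched as a small decision function. This is a hypothetical model: Action, TableState and compact_action are illustrative names, not Databend's API, and the real planner decides from block and segment statistics rather than boolean flags.

```rust
// Hypothetical sketch of what `optimize table ... compact` does after this PR.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    CompactBlocks, // merge small blocks (tables without a cluster key)
    Recluster,     // re-sort blocks by cluster key; compacts implicitly
    SortSegments,  // blocks already clustered, but segment ranges overlap
    Nothing,
}

pub struct TableState {
    pub has_cluster_key: bool,
    pub blocks_need_recluster: bool,
    pub segments_overlap: bool,
}

pub fn compact_action(t: &TableState) -> Action {
    if !t.has_cluster_key {
        // Simplified: a non-clustered table always takes the plain compact path.
        return Action::CompactBlocks;
    }
    // Clustered table: a single recluster replaces compact followed by recluster.
    if t.blocks_need_recluster {
        Action::Recluster
    } else if t.segments_overlap {
        Action::SortSegments
    } else {
        Action::Nothing
    }
}
```

The final optimize table t compact in the summary's session corresponds to the SortSegments branch: average_depth is already 1.0, yet the two segments overlap.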


@zhyass zhyass force-pushed the feature_cluster_table branch from 2b533f2 to c613bd6 July 18, 2024 15:56
@zhyass zhyass marked this pull request as ready for review July 18, 2024 15:56
@zhyass zhyass force-pushed the feature_cluster_table branch from 49cd1da to d1b9ab9 July 23, 2024 09:29
chore: add defensive checks and rename variables for clarity
@dantengsky dantengsky added this pull request to the merge queue Jul 24, 2024
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request Jul 24, 2024
@BohuTANG BohuTANG merged commit 18676c2 into databendlabs:main Jul 24, 2024
71 checks passed
Labels: ci-cloud (Build docker image for cloud test), pr-feature (this PR introduces a new feature to the codebase)

4 participants