feat: hilbert clustering #16296
Looks good as a baseline implementation of Hilbert clustering for internal evaluation. This feature is not yet mature enough to be used by users (including private beta). Suggested next steps:
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
Refer to https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=bfd6d94c98627756989b0147a68b7ab1f881a0d
Traditional linear clustering sorts data by the cluster key columns in their declared order. Hilbert clustering instead encodes the cluster key columns into a single value using the Hilbert curve (a space-filling curve that preserves locality) and sorts the data by that Hilbert-encoded value.
Hilbert clustering optimizes data layout for queries with predicates on non-primary columns, enabling more effective filtering.
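To make the encoding step concrete, here is a minimal two-dimensional sketch of the classic Hilbert index computation. It is not the implementation in this PR: the function name `hilbert_index`, the `order` parameter, and the assumption that both key columns have already been mapped to integers in `[0, 2**order)` are all illustrative.

```python
def hilbert_index(x: int, y: int, order: int) -> int:
    """Return the position of cell (x, y) along a 2-D Hilbert curve.

    Assumes 0 <= x, y < 2**order. Illustrative only; not Databend's code.
    """
    n = 1 << order          # side length of the grid
    d = 0                   # accumulated distance along the curve
    s = n >> 1              # size of the quadrant being examined
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes s*s cells; (3*rx) ^ ry picks its rank.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-quadrant is in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Hypothetical rows of (a, b) key values, already scaled to a 8x8 grid.
rows = [(7, 0), (0, 1), (3, 3), (0, 0), (4, 4), (1, 1)]
# Sorting by the encoded value instead of by (a, b) lexicographically.
rows_hilbert_sorted = sorted(rows, key=lambda r: hilbert_index(r[0], r[1], 3))
```

Because consecutive positions on the curve are always neighbouring grid cells, rows that are close in both `a` and `b` tend to end up in the same blocks, which is what allows pruning on either column.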
Syntax
Performance
The table contains 100,000,000 rows.
Important Consideration
The main limitation of Hilbert clustering is its dependence on how the data is distributed in each dimension. If the distribution is uneven in one dimension, or if the distributions differ significantly between dimensions, clustering quality can degrade and queries may need to scan more blocks. When choosing Hilbert cluster keys, it is therefore essential to check that the selected columns adequately represent the data's distribution characteristics.
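The effect of a skewed dimension can be illustrated with a small sketch. It assumes, purely for illustration, that column values are mapped to Hilbert coordinates with equal-width buckets; this PR description does not say how the mapping is actually done, and `equal_width_bucket` and the sample data are hypothetical.

```python
def equal_width_bucket(value, lo, hi, buckets):
    """Map value in [lo, hi] to an integer bucket in [0, buckets - 1]."""
    if hi == lo:
        return 0
    return min(buckets - 1, int((value - lo) / (hi - lo) * buckets))


# Hypothetical column: 9,990 small values plus 10 extreme outliers.
values = list(range(9_990)) + [1_000_000_000] * 10
lo, hi = min(values), max(values)

# Count how many rows land in each of 16 buckets.
counts = {}
for v in values:
    b = equal_width_bucket(v, lo, hi, 16)
    counts[b] = counts.get(b, 0) + 1

# Nearly every row shares one coordinate, so this dimension contributes
# almost nothing to the Hilbert ordering.
buckets_used = len(counts)
```

Under this assumption, 9,990 of the 10,000 rows share a single coordinate, so predicates on that column cannot prune many blocks.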
Tests
Type of change