-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Clustering for Snowflake #1689
Implement Clustering for Snowflake #1689
Conversation
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin. CLA has not been signed by users: @bastienboutonnet |
Btw @drewbanin CLA is now signed... |
Thanks @bastienboutonnet! Kicking off the tests now :) @cla-bot check |
The cla-bot has been summoned, and re-checked this pull request! |
{%- if cluster_by_keys is not none and cluster_by_keys is string -%} | ||
{%- set cluster_by_keys = [cluster_by_keys] -%} | ||
{%- endif -%} | ||
{%- set cluster_by_string = cluster_by_keys|join(", ")-%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this tries to call join
on none
when the cluster_by
config is unset. Can we try something like:
{%- if cluster_by_keys is not none and cluster_by_keys is string -%}
{%- set cluster_by_keys = [cluster_by_keys] -%}
{%- set cluster_by_string = cluster_by_keys|join(", ")-%}
{% else %}
{%- set cluster_by_keys = [] -%}
{%- set cluster_by_string = none -%}
{%- endif -%}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and then below, we'll want to do:
{% if cluster_by_string is not none -%}
alter table {{relation}} cluster by ({{cluster_by_string}});
{%- endif -%}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh damn, that was really stupid of me...
But actually the whole logic is pretty wild here... I was doing really stupid stuff but it should work after my last commit. Let me know if you don't agree with that flow. I tested it and it works both for list of strings, a list of one string or just one string.
I also noticed a corner case. We're allowing alter table {{relation}} resume recluster;
to run on the condition that this config argument be set to true. The issue with that is that if the table was not created as a clustered table (i.e. that no cluster_by_keys
were provided) then Snowflake will throw an error that the table is not clustered.
I changed this piece of logic as well. Let me know what you think.
Co-Authored-By: Drew Banin <drew@fishtownanalytics.com>
Merging this - thanks for your hard work in this contribution @bastienboutonnet!! |
pleasure as always! |
Aim
Brings clustering to snowflake.
Table materialisations will leverage an order by via wrapping the SQL in a
select * from ( {{sql}} ) order by ( {{cluster_by_keys}}
Incremental materialisation will create table as above and followed by an alter statement
alter table {{relation}} cluster by ({{cluster_by_keys}})
to leverage Snowflake's automatic clustering.This approach was discussed in #634
This is a duplicate of PR #1591 that I had to re open due to killing my fork in the meantime... All discussion can be found there though.