-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/snowflake view transactions #940
Conversation
- obviate difference between this.name and this.table - use `create or replace view` on snowflake - avoid doing costly alter tables and drops for snowflake views
@beckjake adding you for review also since you were just deep in the BQ view materialization code Does all of this look reasonable? There's a lot in here, but I think it fixes our immediate problem with:
as well as obviating the need for that terrible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I guess I'm still not clear on non-destructive vs full_refresh but if you're confident in this logic, looks good to me.
{{ adapter.drop_relation(old_relation) }} | ||
{%- else -%} | ||
{{ exceptions.relation_wrong_type(old_relation, 'view') }} | ||
{%- endif -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about this quite a bit, and I am not sure about the split here. Should snowflake really behave this fundamentally differently from bigquery? Never raise and ignore the full refresh flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decision to raise this error on BigQuery was, I think, a shortsighted one. The error we throw here is very specifically intended to avoid clobbering an ingestion-time partitioned table with a view by accident. This could happen if you switch the type of your materialization from table
to view
. We don't do this check anywhere else in dbt, and ingestion-time partitioned tables are mostly superseded by column partitioning, so I don't know that too many people are actually being helped out by this error in practice.
I'm in favor of dropping the exception if @cmcarthur is into it as well. I agree that it's a thorny maintenance burden and has little utility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you remove the exception you also get to remove handle_existing_table
entirely, and removing code is always a win.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drewbanin I defer totally to you on it. IMO it's totally fine to handle these kinds of errors situationally based on the adapter. We added this error for a really good reason, namely building time partitioned tables was costly, and dropping it unintentionally would cause users a lot of pain. If you believe that this isn't going to cause pain then I'm on board with removing it
@@ -0,0 +1,86 @@ | |||
from nose.plugins.attrib import attr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drewbanin i defer to you totally on the logic here, but this implementation does not look quite right to me. if you'd like to discuss further in person, ping me
{{ adapter.drop_relation(old_relation) }} | ||
{%- else -%} | ||
{{ exceptions.relation_wrong_type(old_relation, 'view') }} | ||
{%- endif -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drewbanin I defer totally to you on it. IMO it's totally fine to handle these kinds of errors situationally based on the adapter. We added this error for a really good reason, namely building time partitioned tables was costly, and dropping it unintentionally would cause users a lot of pain. If you believe that this isn't going to cause pain then I'm on board with removing it
as dropping inside the transaction will lead to a "table dropped by concurrent transaction" | ||
error. In the future, Snowflake should use `create or replace table` syntax to obviate this code | ||
*/ #} | ||
{% if adapter.type() == 'snowflake' and old_relation.type == 'view' %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤢
This is antithetical to the adapter architecture. Why not implement totally different table and view materializations for Snowflake vs. Bigquery? Did we architect this part incorrectly?
This PR fixes #930. Additionally, it reconciles
this.name
andthis.table
sort of by accident.The Problem
When Snowflake views are created, the column structure for the view is determined and saved by Snowflake. For example, if a Snowflake view is defined as
select * from some_table
, then this view will have a structure identical to that ofsome_table
. If in the future,some_table
is recreated with a new definition, then the saved structure for the original view might become out of sync with it's new definition. For example:What happened
dbt's behavior around building tables and views changed in 0.10.2. Namely: statements were reordered so that
drop
statements are run outside of transactions. This change affects Postgres, Redshift, and Snowflake. It did have the positive effect of reducing concurrent transaction errors (more info on this here), but it also caused this unintended consequence with Snowflake views.Manifestation in dbt
dbt failed with this error during the swap-and-drop part of the view and table materializations. In particular, running:
would fail if
some_view
's definition became inconsistent with its saved schema. This would happen when the definition of an upstream model was changed manually, for instance.The Fix
The best way to fix this is to avoid the whole create-temp then swap-and-drop workflow. Instead, we can very simply execute
create or replace view
to atomically overwrite an existing view with its new definition. When this happens, Snowflake does not try to validate the existing view's definition against it's saved structure.Fallout
With this change, dbt is no longer creating a
__dbt_tmp
suffixed view, so a special case would need to be made for thethis
variable. Rather than add increasingly complex logic to preventthis.table
from using a__dbt_tmp
suffix for snowflake views (as we currently do for incremental models), this PR runs hooks after the new model is renamed to its final identifier. As a result, the__dbt_tmp
table (if it exists) is never visible to hooks, so we can makethis.schema == this.name
.