-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix missing sql_header for incremental models #2200
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one question, otherwise LGTM
@@ -87,6 +90,9 @@ | |||
{% macro default__get_insert_overwrite_merge_sql(target, source, dest_columns, predicates) -%} | |||
{%- set predicates = [] if predicates is none else [] + predicates -%} | |||
{%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%} | |||
{%- set sql_header = config.get('sql_header', none) -%} | |||
|
|||
{{ sql_header if sql_header is not none }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, if we do this, then the sql_header
will run twice for "dynamic" (scripting) insert_overwrite
materializations. (The first time will be when creating the temp table.) I assume that's ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good call. Temporary UDFs can't be defined twice in the same BQ script. Think we're going to need to find an alternative approach here.
I kicked around the idea of making this a materialization-level thing, and while I think that might be a good approach long-term, I don't want to touch every dbt materialization in support of this change. Going to try to figure out a nice way to make this sensible for a very specific use case (insert_overwrite on BigQuery incremental models) and localize the jank as much as possible
Let's get this merged for 0.16.1 - we should consider whether or not to keep this logic in adapter macros, or move it into materializations (the better long-term answer) Edit: This is not going out in 0.16.1, let's get it merged for 0.17.0 |
…t into fix/sql-header-in-incrementals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! We should be sure to note this change in the signature of get_merge_sql
in the release notes - plugins should probably know about this/use it (I think spark overrides get_merge_sql
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fine approach that fixes the problem at hand, which is interpolating the sql_header
in the right spots during incremental runs on BigQuery.
Thinking through the implications:
- This PR does change the contract of
get_insert_overwrite_merge_sql
by adding the argumentinclude_sql_header
. Right now, only the BigQuery adapter uses that macro, and it just uses thedefault
implementation. - This PR doesn't change the contract of
get_merge_sql
, which is good (this is a macro in wider use), but it also means it doesn't have the added lever ofinclude_sql_header
. Thedefault__
andsnowflake__
implementations ofget_merge_sql
must then always include thesql_header
if provided, meaning that we can only pass model SQL (and never a temp table) as thesource
ofget_merge_sql
for those implementations. That's okay based on how they're used today, but it's feasible that we may want to pass a temp relation source someday—something thatspark__get_merge_sql
does currently.
Thinking far ahead: My personal belief is that get_merge_sql
, get_delete_insert_merge_sql
, get_insert_overwrite_merge_sql
, etc. should eventually converge toward having the same basic contract, and that we should further leverage predicates
to (a) handle the idiosyncrasies of each, and (b) enable end users to pass additional custom logic (match conditions, filters, etc). That may be an unrealistic aspiration, though I don't think it's out of proportion with this being some of the trickiest, highest-leverage, and most cost-impactful SQL that dbt generates.
@beckjake FYI merging this before the tests pass - it was a changelog merge conflict |
resolves #2136
Description
This PR adds support for injecting a
sql_header
abovemerge
statements generated for incremental models.Checklist
CHANGELOG.md
and added information about my change to the "dbt next" section.