Document addition of incremental_predicates
#2636
Labels
content: Improvements or additions to content
dbt-core v1.4: Docs impact for the v1.4 release (Jan 2023)
improvement: Use this when an area of the docs needs improvement as it's currently unclear
Link to the page on docs.getdbt.com requiring updates
https://docs.getdbt.com/docs/build/incremental-models
This might want to go under the "About `incremental_strategy`" section, which is really acting as our catch-all for "advanced optimizations."

What part(s) of the page would you like to see updated?
Support for a new (optional) config: `incremental_predicates`. This config accepts any valid SQL expression. (Note: dbt will not do any validation in advance, so it's up to the user to ensure that their SQL syntax is valid.)

This is an advanced use of incremental models, where data volume is large enough to justify additional investments in performance. It's a real case of "trust the user," where some of the abstraction that dbt offers over vendor-specific `merge` statements is stripped away.

For instance, this is a pattern we might expect to see on Snowflake:
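A minimal sketch of such a config, assuming a model with a unique key `id` that is clustered on a `session_start` timestamp column. The file name, column names, and 7-day window are illustrative assumptions, not taken from this issue:

```sql
-- models/my_incremental_model.sql (hypothetical model name)
{{
  config(
    materialized = 'incremental',
    unique_key = 'id',                  -- assumed key column
    cluster_by = ['session_start'],     -- assumed Snowflake cluster key
    incremental_strategy = 'merge',
    incremental_predicates = [
      -- passed through as-is; dbt does not validate this SQL
      "DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)"
    ]
  )
}}

select ...
```

The predicate is written against `DBT_INTERNAL_DEST`, so it bounds the scan of the existing ("old") table rather than the new data.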
This will template a `merge` statement like:

The user will still want to limit the data scan of upstream tables, as always, within the body of their incremental model SQL, to limit the amount of "new" data being processed and transformed:
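For the `merge` statement mentioned above, here is a rough sketch of what dbt might template. It assumes a unique key `id` and a single predicate on `DBT_INTERNAL_DEST.session_start`. The angle-bracket table names are placeholders, the `...` clauses are elided, and the exact ordering and formatting vary by adapter and dbt version:

```sql
merge into <existing_table> as DBT_INTERNAL_DEST
    using <temp_table_with_new_records> as DBT_INTERNAL_SOURCE
    on
        -- custom predicate: limits the scan of the "old" data / existing table
        DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)
        and
        -- unique key match
        DBT_INTERNAL_DEST.id = DBT_INTERNAL_SOURCE.id
    when matched then update set ...
    when not matched then insert ...
```

For the upstream filter, the usual `is_incremental()` guard inside the model body is one way to bound the "new" data. The source model name `large_source_table` and the 3-day window are assumptions for illustration:

```sql
with large_source_table as (

    select * from {{ ref('large_source_table') }}
    {% if is_incremental() %}
        -- on incremental runs, only pick up recent upstream rows
        where session_start >= dateadd(day, -3, current_date)
    {% endif %}

)

select * from large_source_table
```

The two filters do different jobs. The `is_incremental()` filter limits what is read from upstream, while the predicate in the `merge` condition limits how much of the existing table is scanned for matches.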
Notes:
- With the `merge` strategy, columns referenced in a predicate may need to be qualified with `DBT_INTERNAL_DEST` ("old" data) or `DBT_INTERNAL_SOURCE` ("new" data).
- There is conceptual overlap with the `insert_overwrite` incremental strategy. This whole section of the docs probably needs some more love, and/or a link to a revised version of https://discourse.getdbt.com/t/bigquery-dbt-incremental-changes/982 (which is now nearly 3 years old).

Additional information
Relevant merged PRs:
For much much more context on the performance benefits of "bounding" the scan, e.g. to take advantage of a cluster key on Snowflake: