-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/bq incremental and archive #856
Conversation
This branch changes the returned column type from eg. FLOAT to FLOAT64 or from INT to INT64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool, this looks like it was pretty tough! I have some minor feedback and questions but nothing big.
dbt/adapters/bigquery/impl.py
Outdated
# mirrors the implementation of list_relations for other adapters | ||
|
||
try: | ||
all_tables = list(all_tables) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered consuming the iterator here by moving the list comprehension up? As it is this is going to make two copies of each table (one BQ API result and one Relation). If we're actually pulling up to 100k entries, doubling that could be a lot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 100k would be super excessive! But agree, happy to slide the comprehension up into the try/catch
@@ -18,12 +75,12 @@ | |||
|
|||
select | |||
{% for col in adapter.get_columns_in_table(source_relation.schema, source_relation.identifier) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this just be for col in cols
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, 100%! Good catch
{%- set full_refresh_mode = (flags.FULL_REFRESH == True) -%} | ||
|
||
{% if non_destructive_mode %} | ||
{{ log("--non-destructive is not supported on BigQuery, and will be ignored", info=True) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me like this should fail with an error rather than just logging and ignoring it.
dbt/schema.py
Outdated
return cls.TYPE_LABELS.get(dtype.upper(), dtype) | ||
|
||
@classmethod | ||
def create(cls, name, label=None, dtype=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on how translate_type
works, it looks like you could just pass one label_or_dtype
argument and pass it to translate_type
unconditionally, no need for two arguments.
self.use_default_project({"data-paths": [self.dir("seed-initial")]}) | ||
|
||
self.run_dbt(["seed"]) | ||
self.run_dbt() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests should probably check to make sure the right number of things are being run (see #854)
@beckjake just updated this with your review feedback |
@@ -51,16 +51,14 @@ | |||
{# | |||
Cross-db compatible archival implementation | |||
#} | |||
{% macro archive_select(source_relation, target_relation, unique_key, updated_at) %} | |||
{% macro archive_select(source_relation, target_relation, source_columns, unique_key, updated_at) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, this is even better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drew this is fantastic. this branch cleans up a lot of rough edges in the bigquery adapter and Column class, and then the actual code to implement incremental and archive looks very familiar. really great work.
{{ adapter_macro('get_merge_sql', target, source, unique_key, dest_columns) }} | ||
{%- endmacro %} | ||
|
||
{% macro default__get_merge_sql(target, source, unique_key, dest_columns) -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am i correct in assuming that this works for snowflake and bigquery, but not postgres and redshift? can you add adapter macros for postgres and redshift to raise an exception if this is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmcarthur long-term, my plan is to implement a version of merge
for Redshift and Postgres.
The merge
macro will serve as an abstraction that should work across all adapters, while it might compile to something like an insert
and update
statement on Redshift and Postgres. I really like the idea of making the core materialization logic identical across all adapters, and just calling out to an adapter-specific version of the merge
macro.
I can definitely add an exception for pg/redshift for now though
{% macro default__create_schema(schema_name) %} | ||
{% call statement() %} | ||
create schema if not exists {{ schema_name }}; | ||
{% endcall %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get rid of this create_schema macro, use adapter.create_schema instead
🌞 ☁️ ☁️ 🌊 🚢 🌊 shippin it |
Fixes #712 by adding support for archival and incremental models on bigquery.
Incremental models are implemented with
merge
statements, whereasarchival
just usesinsert
andupdate
statements. Future work should be done to make use of themerge
statement for archival on BigQuery.In pursuit of archival, some new functionality needed to be added to the bigquery adapter: