Feature/bq incremental and archive #856

Merged: 12 commits merged from feature/bq-incremental-and-archive into development on Jul 19, 2018

Conversation

@drewbanin (Contributor):

Fixes #712 by adding support for archival and incremental models on BigQuery.

Incremental models are implemented with merge statements, whereas archival just uses insert and update statements. Future work should be done to make use of the merge statement for archival on BigQuery.
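
For concreteness, the merge pattern described above compiles to something roughly like the following on BigQuery (a minimal sketch; the table, key, and column names are illustrative):

    merge into `project`.`dataset`.`my_model` as dest
    using (
        -- the model's select, scoped to new or updated rows
        select * from `project`.`dataset`.`my_model__dbt_tmp`
    ) as src
    on src.id = dest.id
    when matched then update set
        dest.col_a = src.col_a
    when not matched then insert
        (id, col_a)
    values
        (id, col_a)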

In pursuit of archival, some new functionality needed to be added to the BigQuery adapter:

  1. create temporary tables
  2. programmatically add columns to a table
  3. translate BigQuery column type labels into standard SQL type names (e.g. FLOAT --> FLOAT64; see the sketch after this list)
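
A minimal sketch of the label translation, assuming the Column class keeps the mapping in a TYPE_LABELS table (the entries shown here are an illustrative subset):

    TYPE_LABELS = {
        'STRING': 'STRING',
        'FLOAT': 'FLOAT64',
        'INTEGER': 'INT64',
        'BOOLEAN': 'BOOL',
    }

    @classmethod
    def translate_type(cls, dtype):
        # fall through to the input when no translation is needed
        return cls.TYPE_LABELS.get(dtype.upper(), dtype)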

@beckjake (Contributor) left a comment:

Very cool, this looks like it was pretty tough! I have some minor feedback and questions but nothing big.

# mirrors the implementation of list_relations for other adapters

try:
    all_tables = list(all_tables)
@beckjake (Contributor):

Have you considered consuming the iterator here by moving the list comprehension up? As it is, this will make two copies of each table (one BigQuery API result and one Relation). If we're actually pulling up to 100k entries, doubling that could be a lot!
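
A sketch of the suggestion, assuming the surrounding code resembles the snippet above and that bq_table_to_relation is the helper that wraps an API result in a Relation:

    # before: list() materializes the API iterator, so the raw results and
    # the Relations built from them are both held in memory at once
    try:
        all_tables = list(all_tables)
    except google.api_core.exceptions.NotFound:
        all_tables = []
    return [cls.bq_table_to_relation(table) for table in all_tables]

    # after: the comprehension consumes the iterator directly, so only the
    # Relation objects are ever held in memory
    try:
        return [cls.bq_table_to_relation(table) for table in all_tables]
    except google.api_core.exceptions.NotFound:
        return []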

@drewbanin (Contributor, Author):

I think 100k would be super excessive! But agree, happy to slide the comprehension up into the try/catch

@@ -18,12 +75,12 @@

select
{% for col in adapter.get_columns_in_table(source_relation.schema, source_relation.identifier) %}
@beckjake (Contributor):

shouldn't this just be for col in cols?

@drewbanin (Contributor, Author):

yes, 100%! Good catch

{%- set full_refresh_mode = (flags.FULL_REFRESH == True) -%}

{% if non_destructive_mode %}
{{ log("--non-destructive is not supported on BigQuery, and will be ignored", info=True) }}
@beckjake (Contributor):

It seems to me like this should fail with an error rather than just logging and ignoring it.
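
A sketch of that alternative, assuming exceptions.raise_compiler_error is available in the macro context here:

    {% if non_destructive_mode %}
      {{ exceptions.raise_compiler_error("--non-destructive is not supported on BigQuery") }}
    {% endif %}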

dbt/schema.py (outdated)
return cls.TYPE_LABELS.get(dtype.upper(), dtype)

@classmethod
def create(cls, name, label=None, dtype=None):
@beckjake (Contributor):

Based on how translate_type works, it looks like you could just pass one label_or_dtype argument and pass it to translate_type unconditionally, no need for two arguments.
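
A sketch of the single-argument version, assuming the Column constructor takes a name and a dtype:

    @classmethod
    def create(cls, name, label_or_dtype):
        # translate_type() maps BigQuery type labels (e.g. FLOAT) onto
        # standard SQL names (e.g. FLOAT64) and passes anything else
        # through unchanged, so calling it unconditionally is safe
        return cls(name, cls.translate_type(label_or_dtype))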

self.use_default_project({"data-paths": [self.dir("seed-initial")]})

self.run_dbt(["seed"])
self.run_dbt()
@beckjake (Contributor):

These tests should probably check to make sure the right number of things are being run (see #854)
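
One way those assertions could look, assuming run_dbt returns one result per executed node (the expected counts below are placeholders for whatever the test fixture actually contains):

    results = self.run_dbt(["seed"])
    self.assertEqual(len(results), 1)  # exactly one seed should load

    results = self.run_dbt()
    self.assertEqual(len(results), 1)  # exactly one model should run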

@drewbanin (Contributor, Author):

@beckjake just updated this with your review feedback

@@ -51,16 +51,14 @@
 {#
     Cross-db compatible archival implementation
 #}
-{% macro archive_select(source_relation, target_relation, unique_key, updated_at) %}
+{% macro archive_select(source_relation, target_relation, source_columns, unique_key, updated_at) %}
@beckjake (Contributor):

nice, this is even better

@cmcarthur (Member) left a comment:

@drew this is fantastic. this branch cleans up a lot of rough edges in the bigquery adapter and Column class, and then the actual code to implement incremental and archive looks very familiar. really great work.

{{ adapter_macro('get_merge_sql', target, source, unique_key, dest_columns) }}
{%- endmacro %}

{% macro default__get_merge_sql(target, source, unique_key, dest_columns) -%}
@cmcarthur (Member):

am i correct in assuming that this works for snowflake and bigquery, but not postgres and redshift? can you add adapter macros for postgres and redshift to raise an exception if this is used?

@drewbanin (Contributor, Author):

@cmcarthur long-term, my plan is to implement a version of merge for Redshift and Postgres.

The merge macro will serve as an abstraction that should work across all adapters, while it might compile to something like an insert and update statement on Redshift and Postgres. I really like the idea of making the core materialization logic identical across all adapters, and just calling out to an adapter-specific version of the merge macro.

I can definitely add an exception for pg/redshift for now though
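
For illustration, the stub macros could look something like this, assuming adapter_macro dispatches on the adapter-name prefix (as default__get_merge_sql suggests) and that exceptions.raise_compiler_error is available in the macro context:

    {% macro postgres__get_merge_sql(target, source, unique_key, dest_columns) -%}
      {{ exceptions.raise_compiler_error('merge is not supported on Postgres') }}
    {%- endmacro %}

    {% macro redshift__get_merge_sql(target, source, unique_key, dest_columns) -%}
      {{ exceptions.raise_compiler_error('merge is not supported on Redshift') }}
    {%- endmacro %}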

{% macro default__create_schema(schema_name) %}
{% call statement() %}
create schema if not exists {{ schema_name }};
{% endcall %}
@cmcarthur (Member):

get rid of this create_schema macro, use adapter.create_schema instead
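
The call site would then reduce to something like this (assuming the do extension is enabled in dbt's Jinja environment and schema_name is in scope as in the macro above):

    {% do adapter.create_schema(schema_name) %}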

@drewbanin (Contributor, Author):

🌞 ☁️ ☁️

🌊 🚢 🌊

shippin it

@drewbanin merged commit 574d859 into development on Jul 19, 2018.
@drewbanin deleted the feature/bq-incremental-and-archive branch on July 19, 2018 at 02:26.