Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BQ] Fix insert_overwrite with int + ts partitions #3098

Merged
merged 1 commit into from
Feb 12, 2021

Conversation

jtcohen6
Copy link
Contributor

@jtcohen6 jtcohen6 commented Feb 12, 2021

fixes #3063
fixes #3095

Description

In v0.19.0, we added a granularity property of BigQuery partitioned tables. We wrote a general-purpose approach that leveraged the data type + _trunc functions:

https://github.com/fishtown-analytics/dbt/blob/2aa10fb1ed9f012cebb89cd6921b0c124a35aec8/plugins/bigquery/dbt/adapters/bigquery/impl.py#L65

Very cool! Unfortunately, it broke the "dynamic" insert_overwrite incremental strategy with integer- and timestamp-partitioned tables, and we didn't have nearly the automated testing we should have for insert_overwrite given how many community members are using it.

This PR:

  • Adds logic handling for int64-type partitions, to avoid writing a (make-believe) int64_trunc function
  • Removes the requirement that timestamp and datetime partition values must really be dates (now that BigQuery has removed that requirement, in the form of hour-grain partitioning)
  • Performs case-insensitive equality check for partition fields and types (granularities) to avoid hard-refreshing a table if we don't have to. (I can't think of a great way to test for this, as it would require inspecting the logs. Functionally it yields the same result, one is just preferable as an atomic operation.)
  • Rework test cases for insert_overwrite + each possible partition type, as well as one case of the static partitions-based approach. For some reason, the old tests (022_bigquery_test/test_scripting.py) just was not running CI! I renamed to test_incremental_strategies.py, which better reflects the set of possibilities.

After merging, I plan to cherry-pick the commit onto 0.19.latest.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Feb 12, 2021
@jtcohen6 jtcohen6 force-pushed the fix/bq-insert-overwrite-int-ts-partitions branch from 0f2a3bb to e01a10c Compare February 12, 2021 13:03
@jtcohen6 jtcohen6 marked this pull request as ready for review February 12, 2021 13:23
@jtcohen6 jtcohen6 requested review from gshank and kwigley February 12, 2021 13:24
@jtcohen6 jtcohen6 merged commit 36d1bdd into develop Feb 12, 2021
@jtcohen6 jtcohen6 deleted the fix/bq-insert-overwrite-int-ts-partitions branch February 12, 2021 14:57
return table_field == conf_partition.field \
and table_granularity == conf_partition.granularity
table_field = table.time_partitioning.field.lower()
table_granularity = table.partitioning_type.lower()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think table.partitioning_type could ever be None here? I'd think table.time_partitioning is not None also means table.partitioning_type has a value, just being nit-picky

Copy link
Contributor

@kwigley kwigley Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, you can ignore! I dug around docs a little and I think

table.time_partitioning is not None also means table.partitioning_type has a value

holds true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good catch though! if one of our intrepid dbt-bigquery beta testers manages to get an error like NoneType has no attribute lower, then we'll know who to blame (me) and where to fix it (here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants