-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for re-used check cols in snapshots #1614
Merged
Merged
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
5e6e746
possible fix for re-used check cols on BQ
drewbanin 4df0bbd
touchup var name and sql formatting
drewbanin a2e801c
pep8
drewbanin 35d1a7a
add tests
drewbanin 329145c
Merge branch 'dev/0.14.1' into fix/snapshot-check-cols-cycle
drewbanin b6e7351
snapshot surrogate key whitespace control
drewbanin File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -202,15 +202,31 @@ def raw_execute(self, sql, fetch=False): | |
|
||
def execute(self, sql, auto_begin=False, fetch=None): | ||
# auto_begin is ignored on bigquery, and only included for consistency | ||
_, iterator = self.raw_execute(sql, fetch=fetch) | ||
query_job, iterator = self.raw_execute(sql, fetch=fetch) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice! |
||
|
||
if fetch: | ||
res = self.get_table_from_response(iterator) | ||
else: | ||
res = dbt.clients.agate_helper.empty_table() | ||
|
||
# If we get here, the query succeeded | ||
status = 'OK' | ||
if query_job.statement_type == 'CREATE_VIEW': | ||
status = 'CREATE VIEW' | ||
|
||
elif query_job.statement_type == 'CREATE_TABLE_AS_SELECT': | ||
conn = self.get_thread_connection() | ||
client = conn.handle | ||
table = client.get_table(query_job.destination) | ||
status = 'CREATE TABLE ({})'.format(table.num_rows) | ||
|
||
elif query_job.statement_type in ['INSERT', 'DELETE', 'MERGE']: | ||
status = '{} ({})'.format( | ||
query_job.statement_type, | ||
query_job.num_dml_affected_rows | ||
) | ||
|
||
else: | ||
status = 'OK' | ||
|
||
return status, res | ||
|
||
def create_bigquery_table(self, database, schema, table_name, callback, | ||
|
9 changes: 6 additions & 3 deletions
9
plugins/bigquery/dbt/include/bigquery/macros/materializations/snapshot.sql
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
51 changes: 51 additions & 0 deletions
51
...ration/004_simple_snapshot_test/check-snapshots-expected/check_snapshots_test_current.sql
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
|
||
|
||
with query as ( | ||
|
||
-- check that the current value for id=1 is red | ||
select case when ( | ||
select count(*) | ||
from {{ ref('check_cols_cycle') }} | ||
where id = 1 and color = 'red' and dbt_valid_to is null | ||
) = 1 then 0 else 1 end as failures | ||
|
||
union all | ||
|
||
-- check that the previous 'red' value for id=1 is invalidated | ||
select case when ( | ||
select count(*) | ||
from {{ ref('check_cols_cycle') }} | ||
where id = 1 and color = 'red' and dbt_valid_to is not null | ||
) = 1 then 0 else 1 end as failures | ||
|
||
union all | ||
|
||
-- check that there's only one current record for id=2 | ||
select case when ( | ||
select count(*) | ||
from {{ ref('check_cols_cycle') }} | ||
where id = 2 and color = 'pink' and dbt_valid_to is null | ||
) = 1 then 0 else 1 end as failures | ||
|
||
union all | ||
|
||
-- check that the previous value for id=2 is represented | ||
select case when ( | ||
select count(*) | ||
from {{ ref('check_cols_cycle') }} | ||
where id = 2 and color = 'green' and dbt_valid_to is not null | ||
) = 1 then 0 else 1 end as failures | ||
|
||
union all | ||
|
||
-- check that there are 5 records total in the table | ||
select case when ( | ||
select count(*) | ||
from {{ ref('check_cols_cycle') }} | ||
) = 5 then 0 else 1 end as failures | ||
|
||
) | ||
|
||
select * | ||
from query | ||
where failures = 1 |
33 changes: 33 additions & 0 deletions
33
test/integration/004_simple_snapshot_test/check-snapshots/check_cols_cycle.sql
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
|
||
{% snapshot check_cols_cycle %} | ||
|
||
{{ | ||
config( | ||
target_database=database, | ||
target_schema=schema, | ||
unique_key='id', | ||
strategy='check', | ||
check_cols=['color'] | ||
) | ||
}} | ||
|
||
{% if var('version') == 1 %} | ||
|
||
select 1 as id, 'red' as color union all | ||
select 2 as id, 'green' as color | ||
|
||
{% elif var('version') == 2 %} | ||
|
||
select 1 as id, 'blue' as color union all | ||
select 2 as id, 'green' as color | ||
|
||
{% elif var('version') == 3 %} | ||
|
||
select 1 as id, 'red' as color union all | ||
select 2 as id, 'pink' as color | ||
|
||
{% else %} | ||
{% do exceptions.raise_compiler_error("Got bad version: " ~ var('version')) %} | ||
{% endif %} | ||
|
||
{% endsnapshot %} |
55 changes: 55 additions & 0 deletions
55
test/integration/004_simple_snapshot_test/test_snapshot_check_cols.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
from test.integration.base import DBTIntegrationTest, use_profile | ||
import dbt.exceptions | ||
|
||
|
||
class TestSimpleSnapshotFiles(DBTIntegrationTest): | ||
NUM_SNAPSHOT_MODELS = 1 | ||
|
||
@property | ||
def schema(self): | ||
return "simple_snapshot_004" | ||
|
||
@property | ||
def models(self): | ||
return "models" | ||
|
||
@property | ||
def project_config(self): | ||
return { | ||
"snapshot-paths": ['check-snapshots'], | ||
"test-paths": ['check-snapshots-expected'], | ||
"source-paths": [], | ||
} | ||
|
||
def test_snapshot_check_cols_cycle(self): | ||
results = self.run_dbt(["snapshot", '--vars', 'version: 1']) | ||
self.assertEqual(len(results), 1) | ||
|
||
results = self.run_dbt(["snapshot", '--vars', 'version: 2']) | ||
self.assertEqual(len(results), 1) | ||
|
||
results = self.run_dbt(["snapshot", '--vars', 'version: 3']) | ||
self.assertEqual(len(results), 1) | ||
|
||
def assert_expected(self): | ||
self.run_dbt(['test', '--data', '--vars', 'version: 3']) | ||
|
||
@use_profile('snowflake') | ||
def test__snowflake__simple_snapshot(self): | ||
self.test_snapshot_check_cols_cycle() | ||
self.assert_expected() | ||
|
||
@use_profile('postgres') | ||
def test__postgres__simple_snapshot(self): | ||
self.test_snapshot_check_cols_cycle() | ||
self.assert_expected() | ||
|
||
@use_profile('bigquery') | ||
def test__bigquery__simple_snapshot(self): | ||
self.test_snapshot_check_cols_cycle() | ||
self.assert_expected() | ||
|
||
@use_profile('redshift') | ||
def test__redshift__simple_snapshot(self): | ||
self.test_snapshot_check_cols_cycle() | ||
self.assert_expected() |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there are two rows with the same primary key in one snapshot operation? Obviously user error, but do we do anything like error about it, or just give bad results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that will result in "undefined behavior" -- the exact outcome depends on:
check_cols
values are duplicated as well, or if they differ across the duplicated primary keyWhichever way you slice it, dbt is going to do something undesirable if the specified primary key is duplicated in the snapshot operation though. On pg/redshift, you'll end up with dupes in your snapshot table. On snowflake/bq, you should see a
nondeterministic merge
error.We could:
row_number()
(or similar) to dedupe records when building the staging tableunique_keys
are uniqueschema.yml
spec for snapshots?