Minor fixes to incremental normalization and nesting #7669
{{ config(
    indexes = [{'columns':['_airbyte_emitted_at'],'type':'hash'}],
    unique_key = '_airbyte_ab_id',
    schema = "test_normalization",
    tags = [ "nested" ]
) }}
Un-nesting an object-type column from the parent stream can reuse the same _airbyte_ab_id column as unique_key.
{{ config(
    indexes = [{'columns':['_airbyte_emitted_at'],'type':'hash'}],
    schema = "test_normalization",
    tags = [ "nested" ]
) }}
Un-nesting an array-type column from the parent stream can NOT reuse the same _airbyte_ab_id column as unique_key: each element of the nested array would have the same unique_key, so it would no longer be unique (this would fail with exceptions on certain destinations).
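To make the point about array columns concrete, here is a minimal Python sketch (not the normalization code itself) of un-nesting an array while copying the parent's _airbyte_ab_id. The record layout, the `children` column name, and the `unnest_array` helper are hypothetical and only serve the illustration.

```python
from collections import Counter

# Hypothetical parent rows: each has its own _airbyte_ab_id and a nested
# array column named "children" (an illustrative name, not from the PR).
parent_rows = [
    {"_airbyte_ab_id": "abc-1", "children": [{"v": 1}, {"v": 2}, {"v": 3}]},
    {"_airbyte_ab_id": "abc-2", "children": [{"v": 4}]},
]


def unnest_array(rows, column):
    """Emit one child row per array element, copying the parent's _airbyte_ab_id."""
    out = []
    for row in rows:
        for element in row[column]:
            out.append({"_airbyte_ab_id": row["_airbyte_ab_id"], **element})
    return out


child_rows = unnest_array(parent_rows, "children")

# Count how often each copied id appears in the child table.
id_counts = Counter(r["_airbyte_ab_id"] for r in child_rows)
duplicated_ids = sorted(k for k, n in id_counts.items() if n > 1)
print(duplicated_ids)
```

Any parent row whose array holds more than one element produces child rows that share an id, which is why the copied column cannot serve as unique_key for the un-nested array table.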
partition by "id", currency, cast(nzd as
    varchar
)
order by
    "date" is null asc,
    "date" desc,
    _airbyte_emitted_at desc
) is null then 1 else 0 end as _airbyte_active_row,
When multiple rows for the same primary key all have a cursor (date here) equal to NULL, this flags more than one row as active.
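The edge case can be reproduced with a small Python simulation. This is only a sketch: it assumes the truncated window expression above is a `lag` of the cursor column (the fragment does not show the function), and the sample rows, `pk`/`cursor`/`emitted_at` names, and both helper functions are hypothetical. The row-number variant is shown as one possible alternative, not necessarily the fix applied in this PR.

```python
from datetime import date
from itertools import groupby

# Hypothetical rows; the cursor may be NULL (None).
rows = [
    {"pk": 1, "cursor": date(2021, 1, 1), "emitted_at": 10},
    {"pk": 1, "cursor": date(2021, 1, 2), "emitted_at": 20},
    {"pk": 2, "cursor": None, "emitted_at": 10},
    {"pk": 2, "cursor": None, "emitted_at": 20},
    {"pk": 2, "cursor": None, "emitted_at": 30},
]


def window_order(row):
    # Mirrors: order by cursor is null asc, cursor desc, emitted_at desc
    cursor = row["cursor"]
    cursor_rank = -cursor.toordinal() if cursor is not None else 0
    return (cursor is None, cursor_rank, -row["emitted_at"])


def partitions(rows):
    """Yield the rows of each primary key in window order."""
    ordered = sorted(rows, key=lambda r: (r["pk"],) + window_order(r))
    for _, partition in groupby(ordered, key=lambda r: r["pk"]):
        yield list(partition)


def flag_with_lag(rows):
    """Assumed original logic: active when lag(cursor) is NULL."""
    flagged = []
    for partition in partitions(rows):
        previous_cursor = None  # lag() yields NULL on the first row
        for row in partition:
            flagged.append({**row, "active": 1 if previous_cursor is None else 0})
            previous_cursor = row["cursor"]
    return flagged


def flag_with_row_number(rows):
    """Possible alternative: active only for the first row in window order."""
    flagged = []
    for partition in partitions(rows):
        for position, row in enumerate(partition, start=1):
            flagged.append({**row, "active": 1 if position == 1 else 0})
    return flagged


def active_count(flagged, pk):
    return sum(r["active"] for r in flagged if r["pk"] == pk)


print(active_count(flag_with_lag(rows), 2))
print(active_count(flag_with_row_number(rows), 2))
```

With these sample rows, the lag-based flag marks every NULL-cursor row of primary key 2 as active, because the previous cursor value is always NULL, whereas the position-based flag marks exactly one.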
# Nested Streams don't inherit parents sync modes?
source_sync_mode=SyncMode.full_refresh,
destination_sync_mode=DestinationSyncMode.append,
Sync modes should be propagated to children tables, otherwise incremental is not going to work properly on nested streams
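As a rough illustration of that propagation, the sketch below has a child stream copy its parent's sync modes instead of falling back to hard-coded defaults. The enums are local stand-ins that only mirror the names seen in the diff, and `StreamNode` and `add_child` are hypothetical; none of this is the actual normalization code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SyncMode(Enum):  # local stand-in for illustration
    full_refresh = "full_refresh"
    incremental = "incremental"


class DestinationSyncMode(Enum):  # local stand-in for illustration
    append = "append"
    overwrite = "overwrite"
    append_dedup = "append_dedup"


@dataclass
class StreamNode:
    """Hypothetical stream processor node with nested children."""

    name: str
    source_sync_mode: SyncMode
    destination_sync_mode: DestinationSyncMode
    children: List["StreamNode"] = field(default_factory=list)

    def add_child(self, name: str) -> "StreamNode":
        # Propagate the parent's sync modes rather than defaulting to
        # full_refresh / append for every nested stream.
        child = StreamNode(name, self.source_sync_mode, self.destination_sync_mode)
        self.children.append(child)
        return child


parent = StreamNode("users", SyncMode.incremental, DestinationSyncMode.append_dedup)
child = parent.add_child("users_addresses")
print(child.source_sync_mode.value, child.destination_sync_mode.value)
```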
    # because of https://github.com/dbt-labs/docs.getdbt.com/issues/335, we have to use tables for postgres
    forced_materialization_type = TableMaterializationType.TABLE
else:
    forced_materialization_type = TableMaterializationType.VIEW
VIEW materialization on Postgres and INCREMENTAL don't work well together.
I created this issue so that the impact of such changes can be measured in the future: #7741
/test connector=bases/base-normalization
Wow, this is more than a small fix. It fixes so many issues.
It does not seem that all the fixes are unit-tested, though (e.g. the case of nested tables with array fields, or the rare edge case of deduplicated tables in append_dedup sync mode with a NULL cursor field). Can you add tests for them?
/test connector=connectors/destination-snowflake
/test connector=connectors/destination-redshift
/test connector=connectors/destination-snowflake
/test connector=connectors/destination-bigquery
/test connector=connectors/destination-postgres
Yes, it's multiple small or minor fixes. Most are not unit-testable, though, because they depend on data use cases that only show up when running with dbt, and unit tests can only cover the model generation part. They are covered by the integration tests; some of the fixes actually surfaced thanks to those tests.
/test connector=connectors/destination-redshift
/publish connector=bases/base-normalization
Nice. Thanks.
What / How
This PR fixes minor behaviors with incremental:
_airbyte_emitted_at
(BigQuery) see https://airbytehq.slack.com/archives/C01MFR03D5W/p1636035876106000
Recommended reading order
x.java
y.python