Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor fixes to incremental normalization and nesting #7669

Merged
merged 10 commits into from
Nov 8, 2021

Conversation

ChristopheDuong
Copy link
Contributor

@ChristopheDuong ChristopheDuong commented Nov 5, 2021

What / How

This PR fixes minor behaviors with incremental:

  • choice on when Incremental dbt should be used should depend on destination sync mode, not source sync mode
    • this change enables incremental dbt for FULL_REFRESH-APPEND sync modes too
  • safety cast to TIMESTAMP if the schema of raw & normalized tables were changed on _airbyte_emitted_at (BigQuery) see https://airbytehq.slack.com/archives/C01MFR03D5W/p1636035876106000
  • potential speed improvement on incremental with Postgres for SCD tables (avoid 'view' materialization because it doesn't combine well with dbt incremental on postgres)
  • fix some incremental issue with nested tables with array fields (the _airbyte_ab_id column is not unique in that case)
  • deduplicated tables in append_dedup sync mode fail to retain only one row per primary key if the cursor field is always NULL (rare edge case)
  • unique_key on postgres is now unique for de-duplicated tables, closes Normalization: set primary key field as primary key in final table #6161
  • moving/re-enabling some tests

Recommended reading order

  1. x.java
  2. y.python

@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets November 5, 2021 11:51 Inactive
Comment on lines +1 to +6
{{ config(
indexes = [{'columns':['_airbyte_emitted_at'],'type':'hash'}],
unique_key = '_airbyte_ab_id',
schema = "test_normalization",
tags = [ "nested" ]
) }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Un-nesting an object type column from the parent stream can re-use the same _airbyte_ab_id column as unique_key

Comment on lines +1 to +5
{{ config(
indexes = [{'columns':['_airbyte_emitted_at'],'type':'hash'}],
schema = "test_normalization",
tags = [ "nested" ]
) }}
Copy link
Contributor Author

@ChristopheDuong ChristopheDuong Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Un-nesting an array type column from the parent stream can NOT re-use the same _airbyte_ab_id column as unique_key... each element of the nested array would have the same unique_key and it wouldn't be unique anymore...

(this would fail with exceptions on certain destinations)

partition by "id", currency, cast(nzd as
varchar
)
order by
"date" is null asc,
"date" desc,
_airbyte_emitted_at desc
) is null then 1 else 0 end as _airbyte_active_row,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when all (multiple) rows for the same primary key have a cursor (date here) equal to NULL, then this will flag multiple rows as active...

Comment on lines -147 to -149
# Nested Streams don't inherit parents sync modes?
source_sync_mode=SyncMode.full_refresh,
destination_sync_mode=DestinationSyncMode.append,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sync modes should be propagated to children tables, otherwise incremental is not going to work properly on nested streams

# because of https://github.com/dbt-labs/docs.getdbt.com/issues/335, we have to use tables for postgres
forced_materialization_type = TableMaterializationType.TABLE
else:
forced_materialization_type = TableMaterializationType.VIEW
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VIEW materialization on Postgres and INCREMENTAL don't do well together

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created this issue to gain a measurable impact of such changes in the future: #7741

@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets November 5, 2021 12:15 Inactive
@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Nov 5, 2021

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1425639035
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1425639035
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        75      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              200     94    53%
	 source_acceptance_test/tests/test_full_refresh.py       38     27    29%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  41     24    41%
	 source_acceptance_test/utils/compare.py                 62     25    60%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  896    440    51%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    12      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     77    46%
	 normalization/transform_catalog/destination_name_transformer.py     120      6    95%
	 normalization/transform_catalog/reserved_keywords.py                 11      0   100%
	 normalization/transform_catalog/stream_processor.py                 459    278    39%
	 normalization/transform_catalog/table_name_registry.py              174     34    80%
	 normalization/transform_catalog/transform.py                         45     26    42%
	 normalization/transform_catalog/utils.py                             33      7    79%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         140     29    79%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1151    463    60%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                       Stmts   Miss  Cover
	 --------------------------------------------------------------
	 base_python/__init__.py                       13      0   100%
	 base_python/catalog_helpers.py                10      6    40%
	 base_python/cdk/__init__.py                    0      0   100%
	 base_python/cdk/abstract_source.py            83     59    29%
	 base_python/cdk/streams/__init__.py            0      0   100%
	 base_python/cdk/streams/auth/__init__.py       0      0   100%
	 base_python/cdk/streams/auth/core.py           8      1    88%
	 base_python/cdk/streams/auth/jwt.py            5      5     0%
	 base_python/cdk/streams/auth/oauth.py         37     26    30%
	 base_python/cdk/streams/auth/token.py          9      4    56%
	 base_python/cdk/streams/core.py               63     32    49%
	 base_python/cdk/streams/exceptions.py         10      2    80%
	 base_python/cdk/streams/http.py               67     33    51%
	 base_python/cdk/streams/rate_limiting.py      30     14    53%
	 base_python/cdk/utils/__init__.py              0      0   100%
	 base_python/cdk/utils/casing.py                4      0   100%
	 base_python/client.py                         56     33    41%
	 base_python/entrypoint.py                     70     56    20%
	 base_python/integration.py                    52     25    52%
	 base_python/logger.py                         33     19    42%
	 base_python/schema_helpers.py                 56     41    27%
	 base_python/source.py                         51     34    33%
	 main_dev.py                                    3      3     0%
	 --------------------------------------------------------------
	 TOTAL                                        660    393    40%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    12      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     77    46%
	 normalization/transform_catalog/destination_name_transformer.py     120      6    95%
	 normalization/transform_catalog/reserved_keywords.py                 11      0   100%
	 normalization/transform_catalog/stream_processor.py                 459    278    39%
	 normalization/transform_catalog/table_name_registry.py              174     34    80%
	 normalization/transform_catalog/transform.py                         45     26    42%
	 normalization/transform_catalog/utils.py                             33      7    79%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         140     29    79%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1151    463    60%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    12      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     12    92%
	 normalization/transform_catalog/destination_name_transformer.py     120      4    97%
	 normalization/transform_catalog/reserved_keywords.py                 11      0   100%
	 normalization/transform_catalog/stream_processor.py                 459     36    92%
	 normalization/transform_catalog/table_name_registry.py              174     51    71%
	 normalization/transform_catalog/transform.py                         45     30    33%
	 normalization/transform_catalog/utils.py                             33      0   100%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         140     45    68%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1151    184    84%

@jrhizor jrhizor temporarily deployed to more-secrets November 5, 2021 12:20 Inactive
@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets November 5, 2021 12:27 Inactive
@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets November 5, 2021 17:50 Inactive
Copy link
Contributor

@tuliren tuliren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is more than a small fix. It fixes so many issue.

It does not seem that all the fixes are unit-tested though (e.g. the case of nested tables with array fields, the rare edge case of deduplicated tables in append_dedup sync mode with NULL cursor field). Can you add test for them?

@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Nov 8, 2021

/test connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/1433908796
❌ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/1433908796
🐛 https://gradle.com/s/yizelktati4o6

@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Nov 8, 2021

/test connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/1433909489
❌ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/1433909489
🐛 https://gradle.com/s/5vmn72vc25wu4

@jrhizor jrhizor temporarily deployed to more-secrets November 8, 2021 08:41 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets November 8, 2021 08:42 Inactive
@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Nov 8, 2021

/test connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/1434482706
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/1434482706
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    12      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     77    46%
	 normalization/transform_catalog/destination_name_transformer.py     120      6    95%
	 normalization/transform_catalog/reserved_keywords.py                 11      0   100%
	 normalization/transform_catalog/stream_processor.py                 463    282    39%
	 normalization/transform_catalog/table_name_registry.py              174     34    80%
	 normalization/transform_catalog/transform.py                         45     26    42%
	 normalization/transform_catalog/utils.py                             33      7    79%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         140     29    79%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1155    467    60%

@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Nov 8, 2021

/test connector=connectors/destination-bigquery

🕑 connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/1434483790
✅ connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/1434483790
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    12      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     77    46%
	 normalization/transform_catalog/destination_name_transformer.py     120      6    95%
	 normalization/transform_catalog/reserved_keywords.py                 11      0   100%
	 normalization/transform_catalog/stream_processor.py                 463    282    39%
	 normalization/transform_catalog/table_name_registry.py              174     34    80%
	 normalization/transform_catalog/transform.py                         45     26    42%
	 normalization/transform_catalog/utils.py                             33      7    79%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         140     29    79%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1155    467    60%

@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Nov 8, 2021

/test connector=connectors/destination-postgres

🕑 connectors/destination-postgres https://github.com/airbytehq/airbyte/actions/runs/1434484150
✅ connectors/destination-postgres https://github.com/airbytehq/airbyte/actions/runs/1434484150
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    12      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     77    46%
	 normalization/transform_catalog/destination_name_transformer.py     120      6    95%
	 normalization/transform_catalog/reserved_keywords.py                 11      0   100%
	 normalization/transform_catalog/stream_processor.py                 463    282    39%
	 normalization/transform_catalog/table_name_registry.py              174     34    80%
	 normalization/transform_catalog/transform.py                         45     26    42%
	 normalization/transform_catalog/utils.py                             33      7    79%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         140     29    79%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1155    467    60%

@jrhizor jrhizor temporarily deployed to more-secrets November 8, 2021 11:19 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets November 8, 2021 11:19 Inactive
@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Nov 8, 2021

Wow, this is more than a small fix. It fixes so many issue.

It does not seem that all the fixes are unit-tested though (e.g. the case of nested tables with array fields, the rare edge case of deduplicated tables in append_dedup sync mode with NULL cursor field). Can you add test for them?

yes, It's multiple small or minor fixes...

Most are not unit-testable though (because they rely on data use cases when run with dbt) and unit tests only can test the model generation part.

But they are in the integration tests, some of the fixes surfaced thanks to the tests actually.
I just added the scenario of append_dedup sync mode with the NULL cursor field.

@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Nov 8, 2021

/test connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/1434620247
✅ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/1434620247
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    12      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     77    46%
	 normalization/transform_catalog/destination_name_transformer.py     120      6    95%
	 normalization/transform_catalog/reserved_keywords.py                 11      0   100%
	 normalization/transform_catalog/stream_processor.py                 463    282    39%
	 normalization/transform_catalog/table_name_registry.py              174     34    80%
	 normalization/transform_catalog/transform.py                         45     26    42%
	 normalization/transform_catalog/utils.py                             33      7    79%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         140     29    79%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1155    467    60%

@jrhizor jrhizor temporarily deployed to more-secrets November 8, 2021 12:01 Inactive
@github-actions github-actions bot added area/documentation Improvements or additions to documentation area/platform issues related to the platform area/worker Related to worker labels Nov 8, 2021
@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets November 8, 2021 13:12 Inactive
@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Nov 8, 2021

/publish connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1434905738
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1434905738

@jrhizor jrhizor temporarily deployed to more-secrets November 8, 2021 13:15 Inactive
@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets November 8, 2021 14:44 Inactive
@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets November 8, 2021 15:31 Inactive
@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets November 8, 2021 16:13 Inactive
@tuliren
Copy link
Contributor

tuliren commented Nov 8, 2021

But they are in the integration tests, some of the fixes surfaced thanks to the tests actually. I just added the scenario of append_dedup sync mode with the NULL cursor field.

Nice. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Normalization: set primary key field as primary key in final table
4 participants