Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some abstraction to avoid direct generated object use #20905

Closed
wants to merge 6 commits into from

Conversation

gosusnp
Copy link
Contributor

@gosusnp gosusnp commented Dec 28, 2022

What

With the introduction of Airbyte Protocol versioning, direct use of the generated objects comes with some overhead:

  • Moving to a new version means patching all the import statement of classes that are relying on protocol objects
  • Java Connectors are sharing some classes from the platform which led to some non-straight forward dependencies. We have a disconnect since there will be a transition period during which we will have different versions of the protocol for different connectors at the same time.

How

In order to reduce the coupling:

  • we introduce interfaces that describes the different concepts used in the protocol
  • we leverage adapters to map the most current version of the protocol to the interface

The key benefits:

  • This layer of abstraction gives us more flexibility to decouple our internal logic from the network representation of the objects
  • Bumping a version means update the adapter rather all the imports
  • This removes the explicit dependencies on the protocol version for the java connectors and allows us to shift the focus towards the information read rather than the container

The tradeoff is that we need to explicitly update the interfaces/adapters to channel new information.

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

No behavior change expected, this is targeting dev efficiency.

@octavia-squidington-iv octavia-squidington-iv added area/platform issues related to the platform area/server area/worker Related to worker labels Dec 28, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Dec 28, 2022

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (30)

Connector Version Changelog Publish
source-alloydb 1.0.34
source-alloydb-strict-encrypt 1.0.34 🔵
(ignored)
🔵
(ignored)
source-bigquery 0.2.3
source-clickhouse 0.1.14
source-clickhouse-strict-encrypt 0.1.14 🔵
(ignored)
🔵
(ignored)
source-cockroachdb 0.1.18
source-cockroachdb-strict-encrypt 0.1.18 🔵
(ignored)
🔵
(ignored)
source-db2 0.1.16
source-db2-strict-encrypt 0.1.16 🔵
(ignored)
🔵
(ignored)
source-dynamodb 0.1.0
source-e2e-test 2.1.3
source-e2e-test-cloud 2.1.1 🔵
(ignored)
🔵
(ignored)
source-elasticsearch 0.1.1
source-jdbc 0.3.5 🔵
(ignored)
🔵
(ignored)
source-kafka 0.2.3
source-mongodb-strict-encrypt 0.1.19 🔵
(ignored)
🔵
(ignored)
source-mongodb-v2 0.1.19
source-mssql 0.4.26
source-mssql-strict-encrypt 0.4.26 🔵
(ignored)
🔵
(ignored)
source-mysql 1.0.18
source-mysql-strict-encrypt 1.0.18 🔵
(ignored)
🔵
(ignored)
source-oracle 0.3.21
source-oracle-strict-encrypt 0.3.21 🔵
(ignored)
🔵
(ignored)
source-postgres 1.0.34
source-postgres-strict-encrypt 1.0.34 🔵
(ignored)
🔵
(ignored)
source-redshift 0.3.15
source-scaffold-java-jdbc 0.1.0 🔵
(ignored)
🔵
(ignored)
source-sftp 0.1.2
source-snowflake 0.1.27
source-tidb 0.2.1
  • See "Actionable Items" below for how to resolve warnings and errors.

❌ Destinations (47)

Connector Version Changelog Publish
destination-aws-datalake 0.1.1
destination-azure-blob-storage 0.1.6
destination-bigquery 1.2.9
destination-bigquery-denormalized 1.2.9
destination-cassandra 0.1.4
destination-clickhouse 0.2.1
destination-clickhouse-strict-encrypt 0.2.1 🔵
(ignored)
🔵
(ignored)
destination-csv 0.2.10
destination-databricks 0.3.1
destination-dev-null 0.2.7 🔵
(ignored)
🔵
(ignored)
destination-doris 0.1.0
destination-dynamodb 0.1.7
destination-e2e-test 0.2.4
destination-elasticsearch 0.1.6
destination-elasticsearch-strict-encrypt 0.1.6 🔵
(ignored)
🔵
(ignored)
destination-gcs 0.2.12
destination-iceberg 0.1.0
destination-jdbc 0.3.14 🔵
(ignored)
🔵
(ignored)
destination-kafka 0.1.10
destination-keen 0.2.4
destination-kinesis 0.1.5
destination-local-json 0.2.11
destination-mariadb-columnstore 0.1.7
destination-mongodb 0.1.9
destination-mongodb-strict-encrypt 0.1.9 🔵
(ignored)
🔵
(ignored)
destination-mqtt 0.1.3
destination-mssql 0.1.22
destination-mssql-strict-encrypt 0.1.22 🔵
(ignored)
🔵
(ignored)
destination-mysql 0.1.20
destination-mysql-strict-encrypt 0.1.21
(mismatch: 0.1.20)
🔵
(ignored)
🔵
(ignored)
destination-oracle 0.1.19
destination-oracle-strict-encrypt 0.1.19 🔵
(ignored)
🔵
(ignored)
destination-postgres 0.3.26
destination-postgres-strict-encrypt 0.3.26 🔵
(ignored)
🔵
(ignored)
destination-pubsub 0.2.0
destination-pulsar 0.1.3
destination-r2 0.1.0
destination-redis 0.1.4
destination-redpanda 0.1.0
destination-redshift 0.3.51
destination-rockset 0.1.4
destination-s3 0.3.18
destination-s3-glue 0.1.1
destination-scylla 0.1.3
destination-snowflake 0.4.40
destination-tidb 0.1.0
destination-yugabytedb 0.1.0
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Other Modules (0)

Actionable Items

(click to expand)

Category Status Actionable Item
Version
mismatch
The version of the connector is different from its normal variant. Please bump the version of the connector.

doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog
doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.

changelog missing
There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish
not in seed
The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.

diff seed version
The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.

@gosusnp gosusnp temporarily deployed to more-secrets December 28, 2022 20:40 — with GitHub Actions Inactive
@gosusnp gosusnp temporarily deployed to more-secrets December 28, 2022 20:40 — with GitHub Actions Inactive
@gosusnp gosusnp temporarily deployed to more-secrets December 28, 2022 23:19 — with GitHub Actions Inactive
@gosusnp gosusnp temporarily deployed to more-secrets December 28, 2022 23:19 — with GitHub Actions Inactive
@gosusnp gosusnp temporarily deployed to more-secrets December 29, 2022 00:01 — with GitHub Actions Inactive
@gosusnp gosusnp temporarily deployed to more-secrets December 29, 2022 00:02 — with GitHub Actions Inactive
@gosusnp
Copy link
Contributor Author

gosusnp commented Dec 29, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3797385910
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3797385910
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       140      5    96%   87, 93, 238, 242-243
	 source_acceptance_test/conftest.py                     208     92    56%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-280, 288-301, 306-312, 319-330, 337-353
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              402    115    71%   53, 58, 93-104, 109-116, 120-121, 125-126, 308, 346-363, 376-387, 391-396, 402, 435-440, 478-485, 528-530, 533, 598-606, 618-621, 626, 682-683, 689, 692, 728-738, 751-776
	 source_acceptance_test/tests/test_incremental.py       158     14    91%   52-59, 64-77, 240
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     107     13    88%   30-31, 38, 41, 65-68, 96, 120, 192-194
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1603    336    79%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestConnection.test_check: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestDiscovery.test_discover: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestBasicRead.test_read: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestFullRefresh.test_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
================= 13 passed, 6 skipped, 21 warnings in 17.02s ==================

@gosusnp
Copy link
Contributor Author

gosusnp commented Dec 29, 2022

/test connector=connectors/destination-snowflake

@gosusnp gosusnp temporarily deployed to more-secrets December 29, 2022 20:30 — with GitHub Actions Inactive
@gosusnp gosusnp temporarily deployed to more-secrets December 29, 2022 20:31 — with GitHub Actions Inactive
@gosusnp
Copy link
Contributor Author

gosusnp commented Dec 29, 2022

/test connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/3803161588
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/3803161588
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     166      8    95%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         189     48    75%
normalization/transform_catalog/utils.py                             51     14    73%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         61     38    38%
normalization/transform_catalog/stream_processor.py                 595    400    33%
-------------------------------------------------------------------------------------
TOTAL                                                              1441    629    56%

Build Passed

Test summary info:

All Passed

@gosusnp
Copy link
Contributor Author

gosusnp commented Dec 29, 2022

@edgao, @akashkulk, in our previous discussions, I mentioned we should look into using interfaces rather than generated objects directly. This PR is trying to illustrate what this could look like.
The goal would be to have an interface to abstract some details of the generated objects. I took the ConnectorSpecification case here because it has many touch points currently. Let me know what you guys think, it could be worth taking a stab at abstracting the Records to see how it plays with the data types changes.

Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took a super quick first pass on this - added a few questions before I take a deeper look

import lombok.EqualsAndHashCode;

@EqualsAndHashCode
public class ConnectorSpecificationAdapter implements ConnectorSpecification {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking my understanding: is the idea that we would have an adapter for every protocol version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, the interface gives us this flexibility, the idea is that the adapter should be the only place where we need bump the objects.

import java.net.URI;
import java.util.List;

@JsonDeserialize(using = ConnectionSpecificationDeserializer.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these be declared on the adapters? E.g. if the platform is on protocol v42, but an older connector still uses protocol v41, and it has a call to Jsons.serialize(new ConnectorSpecificationAdapterV41(...))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are needed for jackson to know how to serialize the objects of the model. In this case, it goes on the interface because I declared the interface in the ConnectorJobOutput.yaml rather than the implementation.

import lombok.EqualsAndHashCode;

@EqualsAndHashCode
public class ConnectorSpecificationAdapter implements ConnectorSpecification {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, how sketchy is it to have the generated protocol models directly implement the interfaces? I.e.

  • interface declares default Whatever getWhatever() { return null; }
  • and the generated classes look like public class AirbyteMessage implements io.airbyte.commons.protocol.objects.AirbyteMessage

then:

  • adding new fields is still easy (just add a new default method to the interface)
  • deleting fields is still easy (mark the interface method as deprecated, or just delete it entirely)
  • renaming fields is... probably fine? (default Whatever getWhatever() { return getOldWhatever(); })

(I forget if the issue was that there's a technical problem, or that the code generator just doesn't support this - if the latter, we could maaaaybe do something similar to https://github.com/LiveRamp/extravagance/ and edit the generated code)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One objective is to decouple the generated code from the how we use it.
For example, we may decide to rename field in the protocol, by decoupling this, we can control if/when we want to propagate this. Typical reason could be to limit the amount of changes at the same time.
When it comes to deprecation, newer version could remove some fields, those fields could remain in the interface, but we would have the option to throw exceptions depending on the adapter implementation.

@Override
public void serialize(ConnectorSpecification value, JsonGenerator gen, SerializerProvider provider) throws IOException {
gen.writeStartObject();
gen.writeStringField("object", value.toJson());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which components would receive these {object: {spec...}} messages? I.e. do connectors need to produce this format? Or are connectors still expected to produce protocol messages, and then the worker would translate them into this format? (or something else?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those should remain within the platform, those objects should be mostly used for temporal payloads which is generated by the clients we use and the activity. It currently leaks to the connector's tests because some tests rely on the workers.
Connectors having to generate those should be a red flag.

@davinchia
Copy link
Contributor

Jimmy, going to close this for now. Feel free to move this on to airbyte-platform if it's relevant!

@davinchia davinchia closed this May 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/platform issues related to the platform area/server area/worker Related to worker
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants