Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Step 2: Add Namespace to Airbyte Protocol #2228

Closed
wants to merge 5 commits into from

Conversation

ChristopheDuong
Copy link
Contributor

@ChristopheDuong ChristopheDuong commented Feb 26, 2021

What

Describe what the change is solving
Same as #1993 to implement #1921

How

Describe the solution

This introduces a new object AirbyteStreamName in both the API and the Airbyte Protocol to represent namespace and table name in a single struct.
The old name of AirbyteStream is kept as it was but the new fields are properly populated with separate namespace/table names.

So for example, we would have catalog produced by the JDBC sources:

  • Stream name: public.users (like before)
  • AirbyteStreamName.namespace: public (new)
  • AirbyteStreamName.name: users (new)

Non-jdbc sources are now producing:

  • Stream name: users (like before)
  • AirbyteStreamName.namespace: (empty)
  • AirbyteStreamName.name: users (same as stream name)

These new fields won't be used right away so they shouldn't introduce any change to the product until the last step is implemented.

Pre-merge Checklist

  • Run integration tests
  • Publish Docker images
  • Verify Acceptance tests (depends on new docker images)

Recommended reading order

  1. airbyte-api/src/main/openapi/config.yaml
  2. airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_protocol.yaml
  3. airbyte-server/src/main/java/io/airbyte/server/converters/CatalogConverter.java
  4. airbyte-integrations/connectors/source-jdbc/src/main/java/io/airbyte/integrations/source/jdbc/AbstractJdbcSource.java
  5. the rest

@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Feb 26, 2021

When we make changes to the Airbyte Protocol, old source connectors are incompatible with newer versions of the protocol:

If I run a connector with a version from before my change to the protocol where I added the stream_name field, the validation of the Catalog from the old connector code will fail:

Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "stream_name" (class io.airbyte.protocol.models.AirbyteStream), not marked as ignorable (5 known properties: "json_schema", "name", "supported_sync_modes", "default_cursor_field", "source_defined_cursor"]) 

logs-7-0.txt

How should I deal with this?

  1. Make jsons validations in connectors a little more permissive on unknown properties. Then force bump/upgrade all source connectors via a migration?
  2. Strip/back-migrate catalogs back to the proper protocol version for each connector

@ChristopheDuong ChristopheDuong changed the title Add Namespace to Airbyte Protocol Step 1/3: Add Namespace to Airbyte Protocol Feb 26, 2021
Copy link
Contributor

@michel-tricot michel-tricot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really cool!

@@ -34,9 +34,34 @@
*/
public class CatalogConverter {

private static io.airbyte.api.model.AirbyteStreamName toApi(final String name, final io.airbyte.protocol.models.AirbyteStreamName streamName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

streamname should be the first parameter since it is the one that will remain.

Can you add a Preconditions that checks that only one of the two is set? (assuming this is correct)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the moment both are set, sometimes the StreamName object could be null (if it was serialized before introducing the namespace field)

@@ -281,6 +283,7 @@ public void testDiscoverSourceSchema() throws ApiException {
.build());
final AirbyteStream stream = new AirbyteStream()
.name(STREAM_NAME)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we unset this?

Copy link
Contributor Author

@ChristopheDuong ChristopheDuong Mar 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is required as we are currently backward compatible and everything keeps working as it was before.

The new AirbyteStreamName field is not actually influencing the stream names for the moment

I created this issue to remove such methods usage here though: #2240

@ChristopheDuong ChristopheDuong marked this pull request as ready for review March 2, 2021 19:57
Copy link
Contributor

@michel-tricot michel-tricot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! can you also get a review from someone else in the team before you merge?

private static final String STREAM_NAME = "dbo.id_and_name";
private static final String STREAM_NAMESPACE = "dbo";
private static final String TABLE_NAME = "id_and_name";
private static final String STREAM_NAME = STREAM_NAMESPACE + "." + TABLE_NAME;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the stream name just be tablename?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous STREAM_NAME was "dbo.id_and_name"

We didn't introduce big changes on how things work so the stream name is still the same as before (but now it is also just "parsed" and split in two in the new AirbyteStreamName struct)

@@ -79,10 +123,15 @@
final List<io.airbyte.api.model.AirbyteStreamAndConfiguration> streams = catalog.getStreams()
.stream()
.map(configuredStream -> {
final String streamName;
if (configuredStream.getStream().getStreamName() != null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT You're missing {

@@ -79,10 +123,15 @@
final List<io.airbyte.api.model.AirbyteStreamAndConfiguration> streams = catalog.getStreams()
.stream()
.map(configuredStream -> {
final String streamName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this variable name is super confusing. I think you should just call it name

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like it should work okay (as long as all of the tests for the jdbc sources are passing).

I am not convinced that adding the AirbyteStreamName adds anything over just adding a namespace field. It seems like the extra object is kinda a headache. I think it would also be easier from a backwards compatibility point of view.

airbyte-api/src/main/openapi/config.yaml Outdated Show resolved Hide resolved
@@ -127,6 +127,9 @@ definitions:
properties:
name:
type: string
description: Stream's name (deprecated, will be replaced by stream_name).
stream_name:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be simpler if instead of adding this AirbyteStreamName object, you just added namespace as a property to this top level object? then we don't need to deprecate name? it also saves us having to ever have name and stream_name at the same time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i got to think it makes the backwards compatibility story way simpler too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my last PR, @michel-tricot was mentioning this: #1993 (comment)

Did I misunderstand his comment?

Copy link
Contributor

@cgardens cgardens Mar 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk. i don't think you misunderstood. up to you.

@@ -134,7 +134,9 @@ public AirbyteCatalog discover(JsonNode config) throws Exception {
Optional.ofNullable(config.get("database")).map(JsonNode::asText),
Optional.ofNullable(config.get("schema")).map(JsonNode::asText))
.stream()
.map(t -> CatalogHelpers.createAirbyteStream(t.getName(), t.getFields())
.map(t -> CatalogHelpers.createAirbyteStream(CatalogHelpers.createAirbyteStreamName(t.getSchemaName(), t.getName()), t.getFields())
// TODO: Switch fully to StreamName instead of temporarily setName() for backward compatibility
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add your github user name, or something else identifying when you add todos?

private final String name;
private final List<Field> fields;

public TableInfo(String name, List<Field> fields) {
public TableInfo(String schemaName, String name, List<Field> fields) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this all work okay with MySQL (which doesn't have schemas)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we already figured it out when we made the PR to discover tables from another schema than just the public one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk

@cgardens
Copy link
Contributor

cgardens commented Mar 4, 2021

what i'm describing won't work, because you're changing behavior in the sources before the destinations are compatible with namespaces so you're a little boxed. if you were just doing destinations first, i think you could the in place replacement (i.e. just add namespace). in that process each destination would be made to support namespaces first when they were present. each one could be migrated. once that was done, we could start migrating sources, and as they had non null values for namespaces the destinations would respect them). i don't think i love having this extra object, but it would have to be a pretty execution plan to get around it.

probably fine to keep it as is, but wanted to at least offer this different perspective.

Co-authored-by: Charles <giardina.charles@gmail.com>
@ChristopheDuong ChristopheDuong changed the title Step 1/3: Add Namespace to Airbyte Protocol Step 2: Add Namespace to Airbyte Protocol Mar 5, 2021
@swyxio swyxio deleted the chris/namespace-protocol branch October 11, 2022 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants