Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable low-code CDK users to specify schema in the manifest #20375

Merged
merged 8 commits into from
Dec 13, 2022

Conversation

clnoll
Copy link
Contributor

@clnoll clnoll commented Dec 12, 2022

What

Allows users of the low-code CDK to define stream schemas inline in the manifest.

How

Adds a new StaticSchemaLoader to the list of schema loader options. When a user specifies that the StaticSchemaLoader should be used with a stream, they must also supply the schema property.

  applications_stream:
    $ref: "*ref(definitions.base_incremental_stream)"
    $options:
      name: "applications"
      path: "applications"
    stream_cursor_field: "applied_at"
    schema_loader:
      type: "StaticSchemaLoader"
      schema: {...}

Recommended reading order

  1. airbyte-integrations/connectors/source-greenhouse/source_greenhouse/greenhouse.yaml
  2. airbyte-cdk/python/airbyte_cdk/sources/declarative/schema/static_schema_loader.py
  3. airbyte-cdk/python/airbyte_cdk/sources/declarative/config_component_schema.json
    airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/class_types_registry.py
    airbyte-cdk/python/airbyte_cdk/sources/declarative/schema/__init__.py
    airbyte-cdk/python/unit_tests/sources/declarative/schema/test_static_schema_loader.py

🚨 User Impact 🚨

No breaking changes; the output of discovery is exactly the same with inline schema versus on-disk schema.

Pre-merge Checklist

  • Unit & integration tests added and passing
  • Code reviews completed
  • Documentation updated
  • PR name follows PR naming conventions
  • Issue acceptance criteria met

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@clnoll clnoll requested a review from a team as a code owner December 12, 2022 17:17
@octavia-squidington-iv octavia-squidington-iv added the CDK Connector Development Kit label Dec 12, 2022
@clnoll clnoll marked this pull request as draft December 12, 2022 17:18
@clnoll clnoll force-pushed the write-stream-schemas-to-manifest branch from 8d86dc7 to d4bb23c Compare December 12, 2022 21:21
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! Anything in particular you'd like feedback on?

@clnoll
Copy link
Contributor Author

clnoll commented Dec 13, 2022

Thanks @sherifnada - is there documentation that I should update as part of this PR?

@clnoll clnoll force-pushed the write-stream-schemas-to-manifest branch from f3d0a00 to 71782a8 Compare December 13, 2022 00:16
@octavia-squidington-iv octavia-squidington-iv removed the area/connectors Connector related issues label Dec 13, 2022
@girarda
Copy link
Contributor

girarda commented Dec 13, 2022

@clnoll updating the documentation is a good idea!

We should update the doc that describes how to define the schema and the tutorial

Copy link
Contributor

@girarda girarda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Approved pending bumping the CDK version in setup.py an updating the changelog

@clnoll clnoll changed the title WIP: Enable low-code CDK users to specify schema in the manifest Enable low-code CDK users to specify schema in the manifest Dec 13, 2022
Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought to discuss about the name of the component, but the code itself looks good!

I think another interesting discussion might be how we want to organize the schemas now that we can define them in the manifest itself. I saw in an earlier draft that it was in definitions which technically works because we can reference any part of the YAML. But it starts to make the manifest pretty unreadable because schemas will eventually take up a bulk of the file and developers will have to either do searches or just scroll around a ton. I think we might want to add in another top level field in the manifest for schemas at the bottom of manifest and we can encourage the convention to only add them there. And then all the various parts don't get split up by a massive blob of schema yaml.

It goes a little bit outside the scope of the work you're doing, but just something I thought of while reviewing this.



@dataclass
class StaticSchemaLoader(SchemaLoader):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I'm thinking about is whether StaticSchemaLoader is the right name for this component. Static feels like too vague a term because technically all our connector's that reference some underlying example_schema.json are also static, just not living within the manifest file itself.

InlineSchemaLoader(SchemaLoader) might be a good alternative since what we want to highlight from the component is that its defined within the manifest, not that its static.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestions, @brianjlai - I like InlineSchemaLoader and will update the name in the relevant places.

I'll also update the example inline schemas in #20405 to use a top-level schemas key. I believe that will require a small change to this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed the suggested changes to this PR and #20405.

@clnoll clnoll force-pushed the write-stream-schemas-to-manifest branch from 5bbfe37 to 625d612 Compare December 13, 2022 01:05
@clnoll clnoll marked this pull request as ready for review December 13, 2022 01:37
@octavia-squidington-iv octavia-squidington-iv added the area/documentation Improvements or additions to documentation label Dec 13, 2022
@clnoll
Copy link
Contributor Author

clnoll commented Dec 13, 2022

/publish-cdk dry-run=true

https://github.com/airbytehq/airbyte/actions/runs/3682185176

@clnoll
Copy link
Contributor Author

clnoll commented Dec 13, 2022

/publish-cdk dry-run=true

🕑 https://github.com/airbytehq/airbyte/actions/runs/3682260258
https://github.com/airbytehq/airbyte/actions/runs/3682260258

@clnoll
Copy link
Contributor Author

clnoll commented Dec 13, 2022

/publish-cdk dry-run=false

🕑 https://github.com/airbytehq/airbyte/actions/runs/3682293797
https://github.com/airbytehq/airbyte/actions/runs/3682293797

@clnoll clnoll merged commit 9dae098 into master Dec 13, 2022
@clnoll clnoll deleted the write-stream-schemas-to-manifest branch December 13, 2022 04:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/documentation Improvements or additions to documentation CDK Connector Development Kit connectors/source/gnews
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants