Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDK: Emit control message on config mutation #19428

Merged
merged 30 commits into from
Nov 29, 2022

Conversation

alafanechere
Copy link
Contributor

@alafanechere alafanechere commented Nov 15, 2022

What

Closes #17910
Quoting @davinchia in #3990:

Today we do not handle the case where a new refresh token is returned when a new access token is granted.

A new protocol message: AirbyteControlMessage.ConnectorConfig was introduced to enable the connector to share a configuration update with the platform.

This PR is an attempt to update the CDK to enable automatic emission of AirbyteControlMessage.ConnectorConfig on mutation of a config dict keys, and nested keys.

How

  1. Add a config kwarg to OAuth2Authenticator.__init__. Observe this config by creating an ObservedDict assigned to self.connector_config
  2. ObservedDict get created with a ConfigObserver object. ObservedDict call ConfigObserver.update on __set_item__ calls (this is the function called on key assignment: e.g config["refresh_token"] = "new_refresh_token".
  3. ConfigObserver.update emits an AirbyteControlMessage.ConnectorConfig with the updated config.

N.B.

I chose the observer pattern because I thought this could be a silver-bullet-like approach to enable additional future use cases around AirbyteControlMessage.ConnectorConfig message.

The platform is not currently processing these new messages, so emitting will AirbyteControlMessage.ConnectorConfig will currently have no effect/

Recommended reading order

🚨 User Impact 🚨

To benefit from config observation, the connector developer can implement a custom authenticator inheriting from OAuth2Authenticator and create any token-refresh logic in this authenticator. The functions implementing this logic will have to mutate self.connector_config to emit a control message.

@octavia-squidington-iv octavia-squidington-iv added the CDK Connector Development Kit label Nov 15, 2022
@alafanechere alafanechere marked this pull request as ready for review November 15, 2022 15:04
@alafanechere alafanechere requested a review from a team as a code owner November 15, 2022 15:04
@alafanechere alafanechere changed the title CDK: Emit control message when config is mutated CDK: Emit control message on config mutation Nov 15, 2022
@alafanechere alafanechere requested review from evantahler and a team November 15, 2022 15:05
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alafanechere could you show an example of how would this be used in the context of an HTTPStream and a single-use oauth authenticator?

@alafanechere
Copy link
Contributor Author

@sherifnada I achieved a working example for BingAds in 733b89a (will revert afterward)

A READ outputs the following to stdout (I manually removed sensitive fields):

$ python main.py read --catalog integration_tests/configured_catalog.json --config secrets/config.json
{"type": "LOG", "log": {"level": "INFO", "message": "Starting syncing SourceBingAds"}}
{"type": "LOG", "log": {"level": "INFO", "message": "Fetching access token ..."}}
{"type": "CONNECTOR_CONFIG", "emitted_at": 1668613492681.509, "connectorConfig": {"config": {"tenant_id": "common", "refresh_token": "<OBFUSCATED-FOR-DEMO>", "client_id": "<OBFUSCATED-FOR-DEMO>", "developer_token": "<OBFUSCATED-FOR-DEMO>", "reports_start_date": "2020-02-25", "hourly_reports": false, "daily_reports": false, "weekly_reports": true, "monthly_reports": true}}}
{"type": "LOG", "log": {"level": "INFO", "message": "Syncing stream: ad_groups "}}
... Records are coming in afterward

@evantahler
Copy link
Contributor

evantahler commented Nov 16, 2022

{"type": "CONNECTOR_CONFIG", "emitted_at": 123, "connectorConfig": {...}}

I think there's one more envelope layer on that message, and it should be something like:

{"type": "CONTROL", "control": {"type": "CONNECTOR_CONFIG", "emitted_at": 123, "connectorConfig": {...}}}

@alafanechere
Copy link
Contributor Author

{"type": "CONNECTOR_CONFIG", "emitted_at": 123, "connectorConfig": {...}}

I think there's one more envelope layer on that message, and it should be something like:

{"type": "CONTROL", "control": {"type": "CONNECTOR_CONFIG", "emitted_at": 123, "connectorConfig": {...}}}

I think I forgot to wrap this around an AirbyteMessage right?

@evantahler
Copy link
Contributor

evantahler commented Nov 16, 2022

I think I forgot to wrap this around an AirbyteMessage right?

That's probably it!

@alafanechere alafanechere force-pushed the augustin/cdk/emit-updated-configs branch from f5c3d0c to 8fd52f9 Compare November 23, 2022 15:45
@alafanechere
Copy link
Contributor Author

@sherifnada I tried to go in the direction you suggested by implementing a specific SingleUseRefreshTokenOauth2Authenticator authenticator with a behavior that can match most existing OAuth scenarios. Let me know if it's what you had in mind 😄

@alafanechere
Copy link
Contributor Author

I tried to use source-harvest as an example: c5a0b6d

Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! looking really good. Some questions

ValueError: Raised if the defined getters are not returning a value.
"""
for field_name, getter in [
("client_id", self.get_client_id),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we care about retaining and validating client_id and client_secret in this class instead of just passing up the concern to the super?

Copy link
Contributor Author

@alafanechere alafanechere Nov 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As client_id, client_secret, and refresh_token are mandatory arguments of the parent class, and the current constructor signature of the new class only takes a connector_config dict, I thought it would be safer to fail early if these fields' values could not be parsed from the config.

token_refresh_endpoint="https://id.getharvest.com/api/v2/oauth2/token",
client_id=credentials.get("client_id"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why were these removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client_id parameter does not exist in my SingleUseRefreshTokenOauth2Authenticator implementation. The client_id is retrieved from the configuration with dpath

try:
response_json = self._get_refresh_access_token_response()
return (
response_json[self.get_access_token_name()],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would recommend using dpath

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed but for implementation consistency with parent class and abstract class, I'd prefer to do this in a separate PR in which we can replace the access_token_name, expires_in_name, and refresh_token_name by dpaths that can be used when parsing these responses. Wdyt?

Copy link
Contributor

@sherifnada sherifnada Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense!

@alafanechere
Copy link
Contributor Author

alafanechere commented Nov 25, 2022

@sherifnada I used dpath to declare path to the config field that we use to build refresh requests and set a new refresh token values in the config. I think this is far better now for DX 😄
I think dpath could also be used to define with more flexibility the location of the fields we want to parse in the refresh token response, but I'd prefer to keep this refacto for a different PR.

Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff! requesting a couple small changes but no extra review needed from my side. Thank you for making this awesome change.

@octavia-squidington-iv octavia-squidington-iv removed the area/connectors Connector related issues label Nov 29, 2022
@alafanechere
Copy link
Contributor Author

alafanechere commented Nov 29, 2022

/publish-cdk dry-run=true

🕑 https://github.com/airbytehq/airbyte/actions/runs/3574511972
https://github.com/airbytehq/airbyte/actions/runs/3574511972

@alafanechere
Copy link
Contributor Author

alafanechere commented Nov 29, 2022

/publish-cdk dry-run=false

🕑 https://github.com/airbytehq/airbyte/actions/runs/3574606697
https://github.com/airbytehq/airbyte/actions/runs/3574606697

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python CDK Updated to use and emit updated configurations
5 participants