
Add a config for instance wide oauth parameters #5761

Merged
merged 16 commits on Sep 9, 2021

Conversation

ChristopheDuong
Contributor

@ChristopheDuong ChristopheDuong commented Aug 31, 2021

What

Closes #5637
Closes #5638

How

Describe the solution

  • Introduce new Airbyte configs in the ConfigSchema enum, SourceOAuthParameter & DestinationOAuthParameter, to be persisted in the ConfigRepository
  • These configs are retrieved by the Scheduler through the DefaultSyncJobFactory, which reconstructs all the configuration data necessary for a sync. When OAuth parameters are required for certain connectors, they are injected into the connector configurations before being scheduled/submitted to workers (a short sketch of this injection is included below).
  • Additionally, this PR cleans up web backend endpoints that are now unused (source & destination recreate)
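
For context, a minimal sketch of the injection idea described above (the class and method names below are illustrative assumptions, not the exact code in this PR; the real logic lives in OAuthConfigSupplier):

// Sketch only: merge instance-wide OAuth parameters into a connector config
// before the job is handed to a worker. Names and structure are assumptions.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class OAuthInjectionSketch {

  // Returns a copy of the connector config enriched with every field from the
  // instance-wide OAuth parameters (assumes both nodes are JSON objects).
  public static JsonNode inject(final JsonNode connectorConfig, final JsonNode oauthParams) {
    final ObjectNode enriched = (ObjectNode) connectorConfig.deepCopy();
    oauthParams.fields().forEachRemaining(entry -> enriched.set(entry.getKey(), entry.getValue()));
    return enriched;
  }
}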

Recommended reading order

  1. airbyte-scheduler/persistence/src/main/java/io/airbyte/scheduler/persistence/job_factory/DefaultSyncJobFactory.java
  2. airbyte-scheduler/persistence/src/main/java/io/airbyte/scheduler/persistence/job_factory/OAuthConfigSupplier.java
  3. the rest

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g. a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • docs/SUMMARY.md
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions
  • Connector added to the connector index as described here

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g. a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions
  • Connector version bumped as described here

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here

Connector Generator

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.

@github-actions github-actions bot added the area/api (Related to the api), area/platform (issues related to the platform), and area/documentation (Improvements or additions to documentation) labels on Aug 31, 2021
Base automatically changed from chris/oauth-request-api to master August 31, 2021 22:21
@ChristopheDuong ChristopheDuong marked this pull request as ready for review September 2, 2021 10:51

private static void injectJsonNode(ObjectNode config, ObjectNode fromConfig) {
for (String key : Jsons.keys(fromConfig)) {
config.set(key, fromConfig.get(key));
Contributor Author

@ChristopheDuong ChristopheDuong Sep 2, 2021

If the OAuth parameter was already part of the connector config, should we still inject and overwrite it with the one from the instance-wide values?

Or should there be a special behavior here (adding it only if it does not exist yet)?

Contributor

Hmm, good question. I think if any of the parameters is present in the config, we shouldn't add it. It may end up being integration-specific. For now, I think the right way to handle it is to not set anything if any of the OAuth config keys are already present in the config.
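
For illustration, a minimal sketch of that suggestion (not the code merged here; it just spells out the "skip injection if any OAuth key is already present" behavior, reusing the Jsons helper from the snippet above):

// Sketch: only inject the instance-wide OAuth values when none of the OAuth
// keys are already present in the user-provided config.
private static void injectJsonNode(final ObjectNode config, final ObjectNode fromConfig) {
  for (final String key : Jsons.keys(fromConfig)) {
    if (config.has(key)) {
      // The user already supplied an OAuth key; leave the config untouched.
      return;
    }
  }
  for (final String key : Jsons.keys(fromConfig)) {
    config.set(key, fromConfig.get(key));
  }
}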

@jrhizor
Contributor

jrhizor commented Sep 2, 2021

Will similar injection to what is happening in DefaultSyncJobFactory be added to check/discover in this PR or a separate one?

Contributor

@sherifnada sherifnada left a comment

Same question as Jared about discover/check.

Also, having seen the impl here, I think it's better if we maintain the invariant that configs are always fully formed, but instead of holding the full secrets they would hold coordinates to the secrets in a secret store.

Also, we should either disable custom DBT transformations on cloud or just not support OAuth for destinations until we do.

- configuration
additionalProperties: false
properties:
oauthParameterId:
Contributor

What's this param for? Could you add a description field? If this is just a surrogate key, maybe id is a better name?

Contributor Author

Yes, it's the primary key for the OAuth parameter object.

I don't think we have any objects that define id as their primary key column, though:

  • SourceConnection uses sourceId
  • DestinationConnection uses destinationId
  • StandardSync uses connectionId

That's why I named it that way.

Contributor

SGTM, let's stick with the convention 👍🏼


final DestinationConnection destinationConnection = configRepository.getDestinationConnection(standardSync.getDestinationId());

destinationConnection.withConfiguration(oAuthConfigSupplier.injectDestinationOAuthParameters(
Contributor

this is dangerous because the secrets could leak into custom dbt images. We should not pass this.

Contributor Author

@ChristopheDuong ChristopheDuong Sep 6, 2021

We're thinking about reworking how to pass the config files to custom dbt transformation containers.

So the destination_config.json (used by destination connectors) should not be forwarded to the custom dbt image; only a translated profiles.yml should be, and that file should therefore omit OAuth params.

#5091

@@ -29,6 +29,8 @@ function configuredbt() {
echo "Running: transform-config --config ${CONFIG_FILE} --integration-type ${INTEGRATION_TYPE} --out ${PROJECT_DIR}"
# Generate a profiles.yml file for the selected destination/integration type
transform-config --config "${CONFIG_FILE}" --integration-type "${INTEGRATION_TYPE}" --out "${PROJECT_DIR}"
# Remove config file as it might still contain sensitive credentials
rm "${CONFIG_FILE}"
Contributor Author

Normalization image is used by custom operations to translate a destination_config.json into a profiles.yml file.

We could make sure to remove the config file once we're done translating it.

Contributor

@sherifnada sherifnada Sep 7, 2021

I think I'm not totally clear on why this is needed since it's running a trusted image? Or is this re-used for custom operations?

Contributor Author

@ChristopheDuong ChristopheDuong Sep 8, 2021

I think I'm not totally clear on why this is needed since it's running a trusted image? or is this re-used for custom operations?

Yes, the normalization image is trusted to generate dbt's profiles.yml (from a destination_config.json file).
The resulting profiles.yml file is then transferred to run custom DBT operations in the user's specified docker image (so the latter never needs to read/parse a destination_config.json).

Contributor

SGTM - could you add a source code comment explaining why keeping that file (which may contain sensitive credentials) around is a bad thing and why it should therefore be removed?

Contributor

@sherifnada sherifnada left a comment

I think this LGTM, but I would like @jrhizor's input on whether to put the OAuth secrets in the jobs DB directly vs. reading them at the moment we're about to run a connector.

@@ -109,15 +126,23 @@ public long createGetSpecJob(String integrationImage) throws IOException {
String destinationDockerImageName,
List<StandardSyncOperation> standardSyncOperations)
throws IOException {
final JsonNode sourceConfiguration = oAuthConfigSupplier.injectSourceOAuthParameters(
Contributor

@jrhizor is this the right place to be injecting the OAuth secrets, i.e. enqueueing them directly in the jobs DB?

Contributor

It's probably fine for now but it will change in the near future. This is what I'm working on eliminating this week, still working out the details of what it will look like.

Most likely we will push the combined version into the config sent to Temporal.

Contributor

@sherifnada sherifnada left a comment

What do you think is the best way to prevent this from leaking into DBT images? I created this PR, but should we also add something in the worker? Ignore if it's not answered by the other question I asked about reusing the normalization entrypoint.

}

@Override
public long createSourceCheckConnectionJob(SourceConnection source, String dockerImageName) throws IOException {
final JsonNode sourceConfiguration = oAuthConfigSupplier.injectSourceOAuthParameters(
Contributor

The combining for get spec / check / discover can't happen in the JobCreator since the jobs are created directly in the DefaultSynchronousSchedulerClient. We're removing these misleading and unused functions in #5896

Contributor Author

@ChristopheDuong ChristopheDuong Sep 8, 2021

Oh OK, so I am reverting to injecting OAuth parameters in the DefaultSyncJobFactory, as well as handling it in DefaultSynchronousSchedulerClient for check/discover (probably not needed for get spec).
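
For reference, a rough sketch of what that injection could look like at the synchronous-scheduler level for a check job (the argument list of injectSourceOAuthParameters is an assumption for illustration, not the exact merged signature):

// Sketch inside DefaultSynchronousSchedulerClient (illustrative only): enrich
// the source config with instance-wide OAuth params right before the check job
// is built, mirroring what DefaultSyncJobFactory does for sync jobs.
final JsonNode sourceConfiguration = oAuthConfigSupplier.injectSourceOAuthParameters(
    source.getSourceDefinitionId(),   // which connector's OAuth params to look up
    source.getWorkspaceId(),          // assumed scoping of the parameter lookup
    source.getConfiguration());       // user-provided config to enrich
// ... sourceConfiguration is then used to build the check connection job input.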

@@ -211,6 +213,33 @@ public void writeStandardSyncOperation(final StandardSyncOperation standardSyncO
return persistence.listConfigs(ConfigSchema.STANDARD_SYNC_OPERATION, StandardSyncOperation.class);
}

public SourceOAuthParameter getSourceOAuthParams(final UUID SourceOAuthParameterId)
Contributor

We're going to need to overwrite these calls to use the secrets persistence @airbyte-jenny


private static void injectJsonNode(ObjectNode config, ObjectNode fromConfig) {
for (String key : Jsons.keys(fromConfig)) {
if (!config.has(key) || isSecretMask(config.get(key).asText())) {
Contributor

When are we inserting the secrets mask such that it needs to be overwritten?

Contributor Author

@ChristopheDuong ChristopheDuong Sep 8, 2021

I'm proposing to persist the connector's configuration with injected and masked OAuth parameters in this PR: #5865

This would be done by web backend endpoints of the API when creating a connector.

The motivation is to have a full connector configuration saved by the UI, so JSON/spec validation can verify that the connector's configuration (optionally in combination with injected OAuth params) is valid.
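
For clarity, a small sketch of the mask-aware check being discussed (the mask convention and helper below are assumptions for illustration; see #5865 for the actual proposal):

// Sketch: a value is (re-)injected only when the key is missing or the stored
// value is still the UI placeholder mask rather than a real secret.
private static boolean isSecretMask(final String value) {
  // Assumed mask convention: a non-empty string made up entirely of '*' characters.
  return value != null && !value.isEmpty() && value.chars().allMatch(c -> c == '*');
}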

@github-actions github-actions bot added and then removed the area/connectors (Connector related issues) label on Sep 8, 2021

sherifnada and others added 3 commits September 9, 2021 00:15
* Inject masked OAuth params for UI Validation

* Add tests
Labels: area/api (Related to the api), area/documentation (Improvements or additions to documentation), area/platform (issues related to the platform), normalization
3 participants