Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐞 Destination databricks: update jdbc driver to patch log4j #7622

Merged
merged 15 commits into from
Jan 7, 2022

Conversation

tuliren
Copy link
Contributor

@tuliren tuliren commented Nov 4, 2021

What

@tuliren
Copy link
Contributor Author

tuliren commented Nov 4, 2021

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1420186884
❌ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1420186884
🐛

@jrhizor jrhizor temporarily deployed to more-secrets November 4, 2021 07:04 Inactive
@tuliren tuliren temporarily deployed to more-secrets November 4, 2021 07:55 Inactive
@tuliren
Copy link
Contributor Author

tuliren commented Nov 4, 2021

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1420333982
❌ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1420333982
🐛

@tuliren tuliren temporarily deployed to more-secrets November 4, 2021 08:00 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets November 4, 2021 08:01 Inactive
@tuliren
Copy link
Contributor Author

tuliren commented Nov 4, 2021

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1420358298
❌ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1420358298
🐛

@tuliren tuliren temporarily deployed to more-secrets November 4, 2021 08:07 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets November 4, 2021 08:08 Inactive
@tuliren
Copy link
Contributor Author

tuliren commented Nov 4, 2021

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1420386204
✅ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1420386204
No Python unittests run

@jrhizor jrhizor temporarily deployed to more-secrets November 4, 2021 08:18 Inactive
@tuliren tuliren temporarily deployed to more-secrets November 4, 2021 08:33 Inactive
@tuliren
Copy link
Contributor Author

tuliren commented Nov 4, 2021

@davinchia, I manually created an io-airbyte-build-dependencies bucket in GCS. Should I do it through some Terraform script instead?

@davinchia
Copy link
Contributor

@tuliren yes we should. You can either do it on your own, or create a ticket for the infrastructure team to help you with it.

Have you seen https://docs.google.com/document/d/1Xqr9fC1toPlMoj74jb9e5FaTMjmsBIFd9BY3c3Zv5qc/edit#bookmark=id.ik4m6mwqpw5v on how to work with the infrastructure team?

@davinchia
Copy link
Contributor

Question from me:

  • Did you do anything special to grant the spec service account permissions to this bucket?
  • Maybe this is a dumb question - can we put the JDBC driver into our CloudRepo Maven repository instead? This way all our public java artifacts are in one place and we don't have an additional bucket to manage.

@tuliren
Copy link
Contributor Author

tuliren commented Nov 4, 2021

Did you do anything special to grant the spec service account permissions to this bucket?

I granted this service account the read permission to this bucket on the permission page on GCS.

Can we put the JDBC driver into our CloudRepo Maven repository instead? This way all our public java artifacts are in one place and we don't have an additional bucket to manage.

  • Where is this CloudRepo Maven repository?
  • What do you mean by "public"? This jar cannot be publicly downloadable by anyone from the internet. When downloading it, people need to accept to a certain terms & conditions.
  • The jar is publicly available here. I just don't know if we can download directly from that link programmatically. So if we can put it somewhere private, we won't have the potential legal issues.

@davinchia
Copy link
Contributor

CloudRepo is the Java hosting service we are using: cloudrepo.io. Login credentials are in Lastpass under cloudrepo.io in the shared-engineering folder. This is where java artifacts from OSS are pushed to.

Got it. I didn't realise we want to keep this private. If you aren't rushing, I think I would prefer creating a private repository in CloudRepo and set things up so we publish and pull from there. Main thing here is to keep all the java artifacts in one place. I can take a stab at this on Monday/Tuesday if you aren't free.

What do you think?

@tuliren
Copy link
Contributor Author

tuliren commented Nov 5, 2021

Got it. Thanks. I think I will just change it to download from Databricks each time. That does not seem to be breaking the terms & conditions. It's not worth the time to be too careful. But thank you all the same.

@davinchia
Copy link
Contributor

Okay! Happy to relook when you make those changes. Thanks Liren.

@tuliren tuliren force-pushed the liren/auto-download-spark-jdbc branch from 736c0d6 to 76315db Compare January 4, 2022 19:00
@tuliren tuliren temporarily deployed to more-secrets January 4, 2022 19:02 Inactive
@github-actions github-actions bot added the area/connectors Connector related issues label Jan 4, 2022
@tuliren
Copy link
Contributor Author

tuliren commented Jan 4, 2022

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1656038661
❌ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1656038661
🐛

@tuliren tuliren temporarily deployed to more-secrets January 4, 2022 23:39 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets January 4, 2022 23:40 Inactive
@tuliren
Copy link
Contributor Author

tuliren commented Jan 4, 2022

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1656069372
❌ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1656069372
🐛

@tuliren tuliren temporarily deployed to more-secrets January 4, 2022 23:50 Inactive
@@ -77,7 +77,7 @@ public StandardCheckConnectionOutput run(final StandardCheckConnectionInput inpu
LOGGER.debug("Check connection job received output: {}", output);
return output;
} else {
throw new WorkerException("Error while getting checking connection.");
throw new WorkerException(String.format("Error checking connection, status: %s, exit code: %d", status, exitCode));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrhizor, a check operation failed in the acceptance test. I added more information to the exception message and saw this:

Error checking connection, status: Optional[io.airbyte.protocol.models.AirbyteConnectionStatus@605d010e[status=SUCCEEDED,message=<null>,additionalProperties=***]], exit code: 143

Do you know what might be the root cause behind this exit code? Could it be related to the recent change in WorkerUtils#gentleClose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the gentle close changes look related. Are you sure this docker container isn't actually outputting an error code?

Copy link
Contributor Author

@tuliren tuliren Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is what happens:

  • The database connection somehow requires an explicit close method call, otherwise, the connector will not shutdown.
  • The worker will wait for 1 minute for the connector to shutdown, otherwise it will close it by force (code here) by sending a SIGTERM.
  • The SIGTERM results in a 143 exit code.

So the root cause is that we must always close the database at the end for Databricks, which was not the case in the default check command implementation in CopyDestination. The issue has been resolved.

@tuliren tuliren force-pushed the liren/auto-download-spark-jdbc branch from 32aa7bb to 6b5d420 Compare January 7, 2022 03:56
@tuliren
Copy link
Contributor Author

tuliren commented Jan 7, 2022

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1666057934
✅ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/1666057934
No Python unittests run

@tuliren tuliren temporarily deployed to more-secrets January 7, 2022 03:58 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets January 7, 2022 04:00 Inactive
@tuliren tuliren temporarily deployed to more-secrets January 7, 2022 04:03 Inactive
@tuliren tuliren changed the title Download jdbc driver for databricks connector 🐞 Destination databricks: update jdbc driver to patch log4j Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Destination databricks: integration test hanging after the destination completes
4 participants