
Implement databricks connector as a copy destination #5748

Merged: 1 commit merged into george/hotload-jar from liren/databricks-s3-writer on Aug 31, 2021

Conversation

@tuliren (Contributor) commented on Aug 31, 2021

@github-actions bot added the area/connectors (Connector related issues) label on Aug 31, 2021
@@ -15,6 +15,14 @@ dependencies {
implementation project(':airbyte-integrations:bases:base-java')
implementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs)
implementation project(':airbyte-integrations:connectors:destination-jdbc')
implementation project(':airbyte-integrations:connectors:destination-s3')
@tuliren (Contributor, author) commented on this diff:

Let's depend on the s3 destination for now and DRY it later.

@tuliren changed the title from "Implement databricks destination as a stream copier" to "Implement databricks destination as a copy destination" on Aug 31, 2021
@tuliren changed the title from "Implement databricks destination as a copy destination" to "Implement databricks connector as a copy destination" on Aug 31, 2021

@Override
public AirbyteConnectionStatus check(JsonNode config) {
@tuliren (Contributor, author) commented on this diff:

The CopyDestination base class checks and creates the SQL tables through its abstract methods, so this implementation only needs to check the staging persistence.
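
As a rough illustration of what that split implies (a minimal sketch under assumptions, not the code in this PR; the "s3_bucket_name" field and attemptWriteAndDelete helper are hypothetical):

import com.fasterxml.jackson.databind.JsonNode;
import io.airbyte.protocol.models.AirbyteConnectionStatus;
import io.airbyte.protocol.models.AirbyteConnectionStatus.Status;

public class DatabricksCheckSketch {

  // CopyDestination's abstract methods already cover the SQL-side checks, so this
  // sketch only verifies that the S3 staging location is writable.
  public AirbyteConnectionStatus check(JsonNode config) {
    try {
      // "s3_bucket_name" and attemptWriteAndDelete are hypothetical, for illustration only.
      attemptWriteAndDelete(config.get("s3_bucket_name").asText());
      return new AirbyteConnectionStatus().withStatus(Status.SUCCEEDED);
    } catch (Exception e) {
      return new AirbyteConnectionStatus()
          .withStatus(Status.FAILED)
          .withMessage("Could not write to the staging persistence: " + e.getMessage());
    }
  }

  private void attemptWriteAndDelete(String bucket) {
    // Placeholder: upload and then delete a small test object in the given bucket.
  }
}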

* This implementation is similar to {@link io.airbyte.integrations.destination.jdbc.copy.s3.S3StreamCopier}.
* The difference is that this implementation creates Parquet staging files, instead of CSV ones.
*/
public class DatabricksStreamCopier implements StreamCopier {
@tuliren (Contributor, author) commented on this diff:

We can focus on implementing the methods in this class to transfer the data from S3 to Databricks.
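
A minimal sketch of the intended data flow (not the PR implementation; a local StagingParquetWriter interface stands in for the real Parquet writer, and copyStagedFilesIntoDatabricks is a hypothetical name for the load step):

import java.util.UUID;
import io.airbyte.protocol.models.AirbyteRecordMessage;

public class DatabricksStreamCopierSketch {

  // Local stand-in for the S3 Parquet writer from destination-s3; only write() is modeled.
  interface StagingParquetWriter {
    void write(UUID id, AirbyteRecordMessage recordMessage) throws Exception;
  }

  private final StagingParquetWriter parquetWriter;

  public DatabricksStreamCopierSketch(StagingParquetWriter parquetWriter) {
    this.parquetWriter = parquetWriter;
  }

  // Step 1: stream each record into a Parquet staging file on S3.
  public void write(UUID id, AirbyteRecordMessage recordMessage) throws Exception {
    parquetWriter.write(id, recordMessage);
  }

  // Step 2 (hypothetical method name): load the staged Parquet files into Databricks,
  // e.g. via a COPY INTO statement against the destination table.
  public void copyStagedFilesIntoDatabricks() {
    // Placeholder for the S3-to-Databricks load.
  }
}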

@tuliren tuliren requested a review from Phlair August 31, 2021 06:02
@@ -36,7 +37,7 @@
/**
* Writes a value to a staging file for the stream.
*/
void write(UUID id, String jsonDataString, Timestamp emittedAt) throws Exception;
void write(UUID id, AirbyteRecordMessage recordMessage) throws Exception;
@tuliren (Contributor, author) commented on this diff:

This interface change is necessary to drop in the S3ParquetWriter directly for convenience.
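
As a hedged aside on compatibility (not code from this PR): the arguments of the old signature can still be derived from the AirbyteRecordMessage, so existing CSV-based copiers can adapt along these lines; writeCsvRow is a hypothetical stand-in for the pre-existing staging logic.

import java.sql.Timestamp;
import java.time.Instant;
import java.util.UUID;
import io.airbyte.commons.json.Jsons;
import io.airbyte.protocol.models.AirbyteRecordMessage;

public class CsvStyleWriteAdapterSketch {

  // Both the JSON payload and the emitted_at timestamp from the old signature are still
  // available on the record, so nothing is lost under the new signature.
  public void write(UUID id, AirbyteRecordMessage recordMessage) throws Exception {
    String jsonDataString = Jsons.serialize(recordMessage.getData());
    Timestamp emittedAt = Timestamp.from(Instant.ofEpochMilli(recordMessage.getEmittedAt()));
    writeCsvRow(id, jsonDataString, emittedAt); // hypothetical: the pre-existing CSV staging logic
  }

  private void writeCsvRow(UUID id, String jsonDataString, Timestamp emittedAt) {
    // Placeholder for the CSV row write.
  }
}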

import io.airbyte.protocol.models.DestinationSyncMode;

public interface StreamCopierFactory<T> {

StreamCopier create(String configuredSchema,
T config,
String stagingFolder,
DestinationSyncMode syncMode,
AirbyteStream stream,
ConfiguredAirbyteStream configuredStream,
@tuliren (Contributor, author) commented on this diff:

This interface change is necessary to reuse the S3Writer.

@tuliren linked an issue on Aug 31, 2021 that may be closed by this pull request
@Phlair (Contributor) commented on Aug 31, 2021

@tuliren I'll merge this in and then work off it

@Phlair merged commit d7db844 into george/hotload-jar on Aug 31, 2021
@Phlair deleted the liren/databricks-s3-writer branch on August 31, 2021 12:39
Successfully merging this pull request may close these issues:

New destination: Databricks Delta Lake