
Updated dependencies for Spark 3.0.0 #30

Draft
wants to merge 4 commits into master

Conversation

dovijoel

Updated dependencies to work with Spark 3.0, addressing issue #15.

@ghost

ghost commented Jul 29, 2020

CLA assistant check
All CLA requirements met.

@dovijoel
Author

This is strange; all tests passed in my local dev environment. See the log below:

C:\Users\DoviJoel\.jdks\corretto-1.8.0_262\bin\java.exe -server -Xmx1536M -Dsbt.supershell=false -Didea.managed=true -Dfile.encoding=UTF-8 -Dsbt.log.noformat=true -jar C:\Users\DoviJoel\AppData\Roaming\JetBrains\IntelliJIdea2020.1\plugins\Scala\launcher\sbt-launch.jar early(addPluginSbtFile=\"\"\"C:\Users\DoviJoel\AppData\Local\Temp\idea.sbt\"\"\") "; set ideaPort in Global := 52731 ; idea-shell"
[info] welcome to sbt 1.3.13 (Amazon.com Inc. Java 1.8.0_262)
[info] loading global plugins from C:\Users\DoviJoel\.sbt\1.0\plugins
[info] loading settings for project sql-spark-connector-build from assembly.sbt,idea.sbt ...
[info] loading project definition from C:\dev\sql-spark-connector\project
[info] loading settings for project sql-spark-connector from build.sbt ...
[info] set current project to spark-mssql-connector (in build file:/C:/dev/sql-spark-connector/)
[info] Defining Global / ideaPort
[info] The new value will be used by Compile / compile, Test / compile
[info] Reapplying settings...
[info] set current project to spark-mssql-connector (in build file:/C:/dev/sql-spark-connector/)
[IJ]sbt:spark-mssql-connector> {file:/C:/dev/sql-spark-connector/}sql-spark-connector/test
[info] Compiling 1 Scala source and 1 Java source to C:\dev\sql-spark-connector\target\scala-2.12\classes ...
[warn] C:\dev\sql-spark-connector\src\main\java\com\microsoft\sqlserver\jdbc\spark\DataFrameBulkRecord.java:18:1: com.microsoft.sqlserver.jdbc.ISQLServerBulkRecord in com.microsoft.sqlserver.jdbc has been deprecated
[warn] com.microsoft.sqlserver.jdbc.ISQLServerBulkRecord
[warn] C:\dev\sql-spark-connector\src\main\java\com\microsoft\sqlserver\jdbc\spark\DataFrameBulkRecord.java:20:1: com.microsoft.sqlserver.jdbc.ISQLServerBulkRecord in com.microsoft.sqlserver.jdbc has been deprecated
[warn] ISQLServerBulkRecord
[warn] C:\dev\sql-spark-connector\src\main\java\com\microsoft\sqlserver\jdbc\spark\DataFrameBulkRecord.java:20:1: serializable class com.microsoft.sqlserver.jdbc.spark.DataFrameBulkRecord has no definition of serialVersionUID
[warn] public class DataFrameBulkRecord implements ISQLServerBulkRecord, AutoCloseable {
[warn]     private Iterator<Row> iterator;
[warn]     private ColumnMetadata[] dfColumnMetadata;
[warn]     private Set<Integer> columnOrdinals;
[warn]
[warn]     public DataFrameBulkRecord(Iterator<Row> iterator, ColumnMetadata[] dfColumnMetadata) {
[warn]         this.iterator = iterator;
[warn]         this.dfColumnMetadata = dfColumnMetadata;
[warn]         this.columnOrdinals = IntStream.range(1, dfColumnMetadata.length+1).boxed().collect(Collectors.toSet());
[warn]     }
[warn]
[warn]     @Override
[warn]     public Object[] getRowData() throws SQLServerException {
[warn]
[warn]         Row row = iterator.next();
[warn]         Object[] rowData = new Object[row.length()];
[warn]         for (int idx = 0; idx < dfColumnMetadata.length; idx++) {
[warn]              // Columns may be reordered between SQLTable and DataFrame. dfFieldIndex maps to the
[warn]              // corresponding column in rowData and thus use dfFieldIndex to get the column.
[warn]              int dfFieldIndex = dfColumnMetadata[idx].getDfColIndex();
[warn]              rowData[idx] = row.get(dfFieldIndex);
[warn]         }
[warn]         return rowData;
[warn]     }
[warn]
[warn]     @Override
[warn]     public Set<Integer> getColumnOrdinals() {
[warn]         return columnOrdinals;
[warn]     }
[warn]
[warn]     @Override
[warn]     public String getColumnName(int column) {
[warn]         return dfColumnMetadata[column-1].getName();
[warn]     }
[warn]
[warn]     @Override
[warn]     public int getColumnType(int column) {
[warn]         return dfColumnMetadata[column-1].getType();
[warn]     }
[warn]
[warn]     @Override
[warn]     public int getPrecision(int column) {
[warn]         return dfColumnMetadata[column-1].getPrecision();
[warn]     }
[warn]
[warn]     @Override
[warn]     public int getScale(int column) {
[warn]         return dfColumnMetadata[column-1].getScale();
[warn]     }
[warn]
[warn]     @Override
[warn]     public boolean isAutoIncrement(int column) {
[warn]         return dfColumnMetadata[column-1].isAutoIncrement();
[warn]     }
[warn]
[warn]     @Override
[warn]     public boolean next() throws SQLServerException {
[warn]         return iterator.hasNext();
[warn]     }
[warn]
[warn]     @Override
[warn]     public void close() throws SQLServerException {}
[warn]
[warn]     @Override
[warn]     public void addColumnMetadata(int positionInFile, String name, int jdbcType,
[warn]         int precision, int scale) {}
[warn]
[warn]     @Override
[warn]     public void addColumnMetadata(int positionInFile, String name, int jdbcType,
[warn]         int precision, int scale, DateTimeFormatter dateTimeFormatter) {}
[warn]
[warn]     @Override
[warn]     public DateTimeFormatter getColumnDateTimeFormatter(int column) {
[warn]         return null;
[warn]     }
[warn]
[warn]     @Override
[warn]     public void setTimestampWithTimezoneFormat(String dateTimeFormat) {}
[warn]
[warn]     @Override
[warn]     public void setTimestampWithTimezoneFormat(DateTimeFormatter dateTimeFormatter) {}
[warn]
[warn]     @Override
[warn]     public void setTimeWithTimezoneFormat(String timeFormat) {}
[warn]
[warn]     @Override
[warn]     public void  setTimeWithTimezoneFormat(DateTimeFormatter dateTimeFormatter) {}
[warn] }
[info] Done compiling.
[info] Compiling 1 Java source to C:\dev\sql-spark-connector\target\scala-2.12\test-classes ...
[info] Done compiling.
[debug] Test run started
[debug] Test com.microsoft.sqlserver.jdbc.spark.DataSourceUtilsTest.dataFrameBulkRecordTest started
[debug] Test com.microsoft.sqlserver.jdbc.spark.DataSourceUtilsTest.dataFrameBulkRecordTest finished, took 0.037 sec
[debug] Test com.microsoft.sqlserver.jdbc.spark.DataSourceUtilsTest.columnMetadataTest started
[debug] Test com.microsoft.sqlserver.jdbc.spark.DataSourceUtilsTest.columnMetadataTest finished, took 0.001 sec
[debug] Test run finished: 0 failed, 0 ignored, 2 total, 0.046s
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/07/29 00:23:21 INFO SparkContext: Running Spark version 3.0.0
20/07/29 00:23:21 INFO ResourceUtils: ==============================================================
20/07/29 00:23:21 INFO ResourceUtils: Resources for spark.driver:

20/07/29 00:23:21 INFO ResourceUtils: ==============================================================
20/07/29 00:23:21 INFO SparkContext: Submitted application: test-sql-context
20/07/29 00:23:22 INFO SecurityManager: Changing view acls to: DoviJoel
20/07/29 00:23:22 INFO SecurityManager: Changing modify acls to: DoviJoel
20/07/29 00:23:22 INFO SecurityManager: Changing view acls groups to:
20/07/29 00:23:22 INFO SecurityManager: Changing modify acls groups to:
20/07/29 00:23:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(DoviJoel); groups with view permissions: Set(); users  with modify permissions: Set(DoviJoel); groups with modify permissions: Set()
20/07/29 00:23:22 INFO Utils: Successfully started service 'sparkDriver' on port 52763.
20/07/29 00:23:22 INFO SparkEnv: Registering MapOutputTracker
20/07/29 00:23:22 INFO SparkEnv: Registering BlockManagerMaster
20/07/29 00:23:22 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/07/29 00:23:22 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/07/29 00:23:22 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/07/29 00:23:22 INFO DiskBlockManager: Created local directory at C:\Users\DoviJoel\AppData\Local\Temp\blockmgr-c54378f1-32b0-4f4e-8e1e-ee8305440657
20/07/29 00:23:22 INFO MemoryStore: MemoryStore started with capacity 671.1 MiB
20/07/29 00:23:22 INFO SparkEnv: Registering OutputCommitCoordinator
20/07/29 00:23:23 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/07/29 00:23:23 INFO Utils: Successfully started service 'SparkUI' on port 4041.
20/07/29 00:23:23 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://DESKTOP-5BLC2ED:4041
20/07/29 00:23:23 INFO Executor: Starting executor ID driver on host DESKTOP-5BLC2ED
20/07/29 00:23:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52790.
20/07/29 00:23:23 INFO NettyBlockTransferService: Server created on DESKTOP-5BLC2ED:52790
20/07/29 00:23:23 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/07/29 00:23:23 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, DESKTOP-5BLC2ED, 52790, None)
20/07/29 00:23:23 INFO BlockManagerMasterEndpoint: Registering block manager DESKTOP-5BLC2ED:52790 with 671.1 MiB RAM, BlockManagerId(driver, DESKTOP-5BLC2ED, 52790, None)
20/07/29 00:23:23 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, DESKTOP-5BLC2ED, 52790, None)
20/07/29 00:23:23 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, DESKTOP-5BLC2ED, 52790, None)
20/07/29 00:23:23 INFO DataSourceTest:

===== TEST OUTPUT FOR com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Schema validation between Spark DataFrame and SQL Server ResultSet' =====

20/07/29 00:23:23 INFO DataSourceTest:

===== FINISHED com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Schema validation between Spark DataFrame and SQL Server ResultSet' =====




20/07/29 00:23:23 INFO SharedState: loading hive config file: jar:file:/C:/Users/DoviJoel/.ivy2/cache/org.apache.spark/spark-sql_2.12/jars/spark-sql_2.12-3.0.0-tests.jar!/hive-site.xml
20/07/29 00:23:24 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/C:/dev/sql-spark-connector/spark-warehouse//com.microsoft.sqlserver.jdbc.spark.DataSourceTest').
20/07/29 00:23:24 INFO SharedState: Warehouse path is 'file:/C:/dev/sql-spark-connector/spark-warehouse//com.microsoft.sqlserver.jdbc.spark.DataSourceTest'.
20/07/29 00:23:24 INFO DataSourceTest:

===== TEST OUTPUT FOR com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'JdbcBulkOptions should have proper Bulk configurations' =====

20/07/29 00:23:24 INFO DataSourceTest:

===== FINISHED com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'JdbcBulkOptions should have proper Bulk configurations' =====

20/07/29 00:23:24 INFO DataSourceTest:

===== TEST OUTPUT FOR com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Data pool URL generation' =====




20/07/29 00:23:24 INFO DataSourceTest:

===== FINISHED com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Data pool URL generation' =====

20/07/29 00:23:24 INFO DataSourceTest:

===== TEST OUTPUT FOR com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Multi part tablename test' =====

20/07/29 00:23:24 INFO DataSourceTest:

===== FINISHED com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Multi part tablename test' =====

20/07/29 00:23:24 INFO DataSourceTest:

===== TEST OUTPUT FOR com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Data pool options test' =====

20/07/29 00:23:24 INFO DataSourceTest:

===== FINISHED com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Data pool options test' =====

20/07/29 00:23:24 INFO DataSourceTest:

===== TEST OUTPUT FOR com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Default AAD options are correct.' =====

20/07/29 00:23:24 INFO DataSourceTest:

===== FINISHED com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Default AAD options are correct.' =====

20/07/29 00:23:24 INFO DataSourceTest:

===== TEST OUTPUT FOR com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Correct AAD options are set when accessToken is specified' =====

20/07/29 00:23:24 INFO DataSourceTest:

===== FINISHED com.microsoft.sqlserver.jdbc.spark.DataSourceTest: 'Correct AAD options are set when accessToken is specified' =====

20/07/29 00:23:25 INFO SparkUI: Stopped Spark web UI at http://DESKTOP-5BLC2ED:4041
20/07/29 00:23:25 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/07/29 00:23:25 INFO MemoryStore: MemoryStore cleared
20/07/29 00:23:25 INFO BlockManager: BlockManager stopped
20/07/29 00:23:25 INFO BlockManagerMaster: BlockManagerMaster stopped
20/07/29 00:23:25 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
[info] DataSourceTest:
[info] - Schema validation between Spark DataFrame and SQL Server ResultSet
[info] - JdbcBulkOptions should have proper Bulk configurations
[info] - Data pool URL generation
[info] - Multi part tablename test
[info] - Data pool options test
[info] - Default AAD options are correct.
[info] - Correct AAD options are set when accessToken is specified
[info] ScalaTest20/07/29 00:23:25 INFO SparkContext: Successfully stopped SparkContext
20/07/29 00:23:25 WARN DataSourceTest:

===== POSSIBLE THREAD LEAK IN SUITE com.microsoft.sqlserver.jdbc.spark.DataSourceTest, thread names: rpc-boss-3-1, shuffle-boss-6-1 =====


[info] Run completed in 4 seconds, 649 milliseconds.
[info] Total number of tests run: 7
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 7, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 9, Failed 0, Errors 0, Passed 9
[success] Total time: 18 s, completed 29 Jul 2020 9:23:25 AM


@shivsood
Collaborator

@dovijoel Thanks for your work. Did you do an E2E test with Databricks/Spark 3.0? AFAIK that will fail; this needs a source-level fix as well.

val sparkVersion = "2.4.6"
scalaVersion := "2.12.11"
ThisBuild / useCoursier := false
val sparkVersion = "3.0.0"
Collaborator

We would need to support the Spark 2.4/Scala 2.11 combo as well.

Author

Ok, I will look into supporting both scenarios.

Contributor

I think you can use something like
crossScalaVersions := Seq("2.12.10", "2.11.12")
to do this

Contributor

@rajmera3 that alone won't work, because there is no Spark 3.0 build for Scala 2.11. Here we need a combo of (Spark 2.4 + Scala 2.11) and (Spark 3.0 + Scala 2.12); see the sketch below.
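
Not from this PR, but a minimal build.sbt sketch of how such a combo could be wired up, assuming only the Spark modules need to switch with the Scala version (versions are illustrative):

crossScalaVersions := Seq("2.12.11", "2.11.12")

libraryDependencies ++= {
  // Pick the Spark version that matches the Scala binary version being built.
  val sparkVersion = CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, 11)) => "2.4.6" // Spark 2.4 line for Scala 2.11
    case _             => "3.0.0" // Spark 3.0 line for Scala 2.12
  }
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
  )
}

// Build both artifacts with: sbt +package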

Collaborator

One option here is to make the main line Spark 3.0/Scala 2.12 and move the stable/old version to a separate branch, e.g. a Spark 2.4 branch.

@dovijoel
Author

@dovijoel Thanks for your work. Did you do an E2E test with Databricks/Spark 3.0? AFAIK that will fail; this needs a source-level fix as well.

Can you please point me in the right direction on how to do an E2E test?

After looking through the changes made for Spark 3.0, it seems that some classes are no longer accessible, specifically the internal Logging class.
Someone in the JIRA below suggests pasting the code into the project in order to use it. Do you think that is a viable solution?
https://issues.apache.org/jira/browse/SPARK-13928
https://github.com/apache/spark/pull/11764

@dovijoel dovijoel marked this pull request as draft July 30, 2020 14:24
@rajmera3
Contributor

rajmera3 commented Aug 5, 2020

@shivsood please advise @dovijoel when you get the chance

@shivsood
Collaborator

@dovijoel Thanks for your work. Did you do an E2E test with Databricks/Spark 3.0? AFAIK that will fail; this needs a source-level fix as well.

Can you please point me in the right direction on how to do an E2E test?

After looking through the changes made for Spark 3.0, it seems that some classes are no longer accessible, specifically the internal Logging class.
Someone in the JIRA below suggests pasting the code into the project in order to use it. Do you think that is a viable solution?
https://issues.apache.org/jira/browse/SPARK-13928
https://github.com/apache/spark/pull/11764

I get a 404 when I access these links.

apache/spark#11764

Yes, we will need to remove the dependency on Spark's internal logging and create a logger in the connector. Two possibilities:

  • Get the logger using Logger.getLogger()
  • Implement our own logging class, similar to the Spark trait Logging (see the sketch below)

I am not sure what other issues come up with Spark 3.0 on DBR. Can you please post any errors/findings here?
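
For reference, a rough sketch of what the second option could look like using slf4j (which is already on Spark's classpath); the trait name and method set are illustrative, not the actual change in this PR:

import org.slf4j.{Logger, LoggerFactory}

// Hypothetical connector-local replacement for Spark's internal Logging trait.
trait ConnectorLogging {
  @transient private lazy val log: Logger =
    LoggerFactory.getLogger(this.getClass.getName.stripSuffix("$"))

  protected def logInfo(msg: => String): Unit = if (log.isInfoEnabled) log.info(msg)
  protected def logWarning(msg: => String): Unit = if (log.isWarnEnabled) log.warn(msg)
  protected def logError(msg: => String, e: Throwable): Unit = log.error(msg, e)
}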

@shivsood shivsood mentioned this pull request Aug 11, 2020
@dovijoel
Author


Thank you for the advice on the logging approach! I will incorporate it and let you know.

@nightscape

Hi @dovijoel, did you see my PR against your PR? 😄
dovijoel#1
dovijoel#2

@aravish

aravish commented Aug 13, 2020

I was able to validate the connector successfully with the Databricks DBR 7.1 runtime, built from the forked branch that has PR #30 merged (https://github.com/dovijoel/sql-spark-connector).


print("Use Apache Spark Connector for SQL Server and Azure SQL to write to master SQL instance ")
servername = "jdbc:sqlserver://aravishsqlserver.database.windows.net"
dbname = "custommetastore"
url = servername + ";" + "databaseName=" + dbname + ";"

dbtable = "TBLS"
user = "aravish"
password = "xxxxxxxx" # Please specify password here

print("read data from SQL server table  ")
jdbcDF = spark.read \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", url) \
        .option("dbtable", dbtable) \
        .option("user", user) \
        .option("password", password).load()

jdbcDF.show(5)

(1) Spark Jobs
Job 0 View(Stages: 1/1)
jdbcDF:pyspark.sql.dataframe.DataFrame = [TBL_ID: long, CREATE_TIME: integer ... 9 more fields]
Use Apache Spark Connector for SQL Server and Azure SQL to write to master SQL instance
read data from SQL server table
+------+-----------+-----+----------------+-----+---------+-----+---------------+-------------+------------------+------------------+
|TBL_ID|CREATE_TIME|DB_ID|LAST_ACCESS_TIME|OWNER|RETENTION|SD_ID| TBL_NAME| TBL_TYPE|VIEW_EXPANDED_TEXT|VIEW_ORIGINAL_TEXT|
+------+-----------+-----+----------------+-----+---------+-----+---------------+-------------+------------------+------------------+
| 1| 1550187698| 1| 0| root| 0| 1|hivesampletable|MANAGED_TABLE| null| null|
+------+-----------+-----+----------------+-----+---------+-----+---------------+-------------+------------------+------------------+
Command took 4.55 seconds -- by arravish@microsoft.com at 8/13/2020, 4:48:17 AM on dbr70

Writes

jdbcDF.write \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("url", url) \
    .option("dbtable", "TBLS_Spark_SQL_Connector") \
    .option("user", user) \
    .option("password", password) \
    .save()

(1) Spark Jobs
Command took 2.73 seconds -- by arravish@microsoft.com at 8/13/2020, 9:05:57 AM on dbr70

@shivsood
Collaborator

@dovijoel here's the error that @rajmera3 reported when he tested with master some time back:

  1. Runtime 7.0 (beta) (Apache Spark 3.0.0, Scala 2.12)
    a. Error: java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class

@shivsood
Collaborator


Thanks @aravish. Can you test with master and confirm that you get an error, then repeat the test with this PR branch?

@MrWhiteABEX

Any progress on this? I can also confirm that @dovijoel's spark-3.0 branch works. I have compiled it successfully on Windows and used it to read and write data on DBR 7.3.
@aravish getting Spark 3.0 support is really crucial for us as an Azure Databricks customer. We are currently on 6.5 due to needed MLflow features, so going back to 6.4 is not an option, and its support ends in October. I can still update to 6.6, but that is also only a temporary solution until November. I would really like to avoid using a custom build of sql-spark-connector.

@gmdiana-hershey

@MrWhiteABEX - we have the same issue: we are using an old Databricks runtime because this connector doesn't support Spark 3.0. As an Azure Databricks customer, it's important to us that Spark 3.0+ support comes ASAP. Please advise if there's anything I can do to validate this in our DEV environment.

@MrWhiteABEX

@gmdiana-hershey In our testing environment I'm already running DBR 7.2 with the Spark connector built from @dovijoel's adjustments. No issues experienced so far. I think the only problem is that the CI build pipeline is broken by Scala 2.12, but I did not investigate it.
I just compiled it locally using sbt assembly and uploaded the library to my Databricks workspace (see the sketch below). I would like to see official Spark 3.0 support, but I guess I will just switch to my custom library and update to DBR 7. I already see large improvements in my streaming pipeline due to Spark 3.0.
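
For anyone else building locally, a minimal sketch of the sbt-assembly setup assumed in these comments (plugin version is illustrative):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")

// Then build the fat jar from the repo root and upload the resulting
// target/scala-2.12/<name>-assembly-<version>.jar to the Databricks workspace as a library:
//   sbt assembly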

@alexott
Contributor

alexott commented Nov 6, 2020

I just tested a build with Spark 3.0.1 & DBR 7.3, and it works just fine.

@pmooij

pmooij commented Nov 26, 2020


@aravish @shivsood any update on this?
I needed to step back from DBR 6.6 to 6.4 today because 6.6 is EOL now :(

@lmicverm

lmicverm commented Dec 3, 2020

I tried building it with sbt assembly to get a fat jar. I used the jar on DBR 7.3 with Spark 3.0.1, but when writing to a database, I get the following error:

Py4JJavaError: An error occurred while calling o491.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 16, 10.139.64.5, executor 0): com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:234)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(SQLServerConnection.java:1130)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.rollback(SQLServerConnection.java:3270)
	at com.microsoft.sqlserver.jdbc.spark.BulkCopyUtils$.savePartition(BulkCopyUtils.scala:53)
	at com.microsoft.sqlserver.jdbc.spark.SingleInstanceWriteStrategies$.$anonfun$write$2(BestEffortSingleInstanceStrategy.scala:30)
	at com.microsoft.sqlserver.jdbc.spark.SingleInstanceWriteStrategies$.$anonfun$write$2$adapted(BestEffortSingleInstanceStrategy.scala:29)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1001)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1001)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2371)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
	at org.apache.spark.scheduler.Task.run(Task.scala:117)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:662)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:665)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

It appears that it fails when writing the data to the database, since the table itself is successfully created in the database with all its columns.

@pmooij

pmooij commented Dec 9, 2020

Compiling the fat JAR worked for me, as I described here: #15 (comment)

This has been running smoothly on Databricks Runtime 7.4 | Spark 3.1 over the last few days.

"com.novocode" % "junit-interface" % "0.11" % "test",

//SQLServer JDBC jars
"com.microsoft.sqlserver" % "mssql-jdbc" % "7.2.1.jre8"
"com.microsoft.sqlserver" % "mssql-jdbc" % "8.2.1.jre8"
Collaborator

Why v8.2 for the JDBC driver? Did 7.2 cause an issue, or was this just alignment to the latest version?


Spark 3 is on JDK 11, so you need to use:

// https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc
libraryDependencies += "com.microsoft.sqlserver" % "mssql-jdbc" % "8.4.1.jre11"

@shivsood
Collaborator

@dovijoel Can you please rebase with master and push? I want to standardize Spark 3.0/Scala 2.12 support on this branch. Meanwhile, I have created a new PR based off this PR for Spark 3.0/Scala 2.12 support: #76

@dovijoel
Author

dovijoel commented Jan 12, 2021 via email

@shivsood shivsood mentioned this pull request Feb 2, 2021