This is an example of how SBT can be used to build a project that uses the jars from databricks-connect. Sometimes this is required because the Databricks runtime (DBR) has more functionality than open source Spark. In most cases, however, it's enough to build a project against OSS Spark dependencies and declare them as provided (shown as an example in the `build.sbt`, as commented-out code).
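A minimal sketch of what that commented-out, OSS-Spark-only setup in `build.sbt` could look like is shown below; the Spark and Scala versions here are assumptions for illustration, not taken from this repository:

```scala
// Hypothetical sketch: build against OSS Spark and mark the dependencies
// as "provided" so they are not packaged into the resulting jar.
scalaVersion := "2.12.12"  // assumed Scala version

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.0.1" % "provided",  // assumed Spark version
  "org.apache.spark" %% "spark-sql"  % "3.0.1" % "provided"
)
```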
To build the project, the following setup steps are necessary:
- Install Databricks Connect & Databricks CLI (preferably into a separate virtual Python/Conda environment - in this example, with the name `test`. For DBR 6.x you must use Python 3.7!):

  ```sh
  conda create --name test python=3.8
  conda activate test
  pip install -U databricks-connect==7.3 databricks-cli
  export DBCONNECT_JARS=$(databricks-connect get-jar-dir)
  export SPARK_HOME=$(dirname $DBCONNECT_JARS)
  ```
- Configure Databricks Connect as described in the documentation - this is required if you want to run the artifact from your machine
- (optional) Configure Databricks CLI as described in the documentation
To compile & package this project you need to do the following:
- (optional) Set the location of the Databricks Connect jars. It can be configured in several ways, tried in the following order (a sketch of how this lookup could be implemented in `build.sbt` is shown after this list):
  - SBT will try to use the path from the `DBCONNECT_JARS` environment variable set above.
  - Then it will try to get the path from the `DbConnectJars` system property - that's useful when you're running the project from the IDE. It can be set in the `.sbtopts` file, for example, as `-DDbConnectJars=...` (execute the following shell command in the root directory of the project to add this definition to the `.sbtopts` file: `echo "-DDbConnectJars=$(databricks-connect get-jar-dir)" | tee -a .sbtopts`).
  - Then it will try to execute `databricks-connect get-jar-dir` if `databricks-connect` is on the `PATH`.
  - Finally, it will take the path from the `jarsPathManual` variable defined in `build.sbt` - open it in an editor and set this variable to the path to the jars obtained via `databricks-connect get-jar-dir`.
- Execute `sbt clean package` to build the project
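A rough sketch of how the jar-location lookup described above could be wired up in `build.sbt` is shown below; the ordering and the `jarsPathManual` name follow the description, but the actual implementation in this repository may differ:

```scala
// Hypothetical sketch of the Databricks Connect jar-path resolution chain.
import scala.sys.process._
import scala.util.Try

// Manual fallback - edit this if none of the automatic lookups work.
val jarsPathManual = ""

val dbConnectJarsDir: String =
  sys.env.get("DBCONNECT_JARS")                                      // 1. environment variable
    .orElse(sys.props.get("DbConnectJars"))                          // 2. -DDbConnectJars=... (e.g. from .sbtopts)
    .orElse(Try("databricks-connect get-jar-dir".!!.trim).toOption)  // 3. databricks-connect on the PATH
    .getOrElse(jarsPathManual)                                       // 4. manual value set in build.sbt

// Pick up all jars from that directory as unmanaged dependencies.
unmanagedBase := new java.io.File(dbConnectJarsDir)
```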
To run the built project from the command line, build it with `sbt package` and execute the following command:

```sh
$SPARK_HOME/bin/spark-submit --class net.alexott.demos.databricks.SimpleSQL target/scala-2.12/dbconnect-sbt_2.12-0.1.jar
```
In some cases you need to set the environment variable `DEBUG_IGNORE_VERSION_MISMATCH` to the value `1`, because on `spark-submit` databricks-connect checks its version against the cluster's DBR version and fails if they don't match. This could be a problem because databricks-connect isn't published for every DBR version. You just need to make sure that you're using databricks-connect 6.x with DBR 6.x, and similarly for the 7.x versions.
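For context, the class passed to `--class` is an ordinary Spark application. A minimal, hypothetical sketch of what such a class could look like is shown below (it is an illustration, not necessarily the actual `SimpleSQL` code in this repository); with Databricks Connect configured, `getOrCreate()` connects to the remote cluster instead of starting a local Spark:

```scala
package net.alexott.demos.databricks

import org.apache.spark.sql.SparkSession

// Hypothetical minimal Spark application of the kind submitted above.
object SimpleSQL {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SimpleSQL")
      .getOrCreate()

    // Run a trivial SQL query and print the result.
    spark.sql("SELECT 1 AS test").show()

    spark.stop()
  }
}
```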
The Ultimate edition of IntelliJ IDEA has built-in support for submitting Spark jobs via spark-submit, so refer to its documentation on how to configure it.
For the Community Edition of IDEA, we'll need to force the explicit submission of the Spark job by using the `SparkSubmit` class. To configure this, go to "Run" > "Edit Configurations..." and change the settings as follows:
- Main class: `org.apache.spark.deploy.SparkSubmit`
- VM options: `-cp $Classpath$:$SPARK_HOME`
- Program arguments: `--class net.alexott.demos.databricks.SimpleSQL target/scala-2.12/dbconnect-sbt_2.12-0.1.jar` - the `.jar` file should be built before execution, so it makes sense to hook `sbt package` into the "Before launch" configuration
- Environment variables: `SPARK_HOME=...` (put the value of `SPARK_HOME` defined above), and possibly `DEBUG_IGNORE_VERSION_MISMATCH=1` to allow running on "incompatible" clusters.
After that you can execute the Spark job directly from IDEA.