This project tests the Databricks Bundle tool (Databricks Asset Bundles) by generating an entire workflow in a development environment, using the Customers and Orders dataset on Databricks.
It includes SQL data modelling that follows the Kimball definitions.
-
Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html
-
Authenticate to your Databricks workspace:
$ databricks configure
-
Create the env file that contains variables such as your Git information and your Databricks username. To see which values are expected, check the jobs YAML in resources and the databricks.yaml in the root folder.
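How this project reads the env file is not spelled out here, so the following is only a sketch of one possible approach. It assumes a file of plain KEY=VALUE lines and relies on the Databricks CLI convention that an environment variable named BUNDLE_VAR_<name> sets the bundle variable <name>. The helper name load_env_file, the default file name .env, and the variable names in the comments are hypothetical.

```shell
# Sketch: export every KEY=VALUE pair from an env file into the current shell.
# Hypothetical example of the file contents (names are placeholders):
#   BUNDLE_VAR_git_branch=main
#   BUNDLE_VAR_databricks_user=someone@example.com
load_env_file() {
  env_file="${1:-.env}"

  # Prefix bare file names with ./ so the shell does not search PATH for them.
  case "$env_file" in
    */*) ;;
    *) env_file="./$env_file" ;;
  esac

  if [ ! -f "$env_file" ]; then
    echo "env file not found: $env_file" >&2
    return 1
  fi

  set -a          # export every variable assigned while the file is sourced
  . "$env_file"
  set +a          # stop auto-exporting
}
```

Calling load_env_file before a databricks bundle command makes the values visible to the CLI in the same shell session. Because the file is sourced as shell code, it should contain only trusted assignments.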
-
To deploy a development copy of this project, type:
$ databricks bundle deploy --target dev
(Note that "dev" is the default target, so the --target parameter is optional here.)
This deploys everything that's defined for this project. For example, the default template would deploy a job called [dev yourname] example_job to your workspace. You can find that job by opening your workspace and clicking on Workflows.
-
Similarly, to deploy a production copy, type:
$ databricks bundle deploy --target prod
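The two deploy commands can be wrapped so that the bundle configuration is checked before anything is deployed. This is a minimal sketch and not part of the project: the function name deploy_bundle is hypothetical, and it only chains the CLI's bundle validate and bundle deploy subcommands for the chosen target.

```shell
# Sketch: validate the bundle for a target, then deploy it only if validation passes.
# Usage: deploy_bundle [target]   (defaults to "dev")
deploy_bundle() {
  target="${1:-dev}"

  # Validate first so configuration errors surface before anything is deployed.
  databricks bundle validate --target "$target" || return 1

  databricks bundle deploy --target "$target"
}
```

For example, deploy_bundle prod validates and then deploys the production copy, while deploy_bundle with no argument uses the dev target.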
-
To run a job or pipeline, use the "run" command:
$ databricks bundle run
-
Optionally, install developer tools such as the Databricks extension for Visual Studio Code from https://docs.databricks.com/dev-tools/vscode-ext.html.
-
For documentation on the Databricks asset bundles format used for this project, and for CI/CD configuration, see https://docs.databricks.com/dev-tools/bundles/index.html.