The dbt-spark-livy adapter allows you to use dbt with Apache Spark through Livy, including Cloudera Data Platform (CDP) clusters with Livy server support. This code base builds on the dbt-spark project (https://github.com/dbt-labs/dbt-spark) and adds Livy connectivity on top of it.
- Install dbt
- Read the introduction and viewpoint
A docker-compose environment starts a Spark Thrift server and a Postgres database as a Hive Metastore backend.
Note: dbt-spark now supports Spark 3.1.1 (formerly on Spark 2.x).
Requirements:
- Python >= 3.8
- dbt-core ~= 1.3.0
- pyspark
- sqlparams
- requests_kerberos
- requests-toolbelt
- python-decouple
Install the adapter with pip:
pip install dbt-spark-livy
demo_project:
  target: dev
  outputs:
    dev:
      type: spark_livy
      method: livy
      schema: my_db
      host: https://spark-livy-gateway.my.org.com/dbt-spark/cdp-proxy-api/livy_for_spark3/
      user: my_user
      password: my_pass
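As a quick sanity check before running dbt, the profile fields above can be verified with a few lines of Python. This is only a sketch: the list of required keys is taken from the example profile, not from the adapter's own validation logic, and `missing_profile_keys` is a hypothetical helper name.

```python
# Keys that appear in the example profile above. This list is an assumption
# drawn from that example, not the adapter's actual validation rules.
REQUIRED_KEYS = ("type", "method", "schema", "host", "user", "password")


def missing_profile_keys(output):
    """Return the required keys that are absent or empty in one profile output."""
    return [key for key in REQUIRED_KEYS if not output.get(key)]


# The "dev" output from the example profile, written as a plain dict.
dev_output = {
    "type": "spark_livy",
    "method": "livy",
    "schema": "my_db",
    "host": "https://spark-livy-gateway.my.org.com/dbt-spark/cdp-proxy-api/livy_for_spark3/",
    "user": "my_user",
    "password": "my_pass",
}

if __name__ == "__main__":
    missing = missing_profile_keys(dev_output)
    print("missing keys:", missing)
```

An empty list means every field used in the example is present; it does not guarantee that the values are accepted by the Livy server.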
- When using Livy, if sessions in the Livy UI go from starting to dead instead of idle, make sure the user has a proper mapping in the IDBroker mapping section: Actions > Manage Access > IDBroker Mappings.
- Also make sure the workload password is set, either through the UI or the CLI.
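The session states mentioned in the notes above can also be observed without the Livy UI. The following is a minimal sketch that polls the Livy REST endpoint `GET /sessions/{id}` and reads its `state` field. It assumes the gateway accepts HTTP basic authentication with the workload user and password, which may not hold for every deployment; `fetch_session_state` and `wait_until_idle` are hypothetical helper names and are not part of dbt-spark-livy.

```python
import base64
import json
import time
import urllib.request

# Session states after which a Livy session will not become idle.
TERMINAL_STATES = {"dead", "error", "killed", "shutting_down", "success"}


def fetch_session_state(livy_url, session_id, user, password):
    """Return the 'state' field of GET <livy_url>/sessions/<session_id>.

    Basic authentication is an assumption about the gateway setup.
    """
    url = f"{livy_url.rstrip('/')}/sessions/{session_id}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["state"]


def wait_until_idle(get_state, attempts=30, delay=2.0):
    """Call get_state() repeatedly until it returns 'idle'.

    Raises RuntimeError if the session reaches a terminal state such as
    'dead', and TimeoutError if it never becomes idle within `attempts` polls.
    """
    for _ in range(attempts):
        state = get_state()
        if state == "idle":
            return state
        if state in TERMINAL_STATES:
            raise RuntimeError(
                f"Livy session entered state {state!r}; "
                "check the IDBroker mapping and the workload password"
            )
        time.sleep(delay)
    raise TimeoutError("Livy session did not become idle in time")
```

A typical call would pass a closure over the profile values, for example `wait_until_idle(lambda: fetch_session_state(host, 0, user, password))`, where the session id `0` is a placeholder for the id shown in the Livy UI.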
Please see the original adapter documentation: https://github.com/dbt-labs/dbt-spark and https://docs.getdbt.com/reference/warehouse-profiles/spark-profile