
Delta Lake connector #11296

Open
21 of 29 tasks
findepi opened this issue Mar 3, 2022 · 8 comments
Labels
enhancement New feature or request roadmap Top level issues for major efforts in the project

Comments

@findepi
Member

findepi commented Mar 3, 2022

findepi added the "enhancement" (New feature or request) and "roadmap" (Top level issues for major efforts in the project) labels on Mar 3, 2022
@findepi
Member Author

findepi commented Mar 3, 2022

cc @jirassimok @alexjo2144

@homar
Member

homar commented Apr 20, 2022

Based on the TODOs in the code, I created the following issues related to the Delta Lake connector:

@findepi
Member Author

findepi commented Apr 20, 2022

@homar thanks!
I moved the above list into the issue description. Feel free to remove the checkboxes from your comment (or the list).

@trymzet

trymzet commented May 16, 2022

Am I right that vanilla Databricks is not yet supported? This connector requires a Thrift URI for the Hive metastore connection (IllegalArgumentException: metastoreUri scheme must be thrift), and AFAIK Databricks only exposes a JDBC connection string (e.g. jdbc:spark://adb-123456789.5.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/123456789/0427-122644-45iadnd;AuthMech=3;UID=token;PWD=<personal-access-token>). You can only use Thrift if you set up a custom metastore for Databricks.
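For context, the error above comes from the metastore URI check in the catalog configuration. A minimal sketch of the Thrift-based setup the connector expects (file path, hostname, and port are placeholders, assuming a self-managed Hive Metastore reachable from Trino):

```properties
# etc/catalog/delta.properties (illustrative)
connector.name=delta_lake
# Must be a thrift:// URI; a JDBC connection string is rejected with
# "IllegalArgumentException: metastoreUri scheme must be thrift"
hive.metastore.uri=thrift://example-metastore.internal:9083
```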

@findepi
Member Author

findepi commented May 16, 2022

> You can only use Thrift if you set up a custom metastore for Databricks.

Yes. Or, use Glue.

> AFAIK Databricks only exposes the JDBC connection string (eg jdbc:spark://adb-123456789.5.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/123456789/0427-122644-45iadnd;AuthMech=3;UID=token;PWD=<personal-access-token>).

We have no plans to connect to the Databricks runtime using Databricks JDBC. That would kill most of the benefits of this connector.
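The Glue alternative mentioned here can be sketched as a catalog configuration. This is illustrative only: the region value is a placeholder, and credentials are assumed to come from the instance's default AWS credential chain:

```properties
# etc/catalog/delta.properties (illustrative)
connector.name=delta_lake
hive.metastore=glue
hive.metastore.glue.region=us-east-1
```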

@alexjo2144
Member

alexjo2144 commented May 16, 2022

Here are the Databricks docs for setting up an external HMS or Glue: https://docs.databricks.com/data/metastores/index.html. Both of those options are supported.

@trymzet

trymzet commented May 17, 2022

@alexjo2144 Thanks. I was researching whether we can use dbt with various sources all through Trino (inspired by this video), and it seems that Databricks is doable as well, although integrating directly through the dbt-databricks plugin is more straightforward. For future generations: using Databricks through the dbt-trino plugin requires setting up and maintaining your own Hive metastore instance and creating a global init script that points each cluster's configuration at that metastore. Also, DBFS is not supported with this method.

@PragyaJaiswal

PragyaJaiswal commented Apr 29, 2024

I would think these should change things: https://www.databricks.com/blog/extending-databricks-unity-catalog-open-apache-hive-metastore-api, and Trino versions 440 and above seem to support integrating with the Databricks HMS API: https://trino.io/docs/current/object-storage/metastores.html#thrift-metastore-configuration-properties. Does that look promising?
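If the Unity Catalog HMS interface works as the blog post describes, the Trino side would presumably be an ordinary Thrift metastore configuration. A hedged sketch only: the endpoint placeholder and port are assumptions and have not been verified against a Databricks workspace:

```properties
# etc/catalog/delta.properties (illustrative; endpoint is a placeholder)
connector.name=delta_lake
hive.metastore.uri=thrift://<workspace-hms-endpoint>:9083
```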
