feat: add databricks support #9248

ruiyang2015 · 2024-05-24T17:52:50Z

Which new backend would you like to see in Ibis?

I would like to see databricks backend support.

Code of Conduct

I agree to follow this project's Code of Conduct

cpcloud · 2024-05-29T17:13:31Z

@ruiyang2015 Can you clarify a bit what "databricks" means here? Is that databricks cloud, databricks connect, or something else?

ruiyang2015 · 2024-05-29T17:20:08Z

> @ruiyang2015 Can you clarify a bit what "databricks" means here? Is that databricks cloud, databricks connect, or something else?

We use databricks connect and databricks SQL endpoints.

Kilo59 · 2024-08-02T16:34:07Z

https://docs.databricks.com/en/sql/language-manual/index.html

https://docs.databricks.com/en/dev-tools/python-sql-connector.html

cpcloud · 2024-08-02T16:55:07Z

@Kilo59 Thanks for the links! The databricks DB-API looks pretty solid. They even support fetching query results as arrow tables.
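For anyone following along, here is a minimal sketch of what fetching a result as an Arrow table through the DB-API could look like. It assumes the databricks-sql-connector package is installed; the environment variable names and the `connection_kwargs` / `fetch_arrow` helpers are made up for illustration and are not part of Ibis or the connector.

```python
import os
from typing import Mapping


def connection_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Collect connection settings from environment-style variables.

    The variable names are an assumption for this sketch.
    """
    return {
        "server_hostname": env["DATABRICKS_SERVER_HOSTNAME"],
        "http_path": env["DATABRICKS_HTTP_PATH"],
        "access_token": env["DATABRICKS_TOKEN"],
    }


def fetch_arrow(query: str, **kwargs):
    """Run `query` and return the full result as a pyarrow.Table."""
    # Imported lazily so this module loads without the connector installed.
    from databricks import sql

    with sql.connect(**kwargs) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            # fetchall_arrow() returns the result set as an Arrow table
            # instead of a list of row tuples.
            return cursor.fetchall_arrow()
```

Usage would be something like `fetch_arrow("SELECT 1", **connection_kwargs())`, which needs a reachable SQL warehouse and valid credentials.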

techdebtcreator · 2024-08-24T00:19:16Z

+1 for this since my company mainly uses Databricks SQL Warehouse via the databricks-sql-connector package. Unless I missed something, I'm currently only able to connect to Databricks clusters (through the Ibis PySpark backend in conjunction with the databricks-connect package).

nrlugg · 2024-08-26T04:53:27Z

For reference, there is also the databricks-sdk package, which could be used to query Databricks SQL Warehouses through its statement_execution submodule.

This package is particularly interesting because, if you use format=Format.ARROW_STREAM and disposition=Disposition.EXTERNAL_LINKS, it allows streaming chunks of serialized arrow tables (i.e., arrow IPC format). This could potentially be used to process tables that are larger than memory, and the chunks could be read using async or multi-threading to stream the data faster.
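A rough sketch of the chunked streaming idea, assuming databricks-sdk and pyarrow are installed and that the statement finishes within the default wait timeout (otherwise `result` would not be populated yet and you would need to poll). The helper names `iter_external_links` and `stream_arrow_batches` are hypothetical, and I have not run this against a real warehouse.

```python
from typing import Callable, Iterable, Iterator, Optional


def iter_external_links(
    first_links: Optional[Iterable], get_chunk: Callable[[int], object]
) -> Iterator[str]:
    """Yield result URLs chunk by chunk by following next_chunk_index."""
    links = list(first_links or [])
    while links:
        next_index = None
        for link in links:
            yield link.external_link
            next_index = link.next_chunk_index
        if next_index is None:
            break  # the last chunk has no successor
        links = list(get_chunk(next_index).external_links or [])


def stream_arrow_batches(statement: str, warehouse_id: str):
    """Yield pyarrow.RecordBatch objects without loading the full result."""
    # Lazy imports: only needed when actually talking to Databricks.
    import urllib.request

    import pyarrow.ipc as ipc
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.sql import Disposition, Format

    w = WorkspaceClient()  # picks up credentials from the environment
    resp = w.statement_execution.execute_statement(
        statement=statement,
        warehouse_id=warehouse_id,
        format=Format.ARROW_STREAM,
        disposition=Disposition.EXTERNAL_LINKS,
    )

    def get_chunk(index: int):
        return w.statement_execution.get_statement_result_chunk_n(
            resp.statement_id, index
        )

    for url in iter_external_links(resp.result.external_links, get_chunk):
        with urllib.request.urlopen(url) as http_resp:
            # Each chunk is downloaded whole, then decoded as an Arrow IPC stream.
            yield from ipc.open_stream(http_resp.read())
```

Since each chunk is an independent download, the loop over URLs is the part that could be handed to a thread pool or an async client.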

Also, the results of the executed query are stored temporarily in cloud storage, which means the URLs of the result chunks could be cached and reused without having to execute the query again (if the query doesn't change).
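To make the caching idea concrete, here is a small standard-library sketch that keys the chunk URLs on a hash of the exact query text. The `ChunkUrlCache` class is hypothetical, and the time-to-live is an assumption on my part: since the results are only stored temporarily, cached URLs should be dropped after some period that you would need to pick based on how long the links actually stay valid.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional, Tuple


class ChunkUrlCache:
    """Remember result-chunk URLs per query text for a limited time."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    @staticmethod
    def _key(statement: str) -> str:
        # Hash the exact text: any change to the query is a cache miss.
        return hashlib.sha256(statement.encode("utf-8")).hexdigest()

    def put(self, statement: str, urls: List[str]) -> None:
        self._entries[self._key(statement)] = (self._clock(), list(urls))

    def get(self, statement: str) -> Optional[List[str]]:
        key = self._key(statement)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, urls = entry
        if self._clock() - stored_at > self._ttl:
            # Entry is too old; forget it so the query is executed again.
            del self._entries[key]
            return None
        return list(urls)
```

On a miss you would execute the query, collect the URLs, and `put` them; on a hit you would download the chunks directly.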

Also also, pyarrow is not pinned to any particular version (unlike in databricks-sql-connector, where it is), which makes dependency management less restrictive.

hershelm · 2024-09-12T16:26:59Z

+1 would be great to see native support for databricks

anyone in the thread may find this blog post useful: https://posit.co/blog/databases-with-posit/

see the section on "Databricks" and the "Python" tab

cpcloud · 2024-09-20T12:07:16Z

We will be tackling this for the 10.0 release, stay tuned!

ruiyang2015 added the "feature" and "new backend" labels May 24, 2024

gforsyth mentioned this issue Aug 26, 2024

chore(duckdb): update alternative method for in-memory data registration #9930

Merged

cpcloud added this to the 10.0 milestone Sep 20, 2024

cpcloud self-assigned this Sep 20, 2024

cpcloud linked a pull request Sep 25, 2024 that will close this issue

feat(databricks): add the databricks backend #10223

Open
