Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Reuse the existing DB connection while creating backend #8877

Closed
1 task done
walnutist opened this issue Apr 3, 2024 · 7 comments · Fixed by #9603
Closed
1 task done

feat: Reuse the existing DB connection while creating backend #8877

walnutist opened this issue Apr 3, 2024 · 7 comments · Fixed by #9603
Assignees
Labels
ecosystem External projects or activities feature Features or general enhancements
Milestone

Comments

@walnutist
Copy link

Is your feature request related to a problem?

Currently all db operations are done thru SQLAlchemy/SQLGlot, thus only DBAPI compliant interfaces are exposed. It is required to reuse an existing db connection, e.g., duckdb.connec("mydb.db"), so that all previously registered python function (via duckdb.create_function) to work, also, it would be easier to guarantee the in-memory duckdb to be shared among ibis and non-ibis codes

Describe the solution you'd like

Change the current connect method to accept an existing connection

What version of ibis are you running?

9.0.0.dev568

What backend(s) are you using, if any?

DuckDB, Polars, MySQL

Code of Conduct

  • I agree to follow this project's Code of Conduct
@walnutist walnutist added the feature Features or general enhancements label Apr 3, 2024
@chloeh13q
Copy link
Contributor

Hi @walnutist - thank you for checking out Ibis and filing an issue!

For our understanding - do you mind elaborating on what you're trying to do? Do you have a use case where Ibis does not provide certain functionalities, and therefore you have certain parts of code written without Ibis and certain parts with Ibis?

@cpcloud
Copy link
Member

cpcloud commented Apr 5, 2024

I think what @walnutist might be asking for is the ability to construct a backend from an existing Ibis connection.

@walnutist
Copy link
Author

easier

Exactly. Thanks for your elaboration!

@sh-rp
Copy link

sh-rp commented Jun 20, 2024

Hi @chloeh13q and ibis team, dlt (https://github.com/dlt-hub/dlt) core-dev here. We are exploring an ibis integration. dlt is a python library to easily load data to a bunch of destinations, so whenever a user configures a dlt pipeline, there will be the credentials and code for a working db connection in place. What would be amazing is to just share the connection with an ibis backend and then have the nice interface ibis provides to the dataset on that connection. I have hacked together a prototype here: dlt-hub/dlt#1491 where I basically instantiate a backend and set the .con attribute, but that only works for a few destinations (duckdb, postgres and to a certain extent snowflake) and I think the reason is, that the backends do additional setup stuff in the do_connect (I think that is what it's called) function on the backend. So it would be amazing to be able to give a connection to the backend constructor or this connect method and this method would then do all the stuff it always does except for establishing a new connection.

Our use-case is, that we have our own credentials format and in some cases there are multiple ways of authenticating with a destination and we'd have to write a lot of glue code and tests just to make sure the credentials are passed in a way ibis understands in every case.

@gforsyth
Copy link
Member

Hey @sh-rp -- thanks for getting in touch!

I think we can support this, where we have some class method on the backends like from_dbapi_con and we expect to be handed an open DBABI 2.0 connection that we can then massage accordingly so that Ibis behaves as expected.

Most of the things that we do on class instantiation are easy-enough to apply to an already-open connection.

One thing that we currently do is grab all of the args and kwargs that will be passed to the dbapi con (when Ibis is creating it) and stash those, to enable the reconnect functionality. That bit of functionality would be harder to manage when we're being handed an existing connection. If we don't need reconnect, I think this is something we can put on the roadmap for 10.0

@gforsyth gforsyth added the ecosystem External projects or activities label Jun 24, 2024
@sh-rp
Copy link

sh-rp commented Jun 26, 2024

@gforsyth in my view we do not need reconnect, the connection would be handled by dlt. There is a chance that this will not work well for users that need a connection that is open for a long time, but I think for the first iteration it should be fine. In an ideal scenario, your backends would be built in a way, that additional third party dependencies are only required when no open dbapi connection is handed over. I think this should be possible, since in most cases you'd only need that when establishing the connection yourselves. I may be wrong there through.

I'll just track this issue then and react to it in case you decide to make this change :)

FYI the destinations we currently support are

  • BigQuery
  • Databricks
  • DuckDB / Motherduck
  • MS SQL Server
  • Azure Synapse
  • ClickHouse
  • Filesystem Destinations (includes local fs, s3, azure and gcs) with parquet and jsonl
  • Postgres
  • Redshift
  • Snowflake
  • AWS Athena optionally with iceberg tables
  • Weaviate
  • Qdrant
  • LanceDB (in development)
  • Dremio

We would like to expose a connected ibis backend for each of the destinations listed and supported by Ibis.

@deepyaman
Copy link
Contributor

I have hacked together a prototype here: dlt-hub/dlt#1491 where I basically instantiate a backend and set the .con attribute, but that only works for a few destinations (duckdb, postgres and to a certain extent snowflake) and I think the reason is, that the backends do additional setup stuff in the do_connect (I think that is what it's called) function on the backend. So it would be amazing to be able to give a connection to the backend constructor or this connect method and this method would then do all the stuff it always does except for establishing a new connection.

@sh-rp I can try and take this. @gforsyth If it sounds right to you, I will take the approach you mention—add a class method and just do the post-connect steps from do_connect (probably refactor it out into a _post_connect method on the backend).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ecosystem External projects or activities feature Features or general enhancements
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants