Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sqlite): defer db connection until needed #3127

Merged
merged 19 commits into from
Dec 18, 2021

Conversation

saulpw
Copy link
Contributor

@saulpw saulpw commented Nov 20, 2021

  • to .execute() the same table (or a copy) from multiple threads

Closes #64
Closes #1768

- to .execute() the same table (or a copy) from multiple threads
@pep8speaks
Copy link

pep8speaks commented Nov 20, 2021

Hello @saulpw! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-12-17 17:20:30 UTC

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm at the very minimum this needs tests

can u show an example why this is needed?

@github-actions
Copy link
Contributor

github-actions bot commented Nov 20, 2021

Unit Test Results

       38 files         38 suites   1h 15m 38s ⏱️
  7 241 tests   5 453 ✔️ 1 788 💤 0
29 251 runs  22 173 ✔️ 7 078 💤 0

Results for commit 5467afa.

♻️ This comment has been updated with latest results.

@saulpw
Copy link
Contributor Author

saulpw commented Nov 20, 2021

umm at the very minimum this needs tests

can u show an example why this is needed?

Yes, sorry, this isn't a complete PR, I wanted to submit it early so we could discuss how to clean it up and how to test it.

This is needed because the sqlite connection is created in the thread that gets a reference to the table, and if another thread tries to use that table, sqlite/sqlalchemy gets confused during execute and raises an Exception with a message like no such table: base.dept. ("base" is the default database_name). Doing both actions in the same thread works fine, however.

So this defers the connection creation until it's needed, and also allows the objects to be copied so they can be used in other threads later (new connections will be created for them). This works adequately for my use case.

I'll talk with @cpcloud on Monday about what a proper test for this looks like.

@saulpw saulpw marked this pull request as draft November 20, 2021 03:14
@saulpw saulpw force-pushed the pw21_lazyconnect branch 2 times, most recently from 3a614a3 to 3193412 Compare November 24, 2021 01:12
ibis/backends/base/__init__.py Outdated Show resolved Hide resolved
@@ -56,7 +56,7 @@ class BaseFileBackend(BaseBackend):

database_class = FileDatabase

def connect(self, path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you cannot rename any of these methods as they r public

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I'm proposing an API change. (See the toplevel comment I just posted.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback The top level API here isn't changing. Every backend still has the same connect API by way of threading args and options through do_connect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure it is. this is leaking the db_connect into the base class where its not needed. you can move the base changes to the sql layer i think and achieve what you want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, do_connect would be required for new backends or those that want to upgrade. I don't think I see the problem, we'd release a new major version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Unfortunately the thing I need is to detach the object construction from the db connection, so I can copy Expr from any arbitrary db table and have them 'just work'. I think it's generally good practice to separate these two anyway--in my experience the constructor/factory should set initial state and not "do" anything, so it can't fail/raise which makes things more complicated for everyone.

@saulpw
Copy link
Contributor Author

saulpw commented Nov 24, 2021

Ibis needs a connection to the data source in order to create expressions which contain reference the connection to their data source. I need to be able to copy (or better, pickle) these expressions and call .execute() on those copies from different threads.

So this PR proposes a small API change. Currently, connect() does two things: it creates an object, and it connects the object to the source. With this PR, connect() still creates the object, and the actual source-connecting is moved to do_connect(). This way, Expr derived from these Backends can be copy()ed and each copy can make its own connection on demand.

The connect() constructor turns out to be boilerplate and can be given in BaseBackend so backends don't need to override it anymore. But every backend will need to override do_connect(), which just needs to do the connecting (not the actual object creation).

Here's an example of a connect() as it would be currently:

    def connect(self, ...):
        new_backend = self.__class__()
        new_backend._context = foodb.connect(...)
        return new_backend

and here's the replacement do_connect():

    def do_connect(self, ...):
        self._context = foodb.connect(...)

This is a mechanical transformation and even results in a bit less code. The signature for do_connect should be identical; the args and kwargs are saved and passed through verbatim.

All backends will eventually need a __getstate__ that does not include unreusable connection context. Most Backends can rely on the parent BaseBackend.__getstate__ and possibly include some conveniences that connections can share, like:

    def __getstate__(self):
        r = super().__getstate__()
        r.update(dict(_session_token=self._session_token, _context=None))
        return r

I think maybe there should be an API method for invalidating the connection. Or we could make do_connect a forced invalidation; but maybe it should be allowed to be a no-op in case it wants to always reuse the existing connection?

If we can come up with better names than connect and do_connect, we can use those instead.

@saulpw saulpw marked this pull request as ready for review December 6, 2021 22:06
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, small comment about constructor arguments.

Return new client object with saved args/kwargs, having called
.reconnect() on it.
"""
new_backend = self.__class__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of mutating private attributes outside the class, how about taking con_args and con_kwargs as constructor arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do you mean, having a default BaseBackend() constructor that takes the same *args, **kwargs parameters as its connect()/do_connect() methods would take?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a change, let me know if this is what you meant.

ibis/tests/expr/test_table.py Show resolved Hide resolved
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Over to you @jreback

@jreback
Copy link
Contributor

jreback commented Dec 11, 2021

will look soon

@cpcloud cpcloud added breaking change Changes that introduce an API break at any level feature Features or general enhancements labels Dec 15, 2021
@cpcloud cpcloud requested a review from jreback December 16, 2021 17:34
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand why we have to break the world by renaming connect -> do_connect?

ibis/backends/base/__init__.py Outdated Show resolved Hide resolved
ibis/backends/base/__init__.py Outdated Show resolved Hide resolved
ibis/backends/base/__init__.py Outdated Show resolved Hide resolved
@abc.abstractmethod
def connect(connection_string, **options):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we preserve this for backwards compatibilty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look a few lines above this, it's still there, so you can continue to use connect() like you always have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh fi that's the case, then just document connect & db_connect a bit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm not clear on what's missing, or where this documentation would go. Backends need to override the abstractmethod do_connect, but that's virtually identical to what they had to do previously with connect (as shown in this comment in the PR). What else do you need?

ibis/backends/base/sql/alchemy/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can you add a whatsnew note and describe the change

@cpcloud
Copy link
Member

cpcloud commented Dec 17, 2021

also can you add a whatsnew note and describe the change

@jreback On a separate note, can you take a look at #2924? This moves us much closer to automate release notes based on commit messages so we don't have to keep asking people to write release notes.

@cpcloud
Copy link
Member

cpcloud commented Dec 17, 2021

This looks good to me. @jreback ?

@cpcloud cpcloud merged commit 5467afa into ibis-project:master Dec 18, 2021
@jreback
Copy link
Contributor

jreback commented Jan 4, 2022

@cpcloud this appears to have broken ibis-bigquery, but they don't test against master. They are a pretty good proxy for other backends.

I am not sure what to do here as there are a lot of breaking changes in the pipelines.

@cpcloud
Copy link
Member

cpcloud commented Jan 4, 2022

ibis-bigquery has an unrealistic requirements.txt with unbounded versions for every dependency. This is bound to break at some point and certainly there's no guarantee that master is compatible with dependents.

@cpcloud
Copy link
Member

cpcloud commented Jan 4, 2022

Ah looks like setup.py has constraints.

@cpcloud
Copy link
Member

cpcloud commented Jan 4, 2022

I believe that if we remove the abstractmethod decorator we can avoid the breakage, and reintroduce it in 3.0.

@jreback
Copy link
Contributor

jreback commented Jan 4, 2022

we shouldnt be constantly breaking downstream like this.

@cpcloud
Copy link
Member

cpcloud commented Jan 4, 2022

Yep. I don't think we are at "constantly" yet. The goal is to avoid breakage between major releases. A major will always include at least one breaking change.

@ibis-project-bot
Copy link
Contributor

🎉 This PR is included in version 2.1.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Changes that introduce an API break at any level feature Features or general enhancements
Projects
None yet
5 participants