fix(sqlite): defer db connection until needed #3127
- to .execute() the same table (or a copy) from multiple threads
for more information, see https://pre-commit.ci
Umm, at the very minimum this needs tests.
Can you show an example of why this is needed?
Yes, sorry, this isn't a complete PR; I wanted to submit it early so we could discuss how to clean it up and how to test it.
This is needed because the sqlite connection is created in the thread that gets a reference to the table, and if another thread tries to use that table, sqlite/sqlalchemy gets confused during execute and raises an Exception with a message like [...]
So this defers the connection creation until it's needed, and also allows the objects to be copied so they can be used in other threads later (new connections will be created for them). This works adequately for my use case. I'll talk with @cpcloud on Monday about what a proper test for this looks like.
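For readers who want to see the threading problem in isolation, here is a minimal sketch using only the standard-library sqlite3 module, not Ibis or SQLAlchemy. The class name LazySQLiteBackend and the helper run_in_thread are hypothetical names made up for illustration. It assumes the default check_same_thread=True behaviour of sqlite3.connect.

```python
import sqlite3
import threading


class LazySQLiteBackend:
    """Hypothetical stand-in for a backend object; not the Ibis API."""

    def __init__(self, path):
        self._path = path
        self._con = None  # nothing is opened at construction time

    @property
    def con(self):
        # Open the connection on first use, in whichever thread asks for it.
        if self._con is None:
            self._con = sqlite3.connect(self._path)
        return self._con


def run_in_thread(fn):
    """Run fn in a new thread and capture its return value or exception."""
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except Exception as exc:
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return outcome


# Eager: the connection is bound to the thread that created it,
# so using it from the worker thread fails.
eager_con = sqlite3.connect(":memory:")
eager = run_in_thread(lambda: eager_con.execute("SELECT 1").fetchall())

# Lazy: the backend is built here, but it connects inside the worker thread.
lazy_backend = LazySQLiteBackend(":memory:")
lazy = run_in_thread(lambda: lazy_backend.con.execute("SELECT 1").fetchall())
```

In this sketch the eager case ends with a sqlite3.ProgrammingError stored in eager["error"], while the lazy case returns rows, because the connection is only created in the thread that first uses it.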
ae490eb to b36d41b
3a614a3 to 3193412
3193412 to 7d68a5d
@@ -56,7 +56,7 @@ class BaseFileBackend(BaseBackend):

    database_class = FileDatabase

    def connect(self, path):
You cannot rename any of these methods as they are public.
You're right, I'm proposing an API change. (See the toplevel comment I just posted.)
@jreback The top-level API here isn't changing. Every backend still has the same connect API by way of threading args and options through do_connect.
Sure it is. This is leaking the db_connect into the base class where it's not needed. You can move the base changes to the sql layer, I think, and achieve what you want.
Right, do_connect would be required for new backends or those that want to upgrade. I don't think I see the problem, we'd release a new major version.
@jreback Unfortunately the thing I need is to detach the object construction from the db connection, so I can copy Expr from any arbitrary db table and have them 'just work'. I think it's generally good practice to separate these two anyway--in my experience the constructor/factory should set initial state and not "do" anything, so it can't fail/raise which makes things more complicated for everyone.
Ibis needs a connection to the data source in order to create expressions, which contain a reference to the connection to their data source. I need to be able to copy (or better, pickle) these expressions and call [...]
So this PR proposes a small API change. Currently, [...] The [...] Here's an example of a [...]
and here's the replacement [...]
This is a mechanical transformation and even results in a bit less code. The signature for [...] All backends will eventually need a [...]
I think maybe there should be an API method for invalidating the connection. Or we could make [...] If we can come up with better names than [...]
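As a rough illustration of the split proposed in this comment, here is a sketch in which the public connect() stays the entry point while subclasses implement do_connect(). It is not the code from this PR. The names SketchBackend and SketchSQLiteBackend, the attributes _con_args, _con_kwargs and con, and the __getstate__ hook that drops the live connection are all assumptions made for the example. copy.copy is used here because it consults the same __getstate__ hook that pickle does.

```python
import abc
import copy
import sqlite3


class SketchBackend(abc.ABC):
    """Hypothetical base class; only connect/do_connect/reconnect are
    names taken from the discussion."""

    def __init__(self):
        self._con_args = ()
        self._con_kwargs = {}
        self.con = None  # construction never touches the database

    def connect(self, *args, **kwargs):
        # Public entry point: callers keep using connect() as before.
        new_backend = self.__class__()
        new_backend._con_args = args
        new_backend._con_kwargs = kwargs
        new_backend.reconnect()
        return new_backend

    def reconnect(self):
        # Re-run the backend-specific connection logic with the saved args.
        self.do_connect(*self._con_args, **self._con_kwargs)

    @abc.abstractmethod
    def do_connect(self, *args, **kwargs):
        """Backend-specific connection logic, overridden by subclasses."""

    def __getstate__(self):
        # Copies keep the connection arguments but not the live connection.
        state = self.__dict__.copy()
        state["con"] = None
        return state


class SketchSQLiteBackend(SketchBackend):
    def do_connect(self, path):
        self.con = sqlite3.connect(path)


backend = SketchSQLiteBackend().connect(":memory:")
clone = copy.copy(backend)  # the clone starts without a connection
clone.reconnect()           # a fresh connection is created for the clone
rows = clone.con.execute("SELECT 1").fetchall()
```

Under these assumptions, a copied backend can be handed to another thread and reconnected there, which is the use case described at the top of the PR.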
- but expressions from different connections to the same DB should be compatible
- add Backend.datasource to identify database apart from connection object
Almost there, small comment about constructor arguments.
ibis/backends/base/__init__.py
Outdated
        Return new client object with saved args/kwargs, having called
        .reconnect() on it.
        """
        new_backend = self.__class__()
Instead of mutating private attributes outside the class, how about taking con_args and con_kwargs as constructor arguments?
So do you mean having a default BaseBackend() constructor that takes the same *args, **kwargs parameters as its connect()/do_connect() methods would take?
I pushed a change, let me know if this is what you meant.
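One way to read the suggestion in this thread is sketched here: the constructor accepts the connection arguments, so connect() no longer has to set private attributes on a freshly built object. This is only a guess at the shape of the change, not the pushed commit. The class name ArgsBackend and the attribute names are hypothetical.

```python
import sqlite3


class ArgsBackend:
    """Hypothetical backend whose constructor stores connection arguments."""

    def __init__(self, *args, **kwargs):
        # Saved for later; no connection is opened here.
        self._con_args = args
        self._con_kwargs = kwargs
        self.con = None

    def connect(self, *args, **kwargs):
        # Build the new object through its constructor instead of
        # assigning to its private attributes from the outside.
        new_backend = self.__class__(*args, **kwargs)
        new_backend.reconnect()
        return new_backend

    def reconnect(self):
        self.do_connect(*self._con_args, **self._con_kwargs)

    def do_connect(self, path):
        self.con = sqlite3.connect(path)


connected = ArgsBackend().connect(":memory:")
value = connected.con.execute("SELECT 2").fetchone()
```

The design choice is that all state needed to reconnect lives in arguments the class itself receives, so copies and reconnects do not depend on callers reaching into private fields.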
LGTM. Over to you @jreback
will look soon
I don't understand why we have to break the world by renaming connect -> do_connect?
    @abc.abstractmethod
    def connect(connection_string, **options):
Can we preserve this for backwards compatibility?
If you look a few lines above this, it's still there, so you can continue to use connect() like you always have.
Ahh, if that's the case, then just document connect & db_connect a bit.
Okay, I'm not clear on what's missing, or where this documentation would go. Backends need to override the abstractmethod do_connect, but that's virtually identical to what they had to do previously with connect (as shown in this comment in the PR). What else do you need?
also can you add a whatsnew note and describe the change
This looks good to me. @jreback ?
@cpcloud this appears to have broken ibis-bigquery, but they don't test against master. They are a pretty good proxy for other backends. I am not sure what to do here as there are a lot of breaking changes in the pipelines.
Ah looks like [...]
I believe that if we remove the [...]
We shouldn't be constantly breaking downstream like this.
Yep. I don't think we are at "constantly" yet. The goal is to avoid breakage between major releases. A major will always include at least one breaking change.
🎉 This PR is included in version 2.1.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
Closes #64
Closes #1768