-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(caching): tie lifetime of cached tables to python refs #9477
feat(caching): tie lifetime of cached tables to python refs #9477
Conversation
ACTION NEEDED Ibis follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging around in the guts here :)
A couple of questions and requests, but I generally like where this PR is going!
I will clean up the |
Once that gets merged, i'll rebase ontop |
6d923c9
to
72c91ee
Compare
This reverts commit 980f37b.
Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
72c91ee
to
1f79e36
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Ok, I rebased ontop of Only things left on my list of todos are:
|
Looks like there's maybe a bit of |
A release note will be generated from the commit message, so no need to do anything there. |
Ah yes, i forgot about all the tests i had to change :). |
Couldn't find any user-guide explanations of the cache'ing mechanism. So i guess the doc-string is the best we have atm. And that is up-to-date. So, from my side this is good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@jcrist Did you wanna take a look here again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM! This is something we'd talked about doing internally before, nice to see it handled in such a nice way be a new contributor!
ibis/common/caching.py
Outdated
|
||
def release(self, name: str) -> None: | ||
# Could be sped up with an inverse dictionary, | ||
# but explicit release is discouraged, anyway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think explicit release is discouraged. I view this as a way to catch users not being explicit, but if users want to be explicit that's also fine. I'd drop this part of the comment.
ibis/common/caching.py
Outdated
raise IbisError( | ||
"Key has already been released. Did you call " | ||
"`.release()` twice on the same expression?" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be a no-op, not an error. Python doesn't error for calling close()
explicitly multiple times on a file, we shouldn't either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
ibis/backends/duckdb/__init__.py
Outdated
def _clean_up_cached_table(self, op): | ||
self.drop_table(op.name) | ||
def _clean_up_cached_table(self, name): | ||
self.drop_table(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this can be deleted to just rely on the base sql
backend implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, and works
ibis/backends/sql/__init__.py
Outdated
def _clean_up_cached_table(self, op): | ||
self.drop_table(op.name) | ||
def _clean_up_cached_table(self, name): | ||
self.drop_table(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to pass force=True
to most of these drop_table
calls (here and elsewhere) to not error if the table doesn't exist. What we care about is that the table is cleaned up, if some other mechanism already deleted it then I don't think that should be a user-facing error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ibis/common/caching.py
Outdated
|
||
def _release(self, key) -> None: | ||
entry = self.cache.pop(key) | ||
self.finalize(entry.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call may error for a few reasons:
- Table doesn't exist (I think we should ignore these always, as noted elsewhere).
- Spurious error
- Connection is in weird state since python is finalizing
Since most backends will cleanup temp tables automatically (and errors on exit can give a bad impression of a tool), I wonder if want to do something like:
try:
self.finalize(entry.name)
except Exception:
# silence all errors if system is shutting down
if not sys.is_finalizing():
raise
This way we can still catch bugs in our code since most release calls won't silence everything, but in the case the system is shutting down and e.g. the network is in a weird state we don't see a mountain of failed-to-release-table errors on exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I added that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice words
Unsure what tripped the mssql backend. Unfortunately, have not been able to run those tests locally. I think I'll need help to address the test failure. |
One transient error and one XPASS, which is a win! |
I'll fix this up and merge! |
Clouds are passing:
|
@coroa Thanks for pushing this through, great work! |
Thanks for the quick reviews and comments! |
Description of changes
Makes use of
weakref.finalizer
s to release the cached table automatically when the cached table (or more precisely its.op()
) is garbage collected.If I understand the nodal expression tree structure correctly, then
cached.op()
is preserved in all dependent expressions. This PR adds aweakref.finalize(cached.op(), ...)
(docs) ontocached.op()
which releases the physical cached copy from the databases.The idempotency of
.cache()
is also preserved by returning the same table reference.