feat(caching): tie lifetime of cached tables to python refs #9477

Merged
15 commits merged on Jul 2, 2024

Contributor

@coroa coroa commented Jun 29, 2024

Description of changes

Uses weakref.finalize to release the cached table automatically when the cached table expression (or, more precisely, its .op()) is garbage collected.

In [1]: import pandas as pd

In [2]: import ibis

In [3]: con = ibis.duckdb.connect()

In [4]: df = pd.DataFrame([[0, 1], [2, 3]], columns=["foo", "bar"])

In [5]: noncached = con.create_table("tab", df).mutate(foo=2 * ibis._.foo)

In [6]: cached = noncached.cache()

In [7]: con.tables
Out[7]: 
Tables
------
- ibis_cache_wvrjoptclzhudglqapz4zusnlu
- tab

In [8]: del cached

In [9]: con.tables
Out[9]: 
Tables
------
- tab

If I understand the expression tree structure correctly, cached.op() is kept alive by every expression that depends on it. This PR registers a weakref.finalize(cached.op(), ...) callback on cached.op() that drops the physical cached copy from the database once the op is garbage collected.
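
A minimal sketch of the mechanism described above, using only the standard library. The `Node` class and the `track_cached_table` helper are hypothetical stand-ins, not ibis code. The one firm constraint comes from the `weakref.finalize` documentation: the callback and its arguments must not hold a reference to the tracked object, so the table name is passed instead of the node.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for the object returned by `cached.op()`."""

    def __init__(self, name):
        self.name = name


def track_cached_table(node, drop_table):
    # Pass the table *name*, not the node: a reference to the node inside
    # the finalizer's arguments would keep the node alive forever.
    return weakref.finalize(node, drop_table, node.name)


def demo():
    dropped = []  # records which table names were "dropped"
    node = Node("ibis_cache_demo")
    finalizer = track_cached_table(node, dropped.append)
    alive_before = finalizer.alive
    del node      # last strong reference goes away
    gc.collect()  # not needed on CPython, harmless elsewhere
    return alive_before, finalizer.alive, dropped
```

On CPython, `demo()` returns `(True, False, ["ibis_cache_demo"])`: the finalizer is alive while the node exists and has fired exactly once after the node is gone.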

The idempotency of .cache() is also preserved by returning the same table reference.
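
One way idempotency and automatic release could fit together, as a rough sketch. `ToyBackend` and `CachedTable` are hypothetical and do not reproduce the `RefCountedCache` from this PR; a string key stands in for the expression being cached.

```python
import gc
import weakref


class CachedTable:
    """Hypothetical handle for a physical cached table."""

    def __init__(self, name):
        self.name = name


class ToyBackend:
    """Hypothetical backend: tracks table names in a plain set."""

    def __init__(self):
        self.tables = set()
        # Weak values: an entry disappears once nobody holds the handle.
        self._cached = weakref.WeakValueDictionary()
        self._counter = 0

    def cache(self, key):
        existing = self._cached.get(key)
        if existing is not None:
            return existing  # idempotent: same handle for the same key
        self._counter += 1
        name = f"toy_cache_{self._counter}"
        self.tables.add(name)
        table = CachedTable(name)
        # Drop the physical table when the handle is garbage collected.
        weakref.finalize(table, self.tables.discard, name)
        self._cached[key] = table
        return table


def demo():
    backend = ToyBackend()
    first = backend.cache("expr")
    second = backend.cache("expr")
    same = first is second
    during = set(backend.tables)
    del first, second
    gc.collect()
    return same, during, set(backend.tables)
```

Calling `.cache()` twice yields the same handle and only one physical table; once both references are dropped, the table set is empty again.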

Contributor

ACTION NEEDED

Ibis follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message.

Please update your PR title and description to match the specification.

@coroa coroa changed the title from "feat: Tie lifetime of cached tables to python refs" to "feat: tie lifetime of cached tables to python refs" Jun 29, 2024
@coroa
Contributor Author

coroa commented Jun 29, 2024

The bidi dependency (for a bidirectional dictionary) is not necessary anymore with the RefCountedCache implementation in this PR, but quite obviously my attempt at removing it was incomplete.

Member

@cpcloud cpcloud left a comment

Thanks for digging around in the guts here :)

A couple of questions and requests, but I generally like where this PR is going!

@cpcloud cpcloud changed the title from "feat: tie lifetime of cached tables to python refs" to "feat(caching): tie lifetime of cached tables to python refs" Jun 30, 2024
@cpcloud cpcloud added the feature (Features or general enhancements) and ux (User experience related issues) labels Jun 30, 2024
@cpcloud
Member

cpcloud commented Jul 1, 2024

I will clean up the ruff lints in #9489!

@cpcloud cpcloud added this to the 9.2 milestone Jul 1, 2024
@coroa
Contributor Author

coroa commented Jul 1, 2024

> I will clean up the ruff lints in #9489!

Once that gets merged, I'll rebase on top.

@cpcloud
Member

cpcloud commented Jul 1, 2024

@coroa #9489 is merged!

@coroa coroa force-pushed the self-releasing-cached-tables branch from 6d923c9 to 72c91ee Compare July 1, 2024 14:30
@coroa coroa force-pushed the self-releasing-cached-tables branch from 72c91ee to 1f79e36 Compare July 1, 2024 14:43
Member

@cpcloud cpcloud left a comment

LGTM!

@coroa
Contributor Author

coroa commented Jul 1, 2024

OK, I rebased on top of main, removed the DeprecationWarning, and updated the .cache() docstring with another sentence on explicit and implicit cache eviction.

Only things left on my list of todos are:

  • Release notes
  • Check and update user guide

@cpcloud
Member

cpcloud commented Jul 1, 2024

Looks like there's maybe a bit of DeprecationWarning still hanging around?

@cpcloud
Member

cpcloud commented Jul 1, 2024

A release note will be generated from the commit message, so no need to do anything there.

@coroa
Contributor Author

coroa commented Jul 1, 2024

> Looks like there's maybe a bit of DeprecationWarning still hanging around?

Ah yes, I forgot about all the tests I had to change :).

@coroa
Contributor Author

coroa commented Jul 1, 2024

I couldn't find any user-guide explanation of the caching mechanism, so I guess the docstring is the best we have at the moment, and that is up to date.

So, from my side this is good to go.

Member

@cpcloud cpcloud left a comment

LGTM!

@jcrist Did you wanna take a look here again?

Member

@jcrist jcrist left a comment

Overall LGTM! This is something we'd talked about doing internally before; nice to see it handled so well by a new contributor!


def release(self, name: str) -> None:
    # Could be sped up with an inverse dictionary,
    # but explicit release is discouraged, anyway
Member

I don't think explicit release is discouraged. I view this as a way to catch users not being explicit, but if users want to be explicit that's also fine. I'd drop this part of the comment.

raise IbisError(
    "Key has already been released. Did you call "
    "`.release()` twice on the same expression?"
)
Member

I think this should be a no-op, not an error. Python doesn't error for calling close() explicitly multiple times on a file, we shouldn't either.
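
As a rough illustration of the no-op semantics suggested here: `TinyCache` is a hypothetical name-tracking cache, not the ibis `RefCountedCache`, and the file analogy uses `io.StringIO` from the standard library.

```python
import io


class TinyCache:
    """Hypothetical cache mapping keys to physical table names."""

    def __init__(self, finalize):
        self._names = {}           # key -> physical table name
        self._finalize = finalize  # callback that drops the table

    def store(self, key, name):
        self._names[key] = name

    def release(self, name):
        # Linear scan, in the spirit of the "could be sped up with an
        # inverse dictionary" comment quoted earlier in this review.
        key = next((k for k, v in self._names.items() if v == name), None)
        if key is None:
            return  # unknown or already released: do nothing
        del self._names[key]
        self._finalize(name)


def demo():
    # Closing a file-like object twice is fine in Python ...
    buf = io.StringIO("data")
    buf.close()
    buf.close()

    # ... and releasing the same cached table twice behaves the same way.
    dropped = []
    cache = TinyCache(dropped.append)
    cache.store("expr", "toy_cache_1")
    cache.release("toy_cache_1")
    cache.release("toy_cache_1")  # second call is a no-op
    return dropped
```

The drop callback runs once, so `demo()` returns a single-element list even though `release` was called twice.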

Contributor Author

Makes sense

-    def _clean_up_cached_table(self, op):
-        self.drop_table(op.name)
+    def _clean_up_cached_table(self, name):
+        self.drop_table(name)
Member

I believe this can be deleted to just rely on the base sql backend implementation.

Contributor Author

Done, and works

-    def _clean_up_cached_table(self, op):
-        self.drop_table(op.name)
+    def _clean_up_cached_table(self, name):
+        self.drop_table(name)
Member

We may want to pass force=True to most of these drop_table calls (here and elsewhere) to not error if the table doesn't exist. What we care about is that the table is cleaned up, if some other mechanism already deleted it then I don't think that should be a user-facing error.
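
To make the intended behaviour concrete, here is a stand-in using `sqlite3` from the standard library. The `drop_table` helper below is hypothetical and only mimics the `force` flag discussed here by emitting `DROP TABLE IF EXISTS`; it is not the ibis implementation.

```python
import sqlite3


def drop_table(con, name, force=False):
    """Hypothetical helper: with force=True a missing table is not an error."""
    if_exists = "IF EXISTS " if force else ""
    con.execute(f'DROP TABLE {if_exists}"{name}"')


def demo():
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE "cache_demo" (x INTEGER)')
    drop_table(con, "cache_demo")              # table exists: dropped
    drop_table(con, "cache_demo", force=True)  # already gone: silently ignored
    try:
        drop_table(con, "cache_demo")          # already gone, no force
    except sqlite3.OperationalError:
        return "error without force"
    return "no error"
```

With `force=True` the cleanup stays quiet when some other mechanism removed the table first, while the non-forced call still surfaces the problem.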

Contributor Author

Done


def _release(self, key) -> None:
    entry = self.cache.pop(key)
    self.finalize(entry.name)
Member

This call may error for a few reasons:

  • Table doesn't exist (I think we should ignore these always, as noted elsewhere).
  • Spurious error
  • Connection is in weird state since python is finalizing

Since most backends will clean up temp tables automatically (and errors on exit can give a bad impression of a tool), I wonder if we want to do something like:

try:
    self.finalize(entry.name)
except Exception:
    # silence all errors if system is shutting down
    if not sys.is_finalizing():
        raise

This way we can still catch bugs in our code since most release calls won't silence everything, but in the case the system is shutting down and e.g. the network is in a weird state we don't see a mountain of failed-to-release-table errors on exit.
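
A self-contained version of the same pattern, for experimenting outside ibis. The function names `release_quietly_at_shutdown` and `failing_finalize` are hypothetical; `sys.is_finalizing()` is the standard-library call used in the snippet above.

```python
import sys


def release_quietly_at_shutdown(finalize, name):
    """Run `finalize(name)`, swallowing errors only during interpreter shutdown."""
    try:
        finalize(name)
    except Exception:
        # During normal operation the error propagates, so bugs stay visible.
        if not sys.is_finalizing():
            raise


def failing_finalize(name):
    # Simulates a backend that cannot drop the table.
    raise RuntimeError(f"could not drop {name}")


def demo():
    try:
        release_quietly_at_shutdown(failing_finalize, "toy_cache_1")
    except RuntimeError as exc:
        return str(exc)
    return None
```

While the interpreter is running normally, `demo()` returns the error message because the exception is re-raised; only at shutdown would it be silenced.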

Contributor Author

Ok, I added that.

Contributor Author

@coroa coroa left a comment

Thanks for the nice words

@coroa
Contributor Author

coroa commented Jul 2, 2024

I'm unsure what tripped the mssql backend. Unfortunately, I have not been able to run those tests locally.

I think I'll need help to address the test failure.

@cpcloud
Member

cpcloud commented Jul 2, 2024

One transient error and one XPASS, which is a win!

@cpcloud
Member

cpcloud commented Jul 2, 2024

I'll fix this up and merge!

@cpcloud
Member

cpcloud commented Jul 2, 2024

Clouds are passing:

…/ibis on  self-releasing-cached-tables is 📦 v9.1.0 via 🐍 v3.10.14 via ❄️   impure (ibis-3.10.14-env)
❯ pytest -m 'bigquery or snowflake' -n 8 --dist loadgroup --snapshot-update -q
bringing up nodes...
.x.x...........................x....s................x..........s...........................x.................xx..........x..x............x..........................x...x........................... [  5%]
............x....................x....................s..................s................................................s......................................................s..x.x.......x...... [ 10%]
x.......x.x..x.........x....x..x.xx....x..........x....................x....................................................x...............................................sssssssssssssssssssss.... [ 16%]
xxxxxxxxxx.x.xx.x.x....x.x.....................................x.............x........x......xx.x.........................x.......x....x.x...........x...x.......................x................... [ 21%]
...x....x..........................x.x....................x........x.......x.x.......x..x......................x..........................xx...............x......x.x..x...x...x......x.............. [ 26%]
.....x.............................x....xx.......x......x..xxxx.xx.x.x.xxx..xx.x...xxxx...x..xx.xxx..x.x..xx...xx.x.x...x......xxx.xxxx.xxx.x...xx.xx.xxxxxx.xx...x.xx.......x.........x.....x....... [ 32%]
xx..........x............................x...x..x..................x....x.....x............x.....x................x..x.....xxx..x.x...xxx.xxxx.xx......xx.x...x..x.....x..............x.............. [ 37%]
..x.x.x.x.............x..x.....x..x.x.......x.x............................x....x..x.x..x...x..x.....x...x..............x...xxxxx....x..................x.........x.......xx..x.x.................... [ 42%]
.....s.............x..............s.x...x....x..............sx..xx.s.x...x..x.......s......x.x......x..........x...................x...........x...........................x..................xx..... [ 48%]
.x.........x.....................x...........................s..........................s....................s..........x................................x..............x.x...x..x...........x...x... [ 53%]
..............xxx.....xx......................x.xx..x......x.x..x...x.x......x............x.x..x.......x..................x.......xx..xx..xx...xxxxxxxxxx...xxx...xxxx.xxxx..x...x.x..x..x.xxx.xx..x. [ 58%]
xxxxx..x.x.x...x..................x..x...x...x....x........x........x.....x.x....x.............x.x.x...xx.....x....x...x.......x...xxxxx................x................x..xx....xx................. [ 64%]
....x...x....x................x.............x.x.....x........x.x.x....................x..........x..xx.....x.......x.x...........x..xx.......x..x................x.....x......xxx...x.x.xxx..xxxxxxxx [ 69%]
xxxx.xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx..x......................x.........................x.....................x...x........ [ 74%]
....................................................................x.xxxxxx.xx.x....xx.xx....x..x.......x....................x..............x........x.x.xx...x.x...x.....x....x...xx..............x [ 80%]
..xx....x...xx..x..x.....x.....................x..x.................x.x.....x.......x...........x........x....x..x.x.......x.......x................................................................. [ 85%]
..........................................................................................................................................................................s.......................... [ 90%]
.........x.............................................................................................................................................................................x.x........... [ 96%]
......................................................s...ss......................................x...........................................                                                        [100%]
3097 passed, 39 skipped, 552 xfailed in 840.67s (0:14:00)

@cpcloud cpcloud enabled auto-merge (squash) July 2, 2024 10:19
@cpcloud cpcloud merged commit f51546e into ibis-project:main Jul 2, 2024
89 checks passed
@cpcloud
Member

cpcloud commented Jul 2, 2024

@coroa Thanks for pushing this through, great work!

@coroa
Contributor Author

coroa commented Jul 2, 2024

Thanks for the quick reviews and comments!

@coroa coroa deleted the self-releasing-cached-tables branch August 8, 2024 08:56