Caching a polars dataframe into parquet fails #1240
Comments
@poldpold thanks for the issue! Will take a look.
The cache store assumed that every persister took a `path` argument. That is not the case because the savers / loaders wrap external APIs and we decided to not try to create our own abstraction layer around them, and instead mirror them. E.g. polars takes `file`, but pandas takes `path`.
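To make the mismatch concrete, here is a minimal sketch of how a cache store could pick the right keyword instead of always passing `path`. The classes `PathSaver`, `FileSaver`, and `NoLocationSaver` and the helper `location_kwargs` are hypothetical stand-ins written for this illustration; they are not Hamilton's actual persisters or the code from the fix. The sketch only assumes that persisters are dataclasses whose fields mirror the wrapped API (`path` for pandas' `to_parquet`, `file` for polars' `write_parquet`).

```python
import dataclasses
from typing import Any, Dict, Type


# Hypothetical stand-ins for persister classes (not Hamilton's real savers).
@dataclasses.dataclass
class PathSaver:
    path: str  # mirrors an API such as pandas' DataFrame.to_parquet(path=...)


@dataclasses.dataclass
class FileSaver:
    file: str  # mirrors an API such as polars' DataFrame.write_parquet(file=...)


@dataclasses.dataclass
class NoLocationSaver:
    table: str  # a saver that takes neither `path` nor `file`


def location_kwargs(saver_cls: Type[Any], location: str) -> Dict[str, str]:
    """Return the keyword argument this saver uses for its output location.

    Assumption for this sketch: the saver is a dataclass, so its accepted
    arguments can be read from its fields.
    """
    field_names = {f.name for f in dataclasses.fields(saver_cls)}
    for candidate in ("path", "file"):
        if candidate in field_names:
            return {candidate: location}
    raise ValueError(
        f"{saver_cls.__name__} accepts neither `path` nor `file`; "
        "cannot decide how to pass the cache location."
    )


# The same cache location is routed to whichever argument the saver expects.
pandas_like = PathSaver(**location_kwargs(PathSaver, "cache/a.parquet"))
polars_like = FileSaver(**location_kwargs(FileSaver, "cache/a.parquet"))
```

Checking `path` first and then `file` covers only the two names mentioned in this thread; a persister that mirrors a third argument name would hit the `ValueError` branch, which is the cost of mirroring external APIs rather than abstracting over them.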
@poldpold could you please try installing my fix and give it a go?
@skrawcz, this works great, thank you! Looking forward to seeing it in an upcoming release!
It will be out this week.
* Fixes #1240. The cache store assumed that every persister took a `path` argument. That is not the case, because the savers / loaders wrap external APIs and we decided to mirror those APIs rather than create our own abstraction layer around them. E.g. polars takes `file`, but pandas takes `path`. This means future changes to those APIs could require changes here.
* Adds tests to catch the case with `file` and the case with neither `path` nor `file`.
Current behavior
When trying to cache a node whose output is a polars DataFrame, an exception is raised.
Stack Traces
Steps to replicate behavior
Write and run a Jupyter notebook with the following cells:
Library & System Information
python=3.11.8, sf-hamilton=1.81.0 and 1.83.2, polars=1.10.0
Expected behavior
I expected the node output to be persisted to disk in a parquet format.