feat(output-formats): add support for to_parquet_dir #9781
Conversation
ACTION NEEDED Ibis follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
So I'm forcing the test to write more than one file, and I get assertion errors because of sorting problems. It looks like I can't guarantee the row order, so I do the sort by columns in the test directly (see latest push), unless there is a better way to approach this.
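A minimal sketch of the kind of order-insensitive check meant here (the helper name and the pandas-based comparison are assumptions, not the actual test code):

```python
import pandas as pd

def assert_frames_equal_any_order(result: pd.DataFrame, expected: pd.DataFrame) -> None:
    # Multi-file writes don't guarantee row order, so sort both frames
    # by every column before comparing.
    cols = list(expected.columns)
    result = result.sort_values(cols).reset_index(drop=True)
    expected = expected.sort_values(cols).reset_index(drop=True)
    pd.testing.assert_frame_equal(result, expected)
```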
Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
LGTM, thanks for adding this!
Let's run the clouds to make sure this works there too!
It looks like the cloud ones were skipped, not sure why.
Oh, probably because you can't have two PRs running them simultaneously, it's annoying.
I will run them locally.
Co-authored-by: Gil Forsyth <gforsyth@users.noreply.github.com>
Co-authored-by: Gil Forsyth <gforsyth@users.noreply.github.com>
Clouds look good
Initial implementation adding support for writing Parquet files to a directory, either as a single file or with each batch written to a separate file.
Partially resolves #8584 (comment); it doesn't cover writing CSV to directories.
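A rough usage sketch of the API being added (the directory path is illustrative, and the exact signature may differ from what finally lands in the PR):

```python
import ibis

t = ibis.memtable({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Write the expression's result to a directory of Parquet files;
# depending on the backend this may be a single file or one file per batch.
t.to_parquet_dir("out/parquet_dir")
```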
TODOs:
- `partitioned_by`, which is already supported via `to_parquet`, unless I'm missing something.
- `limit=` from pyspark: not sure why it is there. It looks like this was a request for `to_delta` (see bug(pyspark): export functions don't take param or limit #8898), and it was replicated for `to_parquet_dir`.
- `ibis/expr/types/core.py`: the method `to_parquet` in `class Expr` doesn't pass through the `params`? (See the sketch after this list.)
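A hedged illustration of that last point: a hypothetical simplification of how `Expr.to_parquet` dispatches to the backend, showing where `params` would need to be forwarded (this is not the actual code in `ibis/expr/types/core.py`):

```python
# Hypothetical simplification, not the actual ibis source.
def to_parquet(self, path, *, params=None, **kwargs):
    # If `params` is accepted here but never passed along, parameterized
    # expressions can't be exported correctly; forwarding it to the backend
    # is the question raised in the TODO above.
    self._find_backend(use_default=True).to_parquet(self, path, params=params, **kwargs)
```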