feat: Split output into multiple files #466
Labels: enhancement (New feature or request)
Comments
This is likely necessary to make maximal use of partitioned execution.
I wrote up a Python Client proposal design doc here: https://docs.google.com/document/d/1CHTiyLDD52FpwSI-SEhqft9HT1bYB-2WFTrCNrxqC0w/edit?usp=sharing
github-merge-queue bot pushed a commit that referenced this issue on Jul 7, 2023
Updates the Python client to support multiple files on output. Writing the tests for this proved difficult, so I went with manual testing: a single result ✅ and multiple results ✅
My latest PR (#495) should write multiple files. @kevinjnguyen once that goes in, would you be able to verify everything is working with the Python client support?
Summary
Rather than producing a single Parquet file containing the entire result set, we should split the results across multiple files.
There are two reasons: separate partitions should be able to write separate files, and large results should be able to roll over into multiple files, allowing the index columns to be written out and then dropped, etc.
The API already supports this, but the Python client (and likely other consumers) does not yet have all the plumbing in place.
For an initial pass, it is likely OK to have the Python client download all files and combine them into a single data frame. Over time, this can (and should) evolve to allow paging over the files (e.g., fetch the first file and turn that into a data frame) and/or streaming support (fetch files as they become available).
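As a rough sketch of that initial pass, assuming the client ends up exposing the downloaded Parquet files as a list of local paths (the function names and the `paths` parameter here are hypothetical, not the actual client API):

```python
from typing import Iterable, Iterator

import pandas as pd


def combine_result_files(paths: Iterable[str]) -> pd.DataFrame:
    """Initial pass: read every output Parquet file and concatenate
    them into a single data frame. `paths` stands in for however the
    client ends up handing back the downloaded files."""
    frames = [pd.read_parquet(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


def iter_result_files(paths: Iterable[str]) -> Iterator[pd.DataFrame]:
    """Possible paging evolution: yield one data frame per output file
    so callers can start working before all files have been fetched."""
    for path in paths:
        yield pd.read_parquet(path)
```

The combine-everything helper matches the initial pass described above; the generator variant is one way the paging evolution could look, with true streaming (fetching files as they become available) layered on top later.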