[Datasets] Allow read_binary_files(output_arrow_format=True) to return Arrow format #33780
Signed-off-by: Cheng Su <scnju13@gmail.com>
Should we also do this for other non-Arrow datasources, such as from_items, range, and read_text?
(Actually, remind me why this couldn't be a 100% backwards compatible change? Is the breakage specific to read_binary_files, or would it apply to from_items/range/read_text as well?)
So it looks to me that if we want to do this for other datasources, we can also include
#32809 (comment) is the reason why we cannot keep 100% backward compatibility.
So this breakage would apply to all datasources.
I see, so any multi-column output would be a breaking change. It seems we could probably support from_items and read_text, though, since those are single-column. I think the ideal outcome would be to deprecate SimpleBlock entirely in favor of single-column Arrow tables, so any change in this direction sounds good to me.
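To make the single-column versus multi-column point concrete, here is a small stdlib-only sketch. It models blocks with plain Python structures rather than Ray's real SimpleBlock or Arrow tables, and everything in it is hypothetical: the helper names, the column names ("text", "path", "bytes"), and the row shapes (a bare value or a tuple in the simple format, a column-name mapping in the Arrow-like format).

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def to_columnar(rows: Sequence[Any], names: Sequence[str]) -> Dict[str, List[Any]]:
    """Hypothetical: turn simple-format rows into an Arrow-like column mapping."""
    if len(names) == 1:
        # Single column: each simple-format row is a bare value.
        return {names[0]: list(rows)}
    # Multi column: each simple-format row is assumed to be a tuple, one entry per column.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rows_of(table: Dict[str, List[Any]]) -> List[Row]:
    """Read the columnar table back as one dict per row, keyed by column name."""
    names = list(table)
    num_rows = len(table[names[0]]) if names else 0
    return [{name: table[name][i] for name in names} for i in range(num_rows)]


def unwrap(row: Row) -> Any:
    """A single-column row can be unwrapped to its bare value; a multi-column row cannot."""
    if len(row) == 1:
        return next(iter(row.values()))
    return row


# Single column (like read_text): unwrapping restores the old access pattern.
lines = ["a", "b"]
assert [unwrap(r) for r in rows_of(to_columnar(lines, ["text"]))] == lines

# Two columns (e.g. a file path alongside its contents): old tuple-unpacking
# code now silently unpacks the dict's keys instead of its values.
files = [("/tmp/x.bin", b"\x00\x01")]
row = rows_of(to_columnar(files, ["path", "bytes"]))[0]
path, data = row
assert (path, data) == ("path", "bytes")  # not ("/tmp/x.bin", b"\x00\x01")
```

The asymmetry is the whole argument: with one column there is a single value to fall back to, while with several columns there is no shape that old tuple-based code and new name-based code can both consume.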
Yes, it's a TODO on the team's list.
btw
And, more importantly, it may cause user churn before we even know about it.
Why are these changes needed?
This is a reproposal of #32809 to allow read_binary_files to return Arrow format by adding a parameter, output_arrow_format. The default is False, to keep backward compatibility, and a warning is printed if output_arrow_format is False. A future release will flip the default to True.
Related issue number
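The opt-in pattern described above (old default kept, a warning on the old path, a planned flip) can be sketched as follows. This is a stdlib-only illustration, not Ray's implementation: the function name read_binary_files_sketch is hypothetical, a plain dict stands in for a single-column Arrow table, the column name "bytes" is an assumption, and the choice of FutureWarning is ours.

```python
import tempfile
import warnings
from pathlib import Path
from typing import Dict, List, Union


def read_binary_files_sketch(
    paths: List[str], output_arrow_format: bool = False
) -> Union[List[bytes], Dict[str, List[bytes]]]:
    """Hypothetical stand-in for the opt-in flag; not Ray's implementation."""
    contents = [Path(p).read_bytes() for p in paths]
    if not output_arrow_format:
        # Keep the old default for backward compatibility, but warn about the planned flip.
        warnings.warn(
            "output_arrow_format=False is the current default; a future release will "
            "change it to True. Pass output_arrow_format=True to opt in now.",
            FutureWarning,
            stacklevel=2,
        )
        return contents  # simple format: one bare bytes object per row
    # Arrow-like format: one named column (a dict stands in for a pyarrow Table;
    # the column name "bytes" is an assumption).
    return {"bytes": contents}


# Usage: the default path warns once; opting in is silent.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "a.bin"
    sample.write_bytes(b"\x00\x01")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        legacy = read_binary_files_sketch([str(sample)])
        arrow_like = read_binary_files_sketch([str(sample)], output_arrow_format=True)
```

FutureWarning is used here rather than DeprecationWarning because Python shows FutureWarning to end users by default, whereas DeprecationWarning is hidden outside `__main__` unless filters are changed; a warning meant to reach users before the default flips needs to be visible.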
Closes #32373.
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.