[data] Add iterator batch_format=None support, which will yield batches in the current batch format with zero copies #33562
Conversation
```diff
@@ -379,7 +379,7 @@ def map_batches(
         *,
         batch_size: Optional[Union[int, Literal["default"]]] = "default",
         compute: Optional[Union[str, ComputeStrategy]] = None,
-        batch_format: Literal["default", "pandas", "pyarrow", "numpy"] = "default",
+        batch_format: Optional[str] = "default",
```
Keep it as `Optional[Literal]` for full explicitness of the supported batch formats?
I feel like that is hard to maintain (given the inconsistencies already in the code), so I opted to unify on the shorter signature.
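For context, here is a minimal sketch of the two annotation styles under discussion; the alias names below are hypothetical and are not Ray's actual identifiers:

```python
from typing import Literal, Optional

# Fully explicit: every supported format is spelled out in the type,
# but each new format means updating every annotated signature.
BatchFormatLiteral = Optional[Literal["default", "pandas", "pyarrow", "numpy"]]

# Shorter signature chosen in this PR: any string (or None), with the
# set of valid values documented and checked at runtime instead.
BatchFormatStr = Optional[str]
```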
Let's also keep the documentation changes from https://github.com/ray-project/ray/pull/33536/files#diff-988f3832ac94d085daf61260175e2580920ebd1521dc760f58b426b94379d5b7L235?
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Done
```diff
@@ -540,7 +540,9 @@ def map_batches(
             (promotes tables to Pandas and tensors to NumPy), ``"pandas"`` to select
             ``pandas.DataFrame``, "pyarrow" to select ``pyarrow.Table``, or
             ``"numpy"`` to select ``numpy.ndarray`` for tensor datasets and
-            ``Dict[str, numpy.ndarray]`` for tabular datasets. Default is "default".
+            ``Dict[str, numpy.ndarray]`` for tabular datasets, or None to return
+            the underlying block exactly as is with no additional formatting.
```
Nice, I like `batch_format=None` a good bit more than adding another literal string!
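To make the documented semantics concrete, here is a hedged sketch of what each `batch_format` value hands to a `map_batches` UDF; the exact block and column types depend on the Ray version and on how the dataset was created:

```python
import ray

ds = ray.data.range(8)  # small tabular dataset

def inspect(batch):
    # batch_format="pandas" -> `batch` is a pandas.DataFrame.
    # batch_format="numpy"  -> `batch` is a Dict[str, numpy.ndarray].
    # batch_format=None     -> `batch` is the underlying block as-is
    #                          (e.g. a pyarrow.Table), with no conversion.
    print(type(batch))
    return batch

ds.map_batches(inspect, batch_format="pandas").take(2)
ds.map_batches(inspect, batch_format=None).take(2)
```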
…es in the current batch format with zero copies (ray-project#33562) This PR is a cleanup of ray-project#33536 It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format. Signed-off-by: elliottower <elliot@elliottower.com>
…project#324… (ray-project#33601) The failure in rllib should have been fixed by ray-project#33562 Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`. Signed-off-by: elliottower <elliot@elliottower.com>
…es in the current batch format with zero copies (ray-project#33562) This PR is a cleanup of ray-project#33536 It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format. Signed-off-by: Jack He <jackhe2345@gmail.com>
…project#324… (ray-project#33601) The failure in rllib should have been fixed by ray-project#33562 Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`. Signed-off-by: Jack He <jackhe2345@gmail.com>
Why are these changes needed?
This PR is a cleanup of #33536.
It uses `None` instead of `"zero-copy"` as a batch format, since `None` has a similar meaning for `batch_size`, where it denotes a system-chosen batch size. Here, `None` likewise means the system-chosen optimal batch format.