[Data] Update ExecutionPlan.execute_to_iterator() to return RefBundles instead of (Block, BlockMetadata) #46575
Signed-off-by: sjl <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
@@ -21,7 +21,7 @@ py_library(
 py_test_module_list(
     files = glob(["tests/block_batching/test_*.py"]),
     size = "medium",
-    tags = ["team:ml", "exclusive"],
+    tags = ["team:data", "exclusive"],
Found that these tests were not being run as part of the data tests, so I updated the tag here.
python/ray/data/_internal/plan.py
Outdated
elif self.is_read_only():
    # For consistency with the previous implementation, we fetch the schema if
    # the plan is read-only even if `fetch_if_missing` is False.
    blocks_with_metadata, _, _ = self.execute_to_iterator()
    # blocks_with_metadata, _, _ = self.execute_to_iterator()
@scottjlee to remove before merging
lgtm 🚀
for block_ref_and_md in next_ref_bundle.blocks:
    sliding_window.append(block_ref_and_md)
    current_window_size += block_ref_and_md[1].num_rows
Suggested change:
-for block_ref_and_md in next_ref_bundle.blocks:
-    sliding_window.append(block_ref_and_md)
-    current_window_size += block_ref_and_md[1].num_rows
+sliding_window.extend(next_ref_bundle.blocks)
+current_window_size += next_ref_bundle.num_rows()
Nice
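For readers following this thread, here is a minimal sketch of why the per-bundle form suggested above should behave the same as the per-block loop. `FakeMetadata` and `FakeRefBundle` are hypothetical stand-ins written for this illustration (they are not Ray classes), string labels stand in for object references, and the sketch assumes every block reports a non-null `num_rows`.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for BlockMetadata; only num_rows is modeled."""

    num_rows: int


@dataclass
class FakeRefBundle:
    """Hypothetical stand-in for RefBundle: a list of (block_ref, metadata) pairs."""

    blocks: List[Tuple[str, FakeMetadata]]

    def num_rows(self) -> int:
        # Total rows across all blocks in the bundle
        # (assumes no block has an unknown row count).
        return sum(md.num_rows for _, md in self.blocks)


def add_per_block(window: Deque, size: int, bundle: FakeRefBundle) -> int:
    # Original form: append each (ref, metadata) pair and add its rows one by one.
    for block_ref_and_md in bundle.blocks:
        window.append(block_ref_and_md)
        size += block_ref_and_md[1].num_rows
    return size


def add_per_bundle(window: Deque, size: int, bundle: FakeRefBundle) -> int:
    # Suggested form: extend the window once and add the bundle's total rows.
    window.extend(bundle.blocks)
    size += bundle.num_rows()
    return size


bundle = FakeRefBundle(
    blocks=[("ref_a", FakeMetadata(3)), ("ref_b", FakeMetadata(5))]
)
window_a: Deque = deque()
window_b: Deque = deque()
size_a = add_per_block(window_a, 0, bundle)
size_b = add_per_bundle(window_b, 0, bundle)
```

Both helpers leave the window holding the same pairs in the same order and report the same running size; the per-bundle form just does it in two statements.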
def _bundle_to_block_md_iterator(
    ref_bundles: Iterator[RefBundle],
) -> Iterator[Tuple[ObjectRef[Block], BlockMetadata]]:
    """Convert an iterator of RefBundles to an iterator of
    `(Block object reference, corresponding BlockMetadata)`."""
    for ref_bundle in ref_bundles:
        for block_ref, metadata in ref_bundle.blocks:
            yield block_ref, metadata
This isn't called anywhere. Let's remove it?
thanks, removed.
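If a caller ever needs the flattened `(block reference, metadata)` view that the removed helper produced, it can likely be rebuilt at the call site in one line. The sketch below is only an illustration: `FakeRefBundle` and `flatten_bundles` are hypothetical names introduced here, with plain strings and dicts standing in for object references and `BlockMetadata`.

```python
from dataclasses import dataclass
from itertools import chain
from typing import Iterable, Iterator, List, Tuple


@dataclass
class FakeRefBundle:
    """Hypothetical stand-in for RefBundle; `blocks` holds (ref, metadata) pairs."""

    blocks: List[Tuple[str, dict]]


def flatten_bundles(
    ref_bundles: Iterable[FakeRefBundle],
) -> Iterator[Tuple[str, dict]]:
    # Lazily yield every (ref, metadata) pair, bundle by bundle, in order.
    return chain.from_iterable(bundle.blocks for bundle in ref_bundles)


bundles = [
    FakeRefBundle([("ref_a", {"num_rows": 3})]),
    FakeRefBundle([("ref_b", {"num_rows": 5}), ("ref_c", {"num_rows": 2})]),
]
pairs = list(flatten_bundles(bundles))
```

The result is a flat sequence of pairs in bundle order, which matches what the nested `for` loops in the removed helper yield.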
Why are these changes needed?
Stacked on: Dataset.get_internal_block_refs() #46455
Follow-up to #46369 and #46455.
Update ExecutionPlan.execute_to_iterator() to return RefBundles instead of (Block, BlockMetadata), to unify the logic between RefBundles and Blocks. Also refactor the iter_batches() code path accordingly to handle RefBundles instead of raw Block and BlockMetadata.
Related issue number
Checks
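- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.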