Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Data] Update ExecutionPlan.execute_to_iterator() to return RefBundles instead of (Block, BlockMetadata) #46575

Merged
merged 41 commits into from
Jul 16, 2024

Conversation

scottjlee
Copy link
Contributor

@scottjlee scottjlee commented Jul 11, 2024

Why are these changes needed?

Stacked on:

Followup to #46369 and #46455.
Update ExecutionPlan.execute_to_iterator() to return RefBundles instead of (Block, BlockMetadata), to unify the logic between RefBundles and Blocks. Also refactor the iter_batches() code path accordingly to handle RefBundles instead of raw Block and BlockMetadata.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

sjl and others added 30 commits July 1, 2024 23:23
Signed-off-by: sjl <sjl@anyscale.com>
Signed-off-by: sjl <sjl@anyscale.com>
Signed-off-by: sjl <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
fix
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: sjl <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: sjl <sjl@anyscale.com>
…01-count

Signed-off-by: sjl <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Scott Lee added 3 commits July 11, 2024 15:42
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
@@ -21,7 +21,7 @@ py_library(
py_test_module_list(
files = glob(["tests/block_batching/test_*.py"]),
size = "medium",
tags = ["team:ml", "exclusive"],
tags = ["team:data", "exclusive"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found that the tests were not being run under data tests, so updated the tag here.

Signed-off-by: Scott Lee <sjl@anyscale.com>
elif self.is_read_only():
# For consistency with the previous implementation, we fetch the schema if
# the plan is read-only even if `fetch_if_missing` is False.
blocks_with_metadata, _, _ = self.execute_to_iterator()
# blocks_with_metadata, _, _ = self.execute_to_iterator()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottjlee to remove before merging

Scott Lee added 4 commits July 12, 2024 01:17
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Copy link
Contributor

@omatthew98 omatthew98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 🚀

Comment on lines 276 to 278
for block_ref_and_md in next_ref_bundle.blocks:
sliding_window.append(block_ref_and_md)
current_window_size += block_ref_and_md[1].num_rows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for block_ref_and_md in next_ref_bundle.blocks:
sliding_window.append(block_ref_and_md)
current_window_size += block_ref_and_md[1].num_rows
sliding_window.extend(next_ref_bundle.blocks):
current_window_size += next_ref_bundle.num_rows()

Scott Lee added 2 commits July 15, 2024 11:47
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
@scottjlee scottjlee added the go add ONLY when ready to merge, run all tests label Jul 15, 2024
Copy link
Member

@bveeramani bveeramani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Comment on lines 139 to 146
def _bundle_to_block_md_iterator(
ref_bundles: Iterator[RefBundle],
) -> Iterator[Tuple[ObjectRef[Block], BlockMetadata]]:
"""Convert an iterator of RefBundles to an iterator of
`(Block object reference, corresponding BlockMetadata)`."""
for ref_bundle in ref_bundles:
for block_ref, metadata in ref_bundle.blocks:
yield block_ref, metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't called anywhere. Let's remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, removed.

Signed-off-by: Scott Lee <sjl@anyscale.com>
@bveeramani bveeramani merged commit 1b0af29 into ray-project:master Jul 16, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants