-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Deprecate Dataset.get_internal_block_refs()
#46455
[Data] Deprecate Dataset.get_internal_block_refs()
#46455
Conversation
Signed-off-by: sjl <sjl@anyscale.com>
Signed-off-by: sjl <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
…01-count Signed-off-by: sjl <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
@@ -59,7 +59,7 @@ def __setattr__(self, key, value): | |||
object.__setattr__(self, key, value) | |||
|
|||
@property | |||
def block_refs(self) -> List[BlockMetadata]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix incorrect typehint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't have a lot of context here but the code changes seem good. One small nit on the comment.
def _ref_bundles_iterator_to_block_refs_list( | ||
ref_bundles: Iterator[RefBundle], | ||
) -> List[ObjectRef[Block]]: | ||
"""Convert an iterator of RefBundles to a list of object references to Blocks.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Change to "Convert an iterator of RefBundles to a list of Block object references." to avoid the double to, for a sec thought this was a double transformation.
python/ray/data/tests/test_zip.py
Outdated
@@ -69,7 +69,8 @@ def test_zip_different_num_blocks_split_smallest( | |||
override_num_blocks=num_blocks2, | |||
) | |||
ds = ds1.zip(ds2).materialize() | |||
num_blocks = len(ds.get_internal_block_refs()) | |||
bundles = ds.iter_internal_ref_bundles() | |||
num_blocks = sum([len(b.block_refs) for b in bundles]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_blocks = sum([len(b.block_refs) for b in bundles]) | |
num_blocks = sum(len(b.block_refs) for b in bundles) |
assert nblocks == 1, nblocks | ||
ctx.target_max_block_size = 2_000_000 | ||
nblocks = len(ds2.map_batches(lambda x: x, batch_size=16).get_internal_block_refs()) | ||
bundles = ds2.map_batches(lambda x: x, batch_size=16).iter_internal_ref_bundles() | ||
nblocks = sum([len(b.block_refs) for b in bundles]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Here and elsewhere
nblocks = sum([len(b.block_refs) for b in bundles]) | |
nblocks = sum(len(b.block_refs) for b in bundles) |
Signed-off-by: Scott Lee <sjl@anyscale.com>
…dles` instead of `(Block, BlockMetadata)` (#46575) Followup to #46369 and #46455. Update `ExecutionPlan.execute_to_iterator()` to return `RefBundles` instead of `(Block, BlockMetadata)`, to unify the logic between `RefBundle`s and `Block`s. Also refactor the `iter_batches()` code path accordingly to handle `RefBundle`s instead of raw `Block` and `BlockMetadata`. Signed-off-by: sjl <sjl@anyscale.com> Signed-off-by: Scott Lee <sjl@anyscale.com>
Why are these changes needed?
Stacked on: #46369 (merged)Replaces
Dataset.get_internal_block_refs()
usages withDataset.iter_internal_ref_bundles()
, and marks the method as deprecated.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.