[Streaming Generator] Make it compatible with wait #36071

rkooo567 · 2023-06-05T14:35:32Z

Why are these changes needed?

This PR makes the streaming generator compatible with ray.wait.

The semantics are as follows:

import ray

@ray.remote
def f():
    for _ in range(3):
        yield 1

generator = f.options(num_returns="streaming").remote()
# The generator is returned in `ready` if its next reference is available; otherwise it is returned in `unready`.
# This should work with all other ray.wait options (including fetch_local=True/False).
ready, unready = ray.wait([generator])

# If the generator's next ref is not ready within 0.1 seconds, the generator is returned in `unready`;
# otherwise, it is returned in `ready`.
ready, unready = ray.wait([generator], timeout=0.1)

# An available next ref counts as one return for the generator.
# Here `ref` is any other ObjectRef; the call returns once both the generator and `ref` are ready.
ready, unready = ray.wait([generator, ref], num_returns=2)

# If the generator's next ref is available, the object is fetched to the local node.
ready, unready = ray.wait([generator, ref], fetch_local=True)

Since the previous PR #36070, we can peek at the generator's next object reference, and the peeked object is guaranteed to be resolved. To make the generator compatible with ray.wait, we always peek at the next object from the generator and wait on that reference.
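The peek-then-wait idea can be illustrated without Ray. The following is a minimal sketch, not Ray's implementation: `ToyStreamingGenerator`, `peek_next`, `produce`, and `toy_wait` are hypothetical names invented for this example, and plain `concurrent.futures.Future` objects stand in for object references. It only shows how a wait call can substitute each generator with its peeked next reference and then report the generator itself as ready or unready.

```python
import time
from concurrent.futures import FIRST_COMPLETED, Future, wait


class ToyStreamingGenerator:
    """Hypothetical stand-in for a streaming generator (not a Ray class)."""

    def __init__(self, num_items):
        # One future per value the simulated task will eventually yield.
        self._futures = [Future() for _ in range(num_items)]
        self._next = 0

    def peek_next(self):
        # Return the next reference without consuming it.
        return self._futures[self._next]

    def produce(self, value):
        # Simulate the remote task yielding one more value.
        for fut in self._futures:
            if not fut.done():
                fut.set_result(value)
                return
        raise RuntimeError("all items were already produced")


def toy_wait(items, num_returns=1, timeout=None):
    """Toy analogue of a wait call that accepts generators and futures."""
    if num_returns > len(items):
        raise ValueError("num_returns cannot exceed the number of items")
    # Replace each generator with its peeked next reference.
    waited = [
        item.peek_next() if isinstance(item, ToyStreamingGenerator) else item
        for item in items
    ]
    deadline = None if timeout is None else time.monotonic() + timeout
    while sum(fut.done() for fut in waited) < num_returns:
        remaining = None if deadline is None else deadline - time.monotonic()
        if remaining is not None and remaining <= 0:
            break
        pending = [fut for fut in waited if not fut.done()]
        wait(pending, timeout=remaining, return_when=FIRST_COMPLETED)
    # Report the original items (generators included), not the peeked futures.
    ready, unready = [], []
    for item, fut in zip(items, waited):
        if fut.done() and len(ready) < num_returns:
            ready.append(item)
        else:
            unready.append(item)
    return ready, unready


gen = ToyStreamingGenerator(3)
ref = Future()
print(toy_wait([gen], timeout=0.05))  # generator is unready: nothing yielded yet
gen.produce(1)
print(toy_wait([gen, ref], num_returns=2, timeout=0.05))  # only gen is ready
```

In this toy model a generator counts as a single item for `num_returns`, matching the semantics described above; exhausted generators and `fetch_local` are deliberately left out.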

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(


…ys met (ray-project#36352)

Today, the number of initial blocks of a dataset is limited to the number of input files of the datasource, regardless of the requested parallelism. This is problematic because increasing the number of blocks then requires a `repartition()` call, which is not always practical in the streaming setting.

This PR inserts a streaming SplitBlocks operator that is fused with read tasks in this case, allowing arbitrarily high requested parallelism (up to the number of individual records) without needing a blocking repartition.

Before:
```
ray.data.read_parquet([list, of, 100, parquet, files], parallelism=2000)
# -> num_blocks = 100
```

After:
```
ray.data.read_parquet([list, of, 100, parquet, files], parallelism=2000)
# -> num_blocks = 2000
```

Limitations:
- Until ray-project#36071 merges and is integrated with Ray Data, downstream operators of the read may still block until the entire file is read, even if the read would produce multiple blocks.
- The SplitBlocks operator cannot be fused with downstream Map stages, since it changes the physical partitioning of the stream. If it were fused, the parallelism increase would not be realized, because the read output could not be split across multiple processes.

Signed-off-by: e428265 <arvind.chandramouli@lmco.com>


rkooo567 added 30 commits May 12, 2023 06:19

initial version

452ed1f

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

in progress.

3ebe327

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

finished basics.

c140a5c

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

fix cpp error

b83af80

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

working now.

509b311

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Merge branch 'master' into streaming-generator-1

d0795e5

fix a bug

f8a90f6

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Basic version finished.

0a9169d

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

[Please Revert] Work e2e.

05f468a

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

[Revert Please] Support core worker APIs and a generator.

122b705

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

fix a bug

7a8fe2c

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Revert "[Revert Please] Support core worker APIs and a generator."

d880763

This reverts commit 122b705. Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Revert "[Please Revert] Work e2e."

f501c22

This reverts commit 05f468a. Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Merge branch 'master' into streaming-generator-1

1942394

Fix failing tests.

3e0212e

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Merge branch 'master' into streaming-generator-2

c9a932e

Merge branch 'streaming-generator-1' into streaming-generator-2

ffe20fd

Merge branch 'master' into streaming-generator-3

0e89ad7

Merge branch 'streaming-generator-2' into streaming-generator-3

d520e47

Fix

7610474

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Fix a broken test.

aaa0582

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Merge branch 'streaming-generator-1' into streaming-generator-2

a52f74b

Merge branch 'master' into streaming-generator-1

37c3bdd

Merge branch 'streaming-generator-1' into streaming-generator-2

fd83edd

Merge branch 'streaming-generator-2' into streaming-generator-3

ef08b64

Finished async actor.

74a2e31

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Add a unit test.

8b9ba39

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

done

a4b62ac

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Merge branch 'master' into streaming-generator-1

d350b5d

Addressed code review.

9ed05d9

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

rkooo567 added 5 commits June 22, 2023 09:29

Removed unnecessary logs

256d648

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Merge branch 'master' into streaming-generator-remove-busy-waiting

18a690c

Fixed a test failure.

12d73ac

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

Merge branch 'streaming-generator-remove-busy-waiting' into streaming…

a0218f3

…-generator-wait

Merge branch 'master' into streaming-generator-wait

c58a9fa

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

rkooo567 requested review from richardliaw, krfricke, xwjiang2010, amogkam, matthewdeng, Yard1, maxpumperla, a team, scv119, c21, scottjlee and bveeramani as code owners June 23, 2023 00:48

rkooo567 changed the base branch from streaming-generator-remove-busy-waiting to master June 23, 2023 00:48

Fix a test failure.

b26e86f

Signed-off-by: SangBin Cho <rkooo567@gmail.com>

rkooo567 merged commit 6a0c59e into ray-project:master Jun 23, 2023

akshay-anyscale mentioned this pull request Jul 21, 2023

Add service deployment instructions to stable diffusion template #37645

Closed

8 tasks
