[Ray] Ray execution state #3002

fyrestone · 2022-05-06T02:47:21Z

What do these changes do?

The Mars execution context provides remote objects for syncing state between multiple running operands. The Ray execution backend uses a task state actor to implement this feature.

With this PR, the Ray backend supports incremental indexes.

Move the context initialization logic (the ThreadedServiceContext used in the supervisor) into the execution backend.
Add a task state actor to the Ray execution backend.
Support remote objects in the Ray execution context.
Do not fetch chunk meta when tiling HeadOptimizedDataSource.
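
As a rough illustration of the remote object idea above, here is a minimal sketch in plain Python with no Ray dependency. `TaskStateStore`, `ChunkSizeRecorder`, and all method names are hypothetical and are not the classes added in this PR. In a Ray backend, a class like this would presumably run as a named actor so that every operand reaches the same instance.

```python
from typing import Any, Dict


class TaskStateStore:
    """Hypothetical stand-in for a task state actor (plain Python, no Ray)."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}

    def create_remote_object(self, name: str, object_cls: type, *args, **kwargs) -> None:
        # Create the shared object once; a repeated call keeps the first instance.
        if name not in self._objects:
            self._objects[name] = object_cls(*args, **kwargs)

    def call_remote_object(self, name: str, attr: str, *args, **kwargs) -> Any:
        # Forward a method call to the named shared object.
        return getattr(self._objects[name], attr)(*args, **kwargs)

    def destroy_remote_object(self, name: str) -> None:
        self._objects.pop(name, None)


class ChunkSizeRecorder:
    """Hypothetical shared state: records chunk lengths to derive index offsets."""

    def __init__(self) -> None:
        self._sizes: Dict[int, int] = {}

    def record(self, chunk_index: int, size: int) -> None:
        self._sizes[chunk_index] = size

    def offset(self, chunk_index: int) -> int:
        # The start of a chunk's index is the total length of all earlier chunks.
        return sum(s for i, s in self._sizes.items() if i < chunk_index)


# Three chunks report their lengths, then each asks where its index starts.
store = TaskStateStore()
store.create_remote_object("sizes", ChunkSizeRecorder)
for idx, size in enumerate([3, 5, 2]):
    store.call_remote_object("sizes", "record", idx, size)
offsets = [store.call_remote_object("sizes", "offset", i) for i in range(3)]
```

In this sketch every chunk must have recorded its size before any offset is read. That ordering requirement is one reason shared state of this kind makes operands stateful.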

The drawbacks are:

Syncing state among operands performs poorly; in the future we may be able to avoid using remote objects when executing operands.
Fault tolerance is hard because the remote object makes the operands stateful.

Related issue number

#2893

Check code requirements

tests added / passed (if needed)
Ensure all linting tests pass, see here for how to run them

chaokunyang · 2022-05-06T06:39:04Z

Will this Ray actor affect the lineage reconstruction in #2972?

If this actor dies, will lineage reconstruction still succeed?
If this actor is restarted by Ray, how is its state recovered automatically?
If lineage reconstruction calls this actor multiple times, how is idempotence ensured?

fyrestone · 2022-05-06T07:08:58Z

Currently, some operands use a remote object to sync state. If the state actor is reconstructed, the simplest way to recover the computation is:

If a call cannot find the remote object and the state actor has been reconstructed, raise a RecoveryFailed exception.
The fault recovery logic catches this exception and recomputes the predecessors.

We should make the operands stateless to avoid the complex recovery above.

This PR aims to maximize compatibility with the existing Mars execution logic; fault recovery is not included in this PR.
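
The recovery flow described in this comment could look roughly like the sketch below. It is plain Python for illustration only: `StateActor`, `run_with_recovery`, and the `reconstructed` flag are hypothetical names, and only the `RecoveryFailed` exception name comes from the comment. Nothing here is implemented in this PR.

```python
from typing import Callable, Dict, List


class RecoveryFailed(Exception):
    """Raised when a remote object is missing after the state actor restarted."""


class StateActor:
    """Hypothetical state actor; `reconstructed` marks a restart that lost state."""

    def __init__(self, reconstructed: bool = False) -> None:
        self.reconstructed = reconstructed
        self.objects: Dict[str, Dict[str, int]] = {}

    def call(self, name: str, key: str) -> int:
        if name not in self.objects:
            if self.reconstructed:
                # State was lost in a restart: signal that recovery is needed.
                raise RecoveryFailed(name)
            raise KeyError(name)
        return self.objects[name][key]


def run_with_recovery(
    actor: StateActor,
    predecessors: List[Callable[[StateActor], None]],
    operand: Callable[[StateActor], int],
) -> int:
    try:
        return operand(actor)
    except RecoveryFailed:
        # Recompute the predecessors to rebuild the lost state, then retry once.
        for pred in predecessors:
            pred(actor)
        return operand(actor)


log: List[str] = []


def predecessor(actor: StateActor) -> None:
    actor.objects["sizes"] = {"chunk0": 3}
    log.append("recomputed")


def operand(actor: StateActor) -> int:
    return actor.call("sizes", "chunk0")


# The actor has just been reconstructed, so its state is empty.
result = run_with_recovery(StateActor(reconstructed=True), [predecessor], operand)
```

The sketch retries only once and assumes that rerunning a predecessor is safe, which is the idempotence concern raised in the review question.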

qinxuye

LGTM

qinxuye · 2022-05-07T07:05:03Z

@chaokunyang could you please review this PR?

zhongchun

LGTM. I left two comments where I am a little confused.

mars/services/task/execution/ray/context.py

mars/dataframe/datasource/core.py

zhongchun

LGTM

刘宝 added 4 commits May 6, 2022 10:37

Ray execution state

784fbd6

Fix stop pool

3a101f9

Not to fetch chunk meta when tiling HeadOptimizedDataSource

5729961

Fix

600ff04

fyrestone self-assigned this May 6, 2022

fyrestone added the mod: ray integration label May 6, 2022

qinxuye added type: enhancement request mod: task service labels May 6, 2022

Fix

96efb6c

刘宝 added 4 commits May 6, 2022 18:03

Use named actor for Ray task state

d62ad34

Improve coverage

cff2da9

Fix lint

21749fb

Merge remote-tracking branch 'upstream/master' into ray_execution_state

abb4889

fyrestone marked this pull request as ready for review May 7, 2022 01:34

fyrestone requested review from wjsi, qinxuye and hekaisheng as code owners May 7, 2022 01:34

qinxuye approved these changes May 7, 2022

qinxuye added this to the v0.9.0rc3 milestone May 7, 2022

fyrestone requested a review from zhongchun May 7, 2022 09:30

zhongchun reviewed May 7, 2022

mars/services/task/execution/ray/context.py (resolved)

mars/dataframe/datasource/core.py (resolved)

zhongchun approved these changes May 7, 2022

fyrestone merged commit 03ed810 into mars-project:master May 7, 2022
