[Ray] Support basic subtask retry and lineage reconstruction #2969
Tests for Ray task retry need to be added.
Wait for #2968 to be merged, then move part of the scheduling config to the executor.
I left some comments.
Yes, there are some configurations like |
LGTM
LGTM
…oject#2969) (cherry picked from commit 61c0c51)
What do these changes do?
This PR implements basic subtask retry and reconstruction on top of Ray's object lineage reconstruction:
- subtask_retry_times in the scheduling config is the Ray task retry setting.
- max_retries on the Ray task enables subtask retry and reconstruction.

With this PR, as long as the task lineage has not been evicted, Mars tasks will succeed even if some Ray nodes have failed.
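To illustrate the idea behind retry plus lineage reconstruction, here is a minimal pure-Python sketch. It uses neither Ray nor Mars and is not the code from this PR. `ToyStore`, `WorkerDied`, `lose`, and `evict_lineage` are hypothetical names made up for the example. Only `subtask_retry_times` is taken from the description above. In Ray itself the retry count is passed as the `max_retries` option of a remote function.

```python
class WorkerDied(Exception):
    """Hypothetical stand-in for a system failure such as a lost node."""


class ToyStore:
    """Toy object store that remembers how each object was produced (its lineage)."""

    def __init__(self, subtask_retry_times=3):
        self.subtask_retry_times = subtask_retry_times
        self._values = {}   # object id -> computed value
        self._lineage = {}  # object id -> (function, ids of input objects)

    def submit(self, obj_id, fn, *input_ids):
        # Record lineage only; the value is computed lazily in get().
        self._lineage[obj_id] = (fn, input_ids)

    def lose(self, obj_id):
        # Simulate a node failure that drops a stored value.
        self._values.pop(obj_id, None)

    def evict_lineage(self, obj_id):
        # Once lineage is gone, a lost value can no longer be rebuilt.
        self._lineage.pop(obj_id, None)

    def get(self, obj_id):
        if obj_id in self._values:
            return self._values[obj_id]
        if obj_id not in self._lineage:
            raise KeyError(f"{obj_id} is lost and its lineage was evicted")
        fn, input_ids = self._lineage[obj_id]
        args = [self.get(i) for i in input_ids]  # rebuild inputs recursively
        # One initial attempt plus up to subtask_retry_times retries.
        for attempt in range(self.subtask_retry_times + 1):
            try:
                value = fn(*args)
                break
            except WorkerDied:
                if attempt == self.subtask_retry_times:
                    raise
        self._values[obj_id] = value
        return value
```

In this sketch, a value dropped by `lose` is recomputed from its recorded function and inputs on the next `get`, while a value whose lineage was also evicted raises `KeyError`. This mirrors the condition stated above: recovery works only while the lineage is still available.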
Related issue number
Closes #3029
#2972
Check code requirements