Basic reschedule subtask #2467
Merged: qinxuye merged 10 commits into mars-project:master from fyrestone:basic_reschedule_subtask on Sep 24, 2021
qinxuye added the labels "mod: task service", "type: feature" (New feature), and "mod: scheduling service" on Sep 17, 2021
fyrestone requested review from hekaisheng, qinxuye and wjsi as code owners on September 23, 2021 09:19
qinxuye approved these changes on Sep 24, 2021
LGTM
chaokunyang added a commit to chaokunyang/mars that referenced this pull request on May 31, 2022
Merge branch merge_github_2524 of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/58?tab=diff
Signed-off-by: 捕牛 <hejialing.hjl@antgroup.com>
* [Ray] Support reconstructing worker (mars-project#2413)
* Make cmdline support third party modules (mars-project#2454)
  Co-authored-by: hanguang <zhusiyuan.zsy@alibaba-inc.com>
* Support visualizing subtask graphs on Mars Web (mars-project#2426)
* Fix timeout error when waiting for a submitted task (mars-project#2457)
* Print the error message when error happens in `TaskProcessor` (mars-project#2458)
* Add nightly builds for docker images (mars-project#2456)
* Fix misuse of `name` parameter in DataFrame align (mars-project#2469)
* Fix hang when start sub pool fails (mars-project#2468)
* Refine and unify subtask detail APIs (mars-project#2465)
* Fix coverage for Azure pipeline (mars-project#2474)
* Split tileable information and subtask graph into two tabs (mars-project#2480)
* Support specified vineyard socket and skip the launching vineyardd process (mars-project#2481)
* Basic reschedule subtask (mars-project#2467)
* Compatible with scikit-learn 1.0 (mars-project#2486)
  Co-authored-by: hekaisheng <kaisheng.hks@alibaba-inc.com>
* Fix wrong translation in cluster deployment. (mars-project#2489)
* Fix bug that failed to execute query when there are multiple arguments (mars-project#2490)
* Include tileable property in detail api (mars-project#2493)
* Fix version of statsmodels to pass CI (mars-project#2497)
* Implements `glm.LogisticRegression` (mars-project#2466)
* Implements bagging sampling (mars-project#2496)
* Refine MarsDMatrix & support more parameters for XGB classifier and regressor (mars-project#2498)
* Fix output of df.groupby(as_index=False).size() (mars-project#2507)
* Add preliminary implementations for ufunc methods (mars-project#2510)
* Add doc for reading csv in oss (mars-project#2514)
* [Ray] Fix serializing lambdas in web (mars-project#2512)
* Add `make_regression` support for learn module (mars-project#2515)
* Fix reduction result on empty series (mars-project#2520)
* Fix df.loc when df is empty (mars-project#2524)
* fix start subpool
* fix test_kill_and_wait_timeout
* fix autoscale timeout
* fix ray larger clsuter fixture
* Update ci ray package to 1.2.2
* remove python3.6 3.8 .39 ut and upgrade ray 3.7 image
* echo python path
* fix json decode error
* fix bundle release timeout
* fix remove placement group timeout
* fix no_restart
* fix ci
* fix autoscale
What do these changes do?
- Subtask result handling moves into `SubtaskManagerActor`. Before: `SubtaskExecutionActor` calls `TaskAPI.set_subtask_result` to set the subtask result, and `SubtaskManagerActor` handles `run_subtask` exceptions. After: `SubtaskManagerActor` gets and sets the subtask result, and also handles the `run_subtask` exceptions.
- Releasing global slots moves the same way. Before: `SubtaskExecutionActor` releases global slots. After: `SubtaskManagerActor` releases global slots.
- `subtask_max_reschedules` is used to reschedule failed subtasks. (This PR can't handle a worker main pool crash.)

Related issue number
N/A
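
The division of responsibility described under "What do these changes do?" can be illustrated with a small sketch. This is not the code from this PR: `ToySubtaskManager`, `ToyResult`, `make_flaky`, and the `free_slots` counter are hypothetical names invented for the illustration, and only the `subtask_max_reschedules` name comes from the description. It assumes a single manager object that runs the subtask, catches its exceptions, records the result, releases the slot, and retries up to `subtask_max_reschedules` times.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict


@dataclass
class ToyResult:
    """Hypothetical result record; not a Mars type."""
    subtask_id: str
    status: str  # "succeeded" or "errored"
    attempts: int
    error: str = ""


class ToySubtaskManager:
    """Hypothetical stand-in for a manager that owns results, slots, and retries."""

    def __init__(self, subtask_max_reschedules: int = 0):
        self.subtask_max_reschedules = subtask_max_reschedules
        self.results: Dict[str, ToyResult] = {}
        self.free_slots = 1  # assumed single global slot, for illustration only

    async def submit(
        self, subtask_id: str, run_subtask: Callable[[], Awaitable[None]]
    ) -> ToyResult:
        attempts = 0
        while True:
            attempts += 1
            self.free_slots -= 1  # acquire the slot for this attempt
            try:
                await run_subtask()
                result = ToyResult(subtask_id, "succeeded", attempts)
            except Exception as exc:
                # The manager, not the executor, handles the failure.
                if attempts <= self.subtask_max_reschedules:
                    continue  # reschedule; the finally clause still runs first
                result = ToyResult(subtask_id, "errored", attempts, repr(exc))
            finally:
                self.free_slots += 1  # the manager releases the slot
            # The manager sets the final result in one place.
            self.results[subtask_id] = result
            return result


def make_flaky(failures: int) -> Callable[[], Awaitable[None]]:
    """Return a coroutine function that raises `failures` times, then succeeds."""
    remaining = failures

    async def run() -> None:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise RuntimeError("simulated subtask failure")

    return run


if __name__ == "__main__":
    manager = ToySubtaskManager(subtask_max_reschedules=2)
    print(asyncio.run(manager.submit("s1", make_flaky(2))))
```

Keeping result recording, slot release, and the retry decision in one place means a failed attempt cannot leak a slot or publish a result that a later retry would contradict. As the description notes, the PR itself does not cover a worker main pool crash, and this sketch does not model one either.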