Fix hang when start sub pool fails #2468
Generally LGTM, I left some comments.
LGTM
LGTM
(cherry picked from commit 5cdb251)
Merge branch merge_github_2524 of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/58?tab=diff
Signed-off-by: 捕牛 <hejialing.hjl@antgroup.com>
* [Ray] Support reconstructing worker (mars-project#2413)
* Make cmdline support third party modules (mars-project#2454)
Co-authored-by: hanguang <zhusiyuan.zsy@alibaba-inc.com>
* Support visualizing subtask graphs on Mars Web (mars-project#2426)
* Fix timeout error when waiting for a submitted task (mars-project#2457)
* Print the error message when error happens in `TaskProcessor` (mars-project#2458)
* Add nightly builds for docker images (mars-project#2456)
* Fix misuse of `name` parameter in DataFrame align (mars-project#2469)
* Fix hang when start sub pool fails (mars-project#2468)
* Refine and unify subtask detail APIs (mars-project#2465)
* Fix coverage for Azure pipeline (mars-project#2474)
* Split tileable information and subtask graph into two tabs (mars-project#2480)
* Support specified vineyard socket and skip the launching vineyardd process (mars-project#2481)
* Basic reschedule subtask (mars-project#2467)
* Compatible with scikit-learn 1.0 (mars-project#2486)
Co-authored-by: hekaisheng <kaisheng.hks@alibaba-inc.com>
* Fix wrong translation in cluster deployment. (mars-project#2489)
* Fix bug that failed to execute query when there are multiple arguments (mars-project#2490)
* Include tileable property in detail api (mars-project#2493)
* Fix version of statsmodels to pass CI (mars-project#2497)
* Implements `glm.LogisticRegression` (mars-project#2466)
* Implements bagging sampling (mars-project#2496)
* Refine MarsDMatrix & support more parameters for XGB classifier and regressor (mars-project#2498)
* Fix output of df.groupby(as_index=False).size() (mars-project#2507)
* Add preliminary implementations for ufunc methods (mars-project#2510)
* Add doc for reading csv in oss (mars-project#2514)
* [Ray] Fix serializing lambdas in web (mars-project#2512)
* Add `make_regression` support for learn module (mars-project#2515)
* Fix reduction result on empty series (mars-project#2520)
* Fix df.loc when df is empty (mars-project#2524)
* fix start subpool
* fix test_kill_and_wait_timeout
* fix autoscale timeout
* fix ray larger clsuter fixture
* Update ci ray package to 1.2.2
* remove python3.6 3.8 .39 ut and upgrade ray 3.7 image
* echo python path
* fix json decode error
* fix bundle release timeout
* fix remove placement group timeout
* fix no_restart
* fix ci
* fix autoscale
What do these changes do?
Check the status of the subprocess while creating pools, so that pool creation does not hang when a sub pool fails to start.
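For readers unfamiliar with this kind of hang, here is a minimal sketch of the general idea, not the code changed in this PR. It assumes the parent waits for a child process to report readiness, and it polls the child's exit status on every iteration so that a crashed child produces an error instead of an endless wait. The names `SubPoolStartError`, `wait_until_ready` and `is_ready` are hypothetical and do not come from Mars.

```python
import subprocess
import sys
import time


class SubPoolStartError(RuntimeError):
    """Hypothetical error: the child exited before it reported readiness."""


def wait_until_ready(proc, is_ready, timeout=10.0, poll_interval=0.05):
    """Wait for `is_ready()` to become true while watching the child process.

    `proc` is a `subprocess.Popen` object and `is_ready` is any zero-argument
    callable supplied by the caller (an assumption made for this sketch).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready():
            return
        # Popen.poll() returns None while the child is still running,
        # and its exit code once it has terminated.
        exit_code = proc.poll()
        if exit_code is not None:
            raise SubPoolStartError(
                f"sub pool process exited with code {exit_code} before becoming ready"
            )
        time.sleep(poll_interval)
    raise TimeoutError("sub pool process did not become ready in time")
```

Without the `proc.poll()` check, a child that dies during startup never signals readiness, and the parent waits until the timeout expires, or forever if there is none. With the check, calling `wait_until_ready` on a child started as `subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])` raises `SubPoolStartError` shortly after the child exits.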
Related issue number
Resolves #2464