Fix hang when start sub pool fails #2468

Merged: 5 commits, Sep 18, 2021

Conversation

hekaisheng
Contributor

What do these changes do?

Check the status of the sub pool subprocess while creating pools, so that pool creation fails instead of hanging when a sub pool fails to start.
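
For illustration only, here is a minimal sketch of the general idea, not the code in this PR. It assumes a parent that waits for a child process to create a readiness marker file. The names `wait_until_started`, `start_child`, `GOOD_CHILD`, `BAD_CHILD` and the marker-file protocol are hypothetical and are not Mars APIs. The point is that the wait loop also polls the child's exit status, so a child that dies early produces an error rather than an endless wait.

```python
import subprocess
import sys
import time
from pathlib import Path

# Hypothetical child programs: one signals readiness, one fails on startup.
GOOD_CHILD = "import sys, pathlib; pathlib.Path(sys.argv[1]).touch()"
BAD_CHILD = "import sys; sys.exit(3)"


def start_child(code, ready_file):
    """Launch a child interpreter; the marker path is passed as sys.argv[1]."""
    return subprocess.Popen([sys.executable, "-c", code, str(ready_file)])


def wait_until_started(proc, ready_file, poll_interval=0.05, timeout=10.0):
    """Wait for ready_file to appear, failing fast if proc exits first."""
    deadline = time.monotonic() + timeout
    while not Path(ready_file).exists():
        returncode = proc.poll()
        if returncode is not None:
            # The child exited before signalling readiness: raise instead
            # of waiting forever for a marker that will never appear.
            raise RuntimeError(
                f"sub process exited early with code {returncode}"
            )
        if time.monotonic() > deadline:
            proc.kill()
            raise TimeoutError("sub process did not become ready in time")
        time.sleep(poll_interval)
```

Calling `wait_until_started(start_child(BAD_CHILD, path), path)` raises `RuntimeError` shortly after the child exits, whereas a loop that only waited for the marker would block until the timeout, or indefinitely if no timeout were set.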

Related issue number

Resolves #2464

hekaisheng added the "type: bug" and "mod: deploy" labels on Sep 17, 2021
hekaisheng added this to the v0.8.0b1 milestone on Sep 17, 2021
hekaisheng force-pushed the bugfix/create-subpool branch from 8dea408 to 90880be on September 18, 2021 03:48
hekaisheng added the "to be backported" label on Sep 18, 2021
Member

wjsi left a comment

Generally LGTM, I left some comments.

mars/oscar/backends/mars/tests/test_pool.py (resolved)
mars/oscar/backends/mars/pool.py (outdated, resolved)
mars/oscar/backends/mars/pool.py (outdated, resolved)
mars/oscar/backends/mars/pool.py (outdated, resolved)
mars/oscar/backends/mars/pool.py (outdated, resolved)
mars/oscar/backends/test/pool.py (outdated, resolved)
hekaisheng force-pushed the bugfix/create-subpool branch from 7c14adc to d529750 on September 18, 2021 08:53
Member

wjsi left a comment

LGTM

Collaborator

qinxuye left a comment

LGTM

qinxuye merged commit 3eb318c into mars-project:master on Sep 18, 2021
pull bot pushed a commit to vishalbelsare/mars that referenced this pull request Sep 18, 2021
qinxuye pushed a commit to qinxuye/mars that referenced this pull request Sep 21, 2021
qinxuye added the "backported already" label and removed the "to be backported" label on Oct 1, 2021
hekaisheng deleted the bugfix/create-subpool branch on January 18, 2022 10:27
chaokunyang added a commit to chaokunyang/mars that referenced this pull request May 31, 2022
Merge branch merge_github_2524 of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/58?tab=diff

Signed-off-by: 捕牛 <hejialing.hjl@antgroup.com>


* [Ray] Support reconstructing worker (mars-project#2413)


* Make cmdline support third party modules (mars-project#2454)

Co-authored-by: hanguang <zhusiyuan.zsy@alibaba-inc.com>
* Support visualizing subtask graphs on Mars Web (mars-project#2426)


* Fix timeout error when waiting for a submitted task (mars-project#2457)


* Print the error message when error happens in `TaskProcessor` (mars-project#2458)


* Add nightly builds for docker images (mars-project#2456)


* Fix misuse of `name` parameter in DataFrame align (mars-project#2469)


* Fix hang when start sub pool fails (mars-project#2468)


* Refine and unify subtask detail APIs (mars-project#2465)


* Fix coverage for Azure pipeline (mars-project#2474)


* Split tileable information and subtask graph into two tabs (mars-project#2480)


* Support specified vineyard socket and skip the launching vineyardd process (mars-project#2481)


* Basic reschedule subtask (mars-project#2467)


* Compatible with scikit-learn 1.0 (mars-project#2486)

Co-authored-by: hekaisheng <kaisheng.hks@alibaba-inc.com>
* Fix wrong translation in cluster deployment. (mars-project#2489)


* Fix bug that failed to execute query when there are multiple arguments (mars-project#2490)


* Include tileable property in detail api (mars-project#2493)


* Fix version of statsmodels to pass CI (mars-project#2497)


* Implements `glm.LogisticRegression` (mars-project#2466)


* Implements bagging sampling (mars-project#2496)


* Refine MarsDMatrix & support more parameters for XGB classifier and regressor (mars-project#2498)


* Fix output of df.groupby(as_index=False).size() (mars-project#2507)


* Add preliminary implementations for ufunc methods (mars-project#2510)


* Add doc for reading csv in oss (mars-project#2514)


* [Ray] Fix serializing lambdas in web (mars-project#2512)


* Add `make_regression` support for learn module (mars-project#2515)


* Fix reduction result on empty series (mars-project#2520)


* Fix df.loc when df is empty (mars-project#2524)


* fix start subpool

* fix test_kill_and_wait_timeout

* fix autoscale timeout

* fix ray larger cluster fixture

* Update ci ray package to 1.2.2

* remove python3.6 3.8 3.9 ut and upgrade ray 3.7 image

* echo python path

* fix json decode error

* fix bundle release timeout

* fix remove placement group timeout

* fix no_restart

* fix ci

* fix autoscale
Labels
backported already, mod: deploy, type: bug
Development

Successfully merging this pull request may close these issues.

[BUG] Hang when creating mars worker
3 participants