[Core] Pending tasks gets hanging instead of resubmitted #44578

Closed
hongchaodeng opened this issue Apr 8, 2024 · 0 comments
Labels: bug, core, P0


hongchaodeng commented Apr 8, 2024

What happened + What you expected to happen

The following logs show that task 47544f75c34aab0 failed with NODE_DIED and was resubmitted, but it never enters the RUNNING state afterwards.

37620:[2024-04-05 23:48:18,879 W 904 1135] task_manager.cc:1105: Task attempt 47544f75c34aab0fffffffffffffffffffffffff02000000 failed with error NODE_DIED Fail immediately? 0, status GrpcUnavailable: RPC Error message: Socket closed; RPC Error details: , error info error_message: "Task failed due to the node dying.\n\n...
37623:[2024-04-05 23:48:18,879 I 904 1135] task_manager.cc:998: task 47544f75c34aab0fffffffffffffffffffffffff02000000 retries left: 10, oom retries left: -1, task failed due to oom: 0
37624:[2024-04-05 23:48:18,879 I 904 1135] task_manager.cc:1002: Attempting to resubmit task 47544f75c34aab0fffffffffffffffffffffffff02000000 for attempt number: 0
37625:[2024-04-05 23:48:18,880 I 904 1135] core_worker.cc:411: Will resubmit task after a 0ms delay: ...
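Since there is no reproduction script, a small log-scanning helper may help find other tasks in the same state. The following is only a sketch, assuming the log lines keep the format shown in the excerpt above; `find_resubmitted_tasks` and the two regular expressions are hypothetical names for illustration and are not part of Ray. It lists which tasks logged a resubmit attempt and how many retries they had left. It does not check whether a task later reached RUNNING.

```python
import re

# Assumed formats, taken from the two task_manager.cc lines quoted above.
RETRIES_RE = re.compile(
    r"task (?P<task_id>[0-9a-f]+) retries left: (?P<retries>-?\d+)"
)
RESUBMIT_RE = re.compile(
    r"Attempting to resubmit task (?P<task_id>[0-9a-f]+) "
    r"for attempt number: (?P<attempt>\d+)"
)


def find_resubmitted_tasks(lines):
    """Map task id -> {"retries_left": int or None, "attempts": [int, ...]}."""
    tasks = {}
    for line in lines:
        m = RETRIES_RE.search(line)
        if m:
            entry = tasks.setdefault(
                m["task_id"], {"retries_left": None, "attempts": []}
            )
            entry["retries_left"] = int(m["retries"])
        m = RESUBMIT_RE.search(line)
        if m:
            entry = tasks.setdefault(
                m["task_id"], {"retries_left": None, "attempts": []}
            )
            entry["attempts"].append(int(m["attempt"]))
    return tasks


# Two lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "37623:[2024-04-05 23:48:18,879 I 904 1135] task_manager.cc:998: task 47544f75c34aab0fffffffffffffffffffffffff02000000 retries left: 10, oom retries left: -1, task failed due to oom: 0",
    "37624:[2024-04-05 23:48:18,879 I 904 1135] task_manager.cc:1002: Attempting to resubmit task 47544f75c34aab0fffffffffffffffffffffffff02000000 for attempt number: 0",
]

if __name__ == "__main__":
    for task_id, info in find_resubmitted_tasks(SAMPLE).items():
        print(task_id, info)
```

To scan a real log, pass an open file object instead of `SAMPLE`. The resulting task IDs can then be compared against whatever log lines indicate that a task started running.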

Versions / Dependencies

Ray 2.10.0

Reproduction script

none

Issue Severity

High: It blocks me from completing my task.

@hongchaodeng added the bug and P0 labels, and added then removed the triage label, on Apr 8, 2024
@jjyao added the core label on Apr 8, 2024