directorv2 fails to reconnect to new scheduler and remains "waiting for cluster" #5237

Closed
1 task done
Tracked by #950
pcrespov opened this issue Jan 15, 2024 · 0 comments · Fixed by #5252
Labels: bug (buggy, it does not work as expected)

Comments

@pcrespov
Member

Is there an existing issue for this?

  • I have searched the existing issues

Which deploy/s?

development (master)

Current Behavior

When the dask-scheduler restarts and a computational service is then run, directorv2 notices the restart and raises an exception (log entry: https://monitoring.osparc-master.speag.com/graylog/messages/graylog_115/dd0866a5-b38d-11ee-baa3-0242ac130007).

The tasks are set back to the WAITING_FOR_CLUSTER state until the scheduler comes back. However, from directorv2's point of view the scheduler never "comes back", so the tasks hang in that state forever.

Expected Behavior

After some time, directorv2 (dv2) should detect that a new scheduler is in place. I guess it needs to resync its list of pending tasks?
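
To make the resync idea concrete, here is a minimal sketch in plain Python. It is not directorv2's actual code: `PendingTaskTracker`, `submit`, and `resync` are hypothetical names, and it assumes dv2 can obtain some identifier that changes whenever the scheduler is replaced.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PendingTaskTracker:
    """Hypothetical helper: remembers which scheduler instance each task went to."""

    known_scheduler_id: Optional[str] = None
    # Maps task id -> id of the scheduler instance the task was submitted to.
    submitted: Dict[str, str] = field(default_factory=dict)

    def submit(self, task_id: str, scheduler_id: str) -> None:
        """Record that a task was handed to the given scheduler instance."""
        self.submitted[task_id] = scheduler_id
        self.known_scheduler_id = scheduler_id

    def resync(self, current_scheduler_id: str) -> List[str]:
        """Return the task ids to resubmit because the scheduler was replaced."""
        if self.known_scheduler_id == current_scheduler_id:
            return []  # same scheduler instance, nothing was lost
        lost = [
            task_id
            for task_id, scheduler_id in self.submitted.items()
            if scheduler_id != current_scheduler_id
        ]
        # Assumption: the caller resubmits every returned task to the new scheduler.
        for task_id in lost:
            self.submitted[task_id] = current_scheduler_id
        self.known_scheduler_id = current_scheduler_id
        return lost
```

With something like this, a periodic check could call `resync()` with the current scheduler identifier and resubmit whatever it returns, instead of leaving those tasks in WAITING_FOR_CLUSTER.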

Steps To Reproduce

  1. Open the sleepers project
  2. In Portainer, restart the dask scheduler
  3. Open Graylog and set the filter to dv2
  4. Run the sleepers project

Anything else?

No response
