Dask Cluster on PC hub shutting down #135
saadsidmsft started this conversation in General
Replies: 1 comment
-
Do you have a minimal example you can share that reproduces the issue? You might also try seeing if the exception reproduces with a
Just a quick clarification here: Dask / dask-gateway rely on the resource provider to capture stderr / stdout. If you deploy your own hub, you can configure things to pipe logs to an Azure Log Analytics Workspace. So in theory the logs could be obtained, but in this case you can't, since you don't have access to AKS / the Azure resources for the PC Hub.
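As a partial workaround (not mentioned in the thread), the Dask client can pull the scheduler's and workers' in-memory log buffers directly, which sometimes shows why a worker died even when the platform-level stderr / stdout isn't reachable. A minimal sketch, assuming a cluster is already running under the gateway (the connection code is illustrative, not taken from the thread):

```python
from dask_gateway import Gateway

# Connect to an already-running cluster on the gateway.
gateway = Gateway()
cluster = gateway.connect(gateway.list_clusters()[0].name)
client = cluster.get_client()

# Standard distributed.Client methods: they return the most recent
# in-memory log lines kept by the scheduler and by each worker process.
print(client.get_scheduler_logs(n=100))
for worker_addr, log_lines in client.get_worker_logs(n=100).items():
    print(worker_addr, log_lines)
```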
-
Hi Team,
We are spinning up Dask clusters to run some of our models and data pipelines. However, we constantly run into worker nodes being killed.
Here is an example of the error:
KilledWorker: ('finalize-f1106095-3dbe-451b-93b6-d0dd143342a0', <WorkerState 'tls://10.244.42.99:35501', name: dask-worker-841f6e3f003f44999b37567979abc6f9-trzwk, status: closed, memory: 0, processing: 125>)
We are well within the cluster limits, though. Here is our cluster configuration:
Cores per worker = 2 (limit 8)
Memory per worker = 8 GiB (limit 64 GiB)
Number of workers = 100 (limit 400)
The code runs fine in a normal Python environment (4 cores, 32 GiB of memory). Can someone please help? The Dask cluster doesn't provide much logging information.
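For context, a minimal sketch of how a cluster with roughly this shape might be requested through dask-gateway on the hub; the option names `worker_cores` and `worker_memory` are assumptions, and the real names depend on what the hub's gateway exposes:

```python
from dask_gateway import Gateway

gateway = Gateway()  # on the PC Hub the gateway address is preconfigured

# Option names are deployment-specific; worker_cores / worker_memory are
# assumptions here -- inspect gateway.cluster_options() for the real ones.
options = gateway.cluster_options()
options.worker_cores = 2
options.worker_memory = 8  # GiB

cluster = gateway.new_cluster(options)
cluster.scale(100)
client = cluster.get_client()
```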
Thank you
Regards
Saad Siddique