how to recreate cluster? #120

erthward · 2022-10-20T18:52:51Z

Hi, All!

This is probably a dumb question, but I don't understand enough about what's happening behind the scenes in order to tell how or why it's dumb... which I guess makes it non-dumb?

I am trying to prototype a hopeful analysis, but am currently stuck fiddling to figure out how to optimize the pipeline structure, the Dask cluster memory, and other configurations. As part of this process, I run a trial, watch the Dask dashboard, watch workers fail, and then need to revisit my setup (e.g., increase the RAM per worker). When I do this, then launch and connect to a new cluster and rerun the code I'm stuck on, I typically find that the dashboard never populates with workers and work never starts. I imagine it's just hanging, waiting for resources to be provisioned. I thought I was fine iterating this way, as long as I was calling cluster.close() each time the task fails, but I appear to be wrong. What would be the 'right way' to do this? Is there a call I'm missing? Or should I just be resizing the existing cluster? I've tried searching online but am coming up empty-handed...

Thanks for any insight!

Answered by TomAugspurger

TomAugspurger · 2022-10-20T21:25:44Z

cluster.close() should be releasing the resources, so I think you're doing things correctly. You can check gateway.list_clusters() to see if you have any other clusters still running. It's possible that there's some delay between calling cluster.close() and the resources actually being freed by Kubernetes (and you don't have access to the Kubernetes API, so you don't have any direct visibility into when the resources are freed).
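
For anyone iterating the same way, here is a minimal sketch of checking for leftover clusters and stopping them before launching a new one. It assumes a dask-gateway `Gateway` object, whose `list_clusters()` returns reports with a `name` attribute and whose `stop_cluster(name)` shuts a cluster down by name. The helper `stop_leftover_clusters` and its `keep` parameter are hypothetical names used only for illustration, not part of dask-gateway.

```python
def stop_leftover_clusters(gateway, keep=()):
    """Stop every cluster the gateway reports, except names listed in `keep`.

    `gateway` is assumed to behave like dask_gateway.Gateway:
    list_clusters() yields objects with a `.name`, and
    stop_cluster(name) asks the gateway to shut that cluster down.
    Returns the names of the clusters that were stopped.
    """
    stopped = []
    for report in gateway.list_clusters():
        if report.name in keep:
            continue  # leave clusters we still want running
        gateway.stop_cluster(report.name)
        stopped.append(report.name)
    return stopped


# Hypothetical usage on a hub where dask-gateway is configured:
#
#   from dask_gateway import Gateway
#   gateway = Gateway()
#   print(stop_leftover_clusters(gateway))  # names of clusters just stopped
#   cluster = gateway.new_cluster()         # then start a fresh one
```

Stopping a cluster this way only asks the gateway to shut it down. As noted above, Kubernetes may still take some time to free the resources, so a new cluster may not get workers right away.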

1 reply

erthward Oct 21, 2022
Author

Okay, perfect. Thanks! This is great to know. Sounds like I just need to think more carefully about memory requirements beforehand and be more patient if and when I need to start a new cluster. Sounds like gateway.list_clusters() will be helpful too.

Thanks, Tom!

gjoseph92 · 2022-10-30T19:41:14Z

@erthward if you're having memory issues with Dask, you might also want to try this: dask/distributed#7128
