-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Avoid terminating cluster for resources unavailability #2170
Merged
Michaelvll
merged 13 commits into
master
from
avoid-terminate-cluster-for-resources-unavailability
Jul 3, 2023
Merged
Changes from all commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
ff7272e
fix key interruption
Michaelvll 460ceaf
fix
Michaelvll 1d10976
format
Michaelvll c485ffe
fix spot
Michaelvll 98bb90a
fix comment
Michaelvll 148bad0
Address comments
Michaelvll 318c114
Update sky/backends/cloud_vm_ray_backend.py
Michaelvll fd09434
revert log lib
Michaelvll 1194bbd
revert log_lib
Michaelvll a8d2840
Merge branch 'master' of github.com:skypilot-org/skypilot into avoid-…
Michaelvll 7908c32
Update sky/backends/backend_utils.py
Michaelvll b1f638e
Merge branch 'avoid-terminate-cluster-for-resources-unavailability' o…
Michaelvll 949c4fb
better logging
Michaelvll File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we do these two steps for a new cluster name? I imagine with this PR, at step 2 we should not set it to STOPPED and we should do the provisioning loop as usual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We won't set it to stop for a new cluster, because the new cluster will only have the following two cases:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This repro gave
Maybe we should change L3788's logging to (or something more clear):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Updated the logging. Tested again with:
sky launch -c min --cloud gcp --cpus 2
; manually terminate the cluster on the console;python -c 'import sky; sky.launch(sky.Task(), cluster_name="min")'
again