Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GCP] GCE DWS Support #3574

Merged
merged 28 commits into from
Jun 11, 2024
Merged

[GCP] GCE DWS Support #3574

merged 28 commits into from
Jun 11, 2024

Conversation

gurcangercek
Copy link
Collaborator

To handoff the related DWS support work

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

@Michaelvll Michaelvll self-requested a review May 21, 2024 20:14
@gurcangercek gurcangercek marked this pull request as ready for review June 4, 2024 22:17
@gurcangercek gurcangercek marked this pull request as draft June 4, 2024 22:18
@Michaelvll
Copy link
Collaborator

Michaelvll commented Jun 6, 2024

Thanks a lot for the PR @gurcangercek! This is fantastic!
We made some modifications to this PR to polish it a bit and fit it into our provisioner logic. It seems working nicely with single and multiple VMs. Could you help check and merge the following into this branch?
Shopify#4

@gurcangercek gurcangercek changed the title [GCP] WIP - GCE DWS Support [GCP] GCE DWS Support Jun 6, 2024
@gurcangercek gurcangercek marked this pull request as ready for review June 6, 2024 16:44
…uest cancellation (#5)

* Fix config fields

* fix cancel

* Add loggings

* remove useless codes
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for adding the support of DWS with GCE @gurcangercek! This is awesome. This PR should be ready to go once we have the master branch merged. : )

@gurcangercek
Copy link
Collaborator Author

Thanks a lot for adding the support of DWS with GCE @gurcangercek! This is awesome. This PR should be ready to go once we have the master branch merged. : )

Hey @Michaelvll , thanks for the update and sorry for the delay. It is all yours...

@Michaelvll Michaelvll mentioned this pull request Jun 11, 2024
6 tasks
@Michaelvll
Copy link
Collaborator

Michaelvll commented Jun 11, 2024

Tested:

@Michaelvll Michaelvll merged commit e6ee397 into skypilot-org:master Jun 11, 2024
20 checks passed
@binarycrayon
Copy link

binarycrayon commented Aug 21, 2024

hello, can this be used in conjunct with the new skypilot services?

@Michaelvll
Copy link
Collaborator

Michaelvll commented Aug 21, 2024

Hey @binarycrayon, yes, it should be possible to use DWS for your service if you specify the following section in your task:

experimental:
  config_overrides:
    gcp:
      managed_instance_group:
        run_duration: 3600

For more details: https://skypilot.readthedocs.io/en/latest/reference/yaml-spec.html#experimental
https://skypilot.readthedocs.io/en/latest/reference/config.html

@binarycrayon
Copy link

thanks so much! I'd assume service yaml also support this https://skypilot.readthedocs.io/en/latest/serving/service-yaml-spec.html

@Michaelvll
Copy link
Collaborator

thanks so much! I'd assume service yaml also support this https://skypilot.readthedocs.io/en/latest/serving/service-yaml-spec.html

Yes, the service YAML shares all the specs for task

Michaelvll added a commit that referenced this pull request Aug 23, 2024
* [GCP] initial take for dws support with migs

* fix lint errors

* dependency and format fix

* refactor mig instance creation

* fix

* remove unecessary instance creation code for mig

* Fix deletion

* Fix instance template logic

* Restart

* format

* format

* move to REST APIs instead of python APIs

* add multi-node back

* Fix multi-node

* Avoid spot

* format

* format

* fix scheduling

* fix cancel

* Add smoke test

* revert some changes

* fix smoke

* Fix

* fix

* Fix smoke

* [GCP] Changing the config name for DWS support and fix for resize request cancellation (#5)

* Fix config fields

* fix cancel

* Add loggings

* remove useless codes

---------

Co-authored-by: Zhanghao Wu <zhangaho.wu@outlook.com>
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
@binarycrayon
Copy link

Since it is possible to specify spot in instance template, can we enable use of spot instance with managed instance group?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants