make --fast robust against credential or wheel updates #4289
fix docstring for write_cluster_config: this used to be true, but since skypilot-org#2943, 'ray' is the only provisioner. Add other keys that are now present instead.
mock hashing method in unit test: this is needed because some files in the fake file mounts, such as the wheel path, don't actually exist.
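The unit-test change described in the commit above (mocking the hashing method because some fake file mounts, such as the wheel path, do not exist on disk) can be sketched with `unittest.mock`. This is a minimal illustration only: `ConfigWriter`, `hash_file_mounts`, and this simplified `write_cluster_config` are hypothetical stand-ins, not SkyPilot's actual code or signatures.

```python
import hashlib
import unittest
from unittest import mock


class ConfigWriter:
    """Hypothetical stand-in for the code path that writes a cluster config."""

    @staticmethod
    def hash_file_mounts(file_mounts):
        # Reads every local file, so a nonexistent path raises FileNotFoundError.
        hasher = hashlib.sha256()
        for remote_path in sorted(file_mounts):
            with open(file_mounts[remote_path], 'rb') as f:
                hasher.update(f.read())
        return hasher.hexdigest()

    def write_cluster_config(self, file_mounts):
        # Hypothetical: record the hash in the returned config dict.
        return {'config_hash': self.hash_file_mounts(file_mounts)}


class WriteClusterConfigTest(unittest.TestCase):

    def test_fake_file_mounts(self):
        # The fake wheel path does not exist, so patch the hashing helper
        # instead of letting it try to open the file.
        fake_mounts = {'~/.sky/wheels/': '/nonexistent/skypilot.whl'}
        with mock.patch.object(ConfigWriter, 'hash_file_mounts',
                               return_value='fake-hash') as mocked:
            config = ConfigWriter().write_cluster_config(fake_mounts)
        mocked.assert_called_once_with(fake_mounts)
        self.assertEqual(config['config_hash'], 'fake-hash')
```

Patching at the class level with `mock.patch.object` keeps the test independent of the module's import path; the original static method is restored when the `with` block exits.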
Thanks for adding this @cg505! It looks mostly good to me. Left several comments.
Thanks for the update @cg505! It looks mostly good to me! Left several comments, mostly nits.
Can we also add a smoke test for --fast?
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
Thank you for the updates @cg505! It looks mostly good to me.
Thanks @cg505! It looks mostly good to me. Left some comments mostly for readability and code style.
We should make sure the backward compatibility tests pass. Also, can we add a quick smoke test for this --fast change?
* [perf] use uv for venv creation and pip install (#4414)
  * Revert "remove `uv` from runtime setup due to azure installation issue (#4401)". This reverts commit 0b20d56.
  * on azure, use --prerelease=allow to install azure-cli
  * use uv venv --seed
  * fix backwards compatibility
  * really fix backwards compatibility
  * use uv to set up controller dependencies
  * fix python 3.8
  * lint
  * add missing file
  * update comment
  * split out azure-cli dep
  * fix lint for dependencies
  * use runpy.run_path rather than modifying sys.path
  * fix cloud dependency installation commands
  * lint
  * Update sky/utils/controller_utils.py
    Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
  Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
* [Minor] README updates. (#4436)
  * [Minor] README touches.
  * update
  * update
* make --fast robust against credential or wheel updates (#4289)
  * add config_dict['config_hash'] output to write_cluster_config
  * fix docstring for write_cluster_config: this used to be true, but since #2943, 'ray' is the only provisioner. Add other keys that are now present instead.
  * when using --fast, check if config_hash matches, and if not, provision
  * mock hashing method in unit test: this is needed since some files in the fake file mounts don't actually exist, like the wheel path.
  * check config hash within provision with lock held
  * address other PR review comments
  * rename to skip_if_no_cluster_updates
    Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
  * add assert details
    Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
  * address PR comments and update docstrings
  * fix test
  * update docstrings
    Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
  * address PR comments
  * fix lint and tests
  * Update sky/backends/cloud_vm_ray_backend.py
    Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
  * refactor skip_if_no_cluster_update var
  * clarify comment
  * format exception
  Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
* format
* format
* format
* fix

Co-authored-by: Christopher Cooper <cooperc@assemblesys.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
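One commit in this PR's history is "check config hash within provision with lock held". Presumably the point is that the hash comparison and the provisioning it guards should be atomic with respect to other launches on the same cluster, so two concurrent launches cannot both act on a stale view. The sketch below illustrates that ordering under loud assumptions: `provision_if_needed`, the in-memory `_stored_hashes` dict, and the use of an in-process `threading.Lock` are all hypothetical stand-ins; the real code likely uses a per-cluster lock that works across processes and stores the hash with the cluster's state.

```python
import threading
from typing import Callable, Dict

# Hypothetical in-memory stand-ins for per-cluster locks and stored hashes.
_cluster_locks: Dict[str, threading.Lock] = {}
_stored_hashes: Dict[str, str] = {}
_registry_lock = threading.Lock()


def _lock_for(cluster_name: str) -> threading.Lock:
    # Guard the registry so two threads never create different locks
    # for the same cluster.
    with _registry_lock:
        return _cluster_locks.setdefault(cluster_name, threading.Lock())


def provision_if_needed(cluster_name: str, new_hash: str,
                        provision: Callable[[], None]) -> bool:
    """Return True if provisioning ran, False if it was skipped.

    The hash check and the provisioning happen under the same lock, so no
    other launch can change the cluster between the check and the decision.
    """
    with _lock_for(cluster_name):
        if _stored_hashes.get(cluster_name) == new_hash:
            return False  # Nothing relevant changed: skip provisioning.
        provision()
        # Record the hash only after provisioning succeeds, so a failed
        # attempt is retried on the next launch.
        _stored_hashes[cluster_name] = new_hash
        return True
```

With this ordering, several concurrent launches that carry the same new hash trigger provisioning only once; the rest see the updated hash after acquiring the lock and skip.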
remove sky jobs launch --fast: The --fast behavior is now always enabled for managed jobs. This was unsafe before, but since skypilot-org#4289 it should be safe. We will remove the flag before 0.8.0 so that it never ships in a stable version. sky launch still has the --fast flag. There the flag remains unsafe, because it could cause setup to be skipped even though setup should be re-run. In the managed jobs case, this is not an issue because we fully control the setup and know it will not change.
With this change, --fast will reprovision the cluster if something important changes (such as new cloud credentials or a new version of the SkyPilot wheel).
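The idea described above, reprovisioning under --fast only when an input such as a credential file or the wheel has changed, can be sketched as a content hash over the cluster config plus the local files it mounts. This is a minimal sketch under stated assumptions, not the PR's implementation: `compute_config_hash` and `should_reprovision` are hypothetical names, SHA-256 is my choice of hash, the config is assumed to be JSON-serializable, and only regular files (not directories) are handled.

```python
import hashlib
import json
import os
from typing import Dict, Optional


def compute_config_hash(config: dict, file_mounts: Dict[str, str]) -> str:
    """Hash the config plus the contents of every mounted local file.

    Hypothetical helper: `file_mounts` maps remote paths to local paths
    (e.g. cloud credentials or the SkyPilot wheel).
    """
    hasher = hashlib.sha256()
    # Serialize deterministically so dict key order does not change the hash.
    hasher.update(json.dumps(config, sort_keys=True).encode('utf-8'))
    for remote_path in sorted(file_mounts):
        local_path = os.path.expanduser(file_mounts[remote_path])
        # Separator byte keeps path boundaries unambiguous.
        hasher.update(remote_path.encode('utf-8') + b'\0')
        with open(local_path, 'rb') as f:
            # Fixed-length per-file digest: a changed credential or wheel
            # changes the overall hash.
            hasher.update(hashlib.sha256(f.read()).digest())
    return hasher.hexdigest()


def should_reprovision(stored_hash: Optional[str], new_hash: str) -> bool:
    """With --fast, skip provisioning only if the hash is unchanged."""
    return stored_hash is None or stored_hash != new_hash
```

Hashing file contents rather than timestamps means a touched but unchanged file does not force reprovisioning, at the cost of reading each mounted file on every launch.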
However, this causes some performance regression in the --fast case, since we need to go through the initial provisioning stages; that costs about 4s on my machine. I am looking into optimizing this: most of the cost is an unnecessary round trip to check that the cluster is up.
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_minimal
pytest tests/test_smoke.py:: --managed-jobs
conda deactivate; bash -i tests/backward_compatibility_tests.sh