Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[perf] use uv for venv creation and pip install #4414

Merged
merged 16 commits into from
Dec 3, 2024
Merged

Conversation

cg505
Copy link
Collaborator

@cg505 cg505 commented Nov 25, 2024

This was previously merged in as #4394, which was reverted in #4401 due to a bug.
This PR:

  • readds the old uv changes
  • fixes the bug on Azure (see below)
  • uses the --seed flag with uv venv to make sure pip is installed in the venv

Performance impact:

  • Cluster provisioning:
    • Using skypilot custom images:
      • AWS: saves about 0.5s (8.5s -> 8s)
      • Azure: saves about 5s (26s -> 21s)
    • Using stock ubuntu images ~3x speedup:
      • AWS: saves about 30s (45s -> 15s)
      • Azure: saves about 100s (130s -> 30s)
      • These results likely also apply to other user-provided images, including custom kubernetes images.
  • Job controller setup/cloud dependency installation saves about 45s (85s -> 25s) on first run
    • with GCP, AWS, and Azure enabled
    • on run 2+ with the controller already setup, saves about 3s (6.5s -> 3.5s)

Azure bug:
uv pip install skypilot[azure] would fail to install azure-cli and complain as follows:

  × No solution found when resolving dependencies:
  ╰─▶ Because there is no version of azure-keyvault-administration==4.4.0b2
      and azure-cli>=2.65.0 depends on azure-keyvault-administration==4.4.0b2,
      we can conclude that azure-cli>=2.65.0 cannot be used.
      And because only the following versions of azure-cli are available:
          azure-cli<=2.65.0
          azure-cli==2.66.0
          azure-cli==2.67.0
      and skypilot[azure]==1.0.0.dev0 depends on azure-cli>=2.65.0, we can
      conclude that skypilot[azure]==1.0.0.dev0 cannot be used.
      And because only skypilot[azure]==1.0.0.dev0 is available and you
      require skypilot[azure], we can conclude that your requirements are
      unsatisfiable.

      hint: azure-keyvault-administration was requested with a pre-release
      marker (e.g., azure-keyvault-administration==4.4.0b2), but pre-releases
      weren't enabled (try: `--prerelease=allow`)

This is because azure-cli directly depends on a pre-release version of azure-keyvault-administration (and several other packages). Normal pip handles this by allowing the pre-release version since it's specified as an explicit version requirement, however uv by default does not consider pre-releases at all. See https://docs.astral.sh/uv/pip/compatibility/#pre-release-compatibility.
We don't want to use uv pip install --prerelease=allow skypilot[azure] because this will allow pre-release versions to satisfy any dependency, not just these explicitly specified ones. I tested this and the resolved version was different for a few packages. This seems like an unacceptable difference, especially since we will be unnecessarily relying on pre-release packages.
Once azure-cli is installed, uv pip install skypilot[azure] will work just fine, since it doesn't need to worry about resolving azure-cli's dependencies. So, we just need to find a way to install azure-cli. pip install azure-cli works, but it's VERY slow (takes >60s on my machine). uv pip install --prerelease=allow azure-cli is much faster, so I went with this.
Technically, uv pip install --prerelease=allow azure-cli uses different resolution than pip install azure-cli, for the same reason described above for installing skypilot[azure]. However, I tested this, and in practice only one package has a different version. (azure-mgmt-search 9.1.0 -> 9.2.0b2)

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh
    • k8s
    • aws
    • azure
      • technically I can't get this to pass - everything passes except step 7 which fails because it takes 50 minutes to schedule the job lol

@cg505
Copy link
Collaborator Author

cg505 commented Nov 25, 2024

@romilbhardwaj Rather than conditionally using pip/uv based on what's installed, I just changed the RAY_INSTALLATION_COMMANDS to also install uv if it's missing. I think this is better because we need different behavior between pip and uv, so trying to make things compatible with both is going to get increasingly difficult. As a brief example, pip uninstall needs the -y flag to work non-interactively, but this flag does not exist in uv pip uninstall.

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cg505! It looks mostly good to me. We should run the backward compatibility tests to make sure it works, especially on AWS, GCP, Azure and Kubernetes

Comment on lines 226 to 227
'if [ "{cloud}" = "azure" ]; then '
f'{SKY_UV_PIP_CMD} install --prerelease=allow "azure-cli>=2.65.0"; fi;'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should search for all occurence of azure-cli across the project and pip installs. Are we planning to install the dependencies with uv for controller dependencies as well?

'pip install "azure-cli>=2.31.0" azure-core '

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, I didn't know about this code. We should definitely fix this up. To be clear, everything should work as-is but ideally we would move this all to uv as well.

There are no other relevant mentions of azure-cli in the repo.

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cg505!

@@ -39,6 +39,7 @@
'which python3')
# Python executable, e.g., /opt/conda/bin/python3
SKY_PYTHON_CMD = f'$({SKY_GET_PYTHON_PATH_CMD})'
# Prefer SKY_UV_PIP_CMD, which is faster. TODO(cooper): remove all usages.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention the reason in the comment for why we keep this for future reference.

Comment on lines 38 to 39
sys.path.append(SETUP_FILE_DIR)
import dependencies
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dependencies seems not included in this PR?

@@ -25,6 +25,7 @@
from sky.jobs import utils as managed_job_utils
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!

@cg505
Copy link
Collaborator Author

cg505 commented Nov 27, 2024

Ran backwards compatibility tests as shown in PR description.

@cg505 cg505 requested a review from Michaelvll November 27, 2024 19:11
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this @cg505! We may want to run some tests in a clean env: tests for normal launches and jobs/serve with new controller set up by this PR.

Comment on lines 36 to 39
# setuptools does not include the script dir on the search path, so manually add
# it so that we can import the dependencies file.
sys.path.append(SETUP_FILE_DIR)
import dependencies # pylint: disable=wrong-import-position
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a common practice?

Would it cause any issue with our wheel building during sky launch? Would be great if we can test it in a clean env, e.g. start a new container for running some tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the wheel build command manually as well as sky launch to make sure this would work. I looked for information on whether this is common practice but couldn't find any useful info.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this: pypa/setuptools#3134
Obviously it's not ideal to hack sys.path. We could use importlib as suggested there. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python build systems are so complicated...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this to use runpy.run_path, which seems a lot cleaner.

Comment on lines 210 to 212
f'uv -V > /dev/null 2>&1 ||'
'curl -LsSf https://astral.sh/uv/install.sh 2>/dev/null |'
'UV_INSTALL_DIR="$HOME/.local/bin" sh >/dev/null 2>&1')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, should we just use uv setup in our provision setup, i.e. use {constants.UV_PIP_CMD} directly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we use {{ sky_activate_python_env }} in jobs controller yml to make sure pip install goes to the separate env we created. Will the new changes cause any issue?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably could use SKY_UV_PIP_CMD - I didn't because the existing code wasn't using SKY_PIP_CMD.
Activating the python env normally should work correctly with uv.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use the commands from skylet.constants.

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cg505! It looks mostly good to me with a small comment below

sky/utils/controller_utils.py Show resolved Hide resolved
@cg505 cg505 requested a review from Michaelvll December 3, 2024 20:05
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cg505! LGTM.

Can we have a quick number for the performance improvement in the PR description, for future reference?

sky/utils/controller_utils.py Outdated Show resolved Hide resolved
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
@cg505
Copy link
Collaborator Author

cg505 commented Dec 3, 2024

Can we have a quick number for the performance improvement in the PR description, for future reference?

@Michaelvll updated PR description!

@cg505 cg505 merged commit de561c2 into skypilot-org:master Dec 3, 2024
19 checks passed
Michaelvll added a commit that referenced this pull request Dec 9, 2024
* [perf] use uv for venv creation and pip install (#4414)

* Revert "remove `uv` from runtime setup due to azure installation issue (#4401)"

This reverts commit 0b20d56.

* on azure, use --prerelease=allow to install azure-cli

* use uv venv --seed

* fix backwards compatibility

* really fix backwards compatibility

* use uv to set up controller dependencies

* fix python 3.8

* lint

* add missing file

* update comment

* split out azure-cli dep

* fix lint for dependencies

* use runpy.run_path rather than modifying sys.path

* fix cloud dependency installation commands

* lint

* Update sky/utils/controller_utils.py

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

---------

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* [Minor] README updates. (#4436)

* [Minor] README touches.

* update

* update

* make --fast robust against credential or wheel updates (#4289)

* add config_dict['config_hash'] output to write_cluster_config

* fix docstring for write_cluster_config

This used to be true, but since #2943, 'ray' is the only provisioner.
Add other keys that are now present instead.

* when using --fast, check if config_hash matches, and if not, provision

* mock hashing method in unit test

This is needed since some files in the fake file mounts don't actually exist,
like the wheel path.

* check config hash within provision with lock held

* address other PR review comments

* rename to skip_if_no_cluster_updates

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* add assert details

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* address PR comments and update docstrings

* fix test

* update docstrings

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* address PR comments

* fix lint and tests

* Update sky/backends/cloud_vm_ray_backend.py

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* refactor skip_if_no_cluster_update var

* clarify comment

* format exception

---------

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* format

* format

* format

* fix

---------

Co-authored-by: Christopher Cooper <cooperc@assemblesys.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Michaelvll added a commit that referenced this pull request Dec 9, 2024
* [perf] use uv for venv creation and pip install (#4414)

* Revert "remove `uv` from runtime setup due to azure installation issue (#4401)"

This reverts commit 0b20d56.

* on azure, use --prerelease=allow to install azure-cli

* use uv venv --seed

* fix backwards compatibility

* really fix backwards compatibility

* use uv to set up controller dependencies

* fix python 3.8

* lint

* add missing file

* update comment

* split out azure-cli dep

* fix lint for dependencies

* use runpy.run_path rather than modifying sys.path

* fix cloud dependency installation commands

* lint

* Update sky/utils/controller_utils.py

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

---------

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* [Minor] README updates. (#4436)

* [Minor] README touches.

* update

* update

* basic impl

* fixes

* start dashboard when restarting the controller

* Fix dashboard support

* Fix

* format

* format

* format

* format

* format

* address comments

* format

* fix

---------

Co-authored-by: Christopher Cooper <cooperc@assemblesys.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants