[core] [1/N] Validate uv options #48479

Merged
merged 9 commits into from
Nov 5, 2024

Contributor

@dentiny dentiny commented Oct 31, 2024

This is the first PR in a series adding uv support for package installation.

As discussed at #47819 (comment), uv will be added as another key for runtime env.

Example usage:

runtime_env = {"uv": {"uv_version": "==0.1.1", "packages":["tensorflow", "requests"]}}

Most pip features will eventually be supported.
This PR implements only the runtime env validation part.
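
For illustration, the validation step could look roughly like the sketch below. This is not the code added in this PR: the function name `validate_uv_option_sketch`, the accepted keys, and the error messages are all hypothetical, and only the list form and the dict form with `packages` (plus an optional `uv_version` string, as in the example above) are assumed.

```python
from typing import Any, Dict, Optional


def validate_uv_option_sketch(uv: Any) -> Optional[Dict[str, Any]]:
    """Hypothetical validator for the 'uv' runtime env field (illustration only)."""
    if uv is None:
        return None
    if isinstance(uv, list):
        # List form: treat the list as the package list.
        result: Dict[str, Any] = {"packages": uv}
    elif isinstance(uv, dict):
        # Dict form: 'packages' is required; 'uv_version' is assumed optional.
        if "packages" not in uv:
            raise ValueError("uv dict must contain a 'packages' field")
        unknown = set(uv) - {"packages", "uv_version"}
        if unknown:
            raise ValueError(f"unexpected uv fields: {sorted(map(str, unknown))}")
        result = dict(uv)
    else:
        raise TypeError(f"uv must be a list or dict, got {type(uv).__name__}")

    packages = result["packages"]
    if not isinstance(packages, list) or not all(isinstance(p, str) for p in packages):
        raise TypeError("uv packages must be a list of strings")
    if "uv_version" in result and not isinstance(result["uv_version"], str):
        raise TypeError("uv_version must be a string")
    return result
```

Rejecting unknown keys is a design choice of this sketch; it keeps validation from silently accepting options that have no implementation behind them yet.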

Followup TODO items:

  • Download latest version of uv and install in the system
  • Implement requirement list based packages
  • Allow users to specify uv version, instead of latest one

This PR also includes a real unit test that relies on bazel py_test.
My thoughts on unit test setup:

  • Unit tests should be hermetic, self-contained, and small.
  • Prefer unittest over pytest:
    • pytest is not a built-in library; in our current situation (no dependency setup on the Python side), the built-in unittest library is more appropriate;
    • a plain assert doesn't print the values on a mismatch, while functions like assertEqual do, which helps debugging
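
The difference in failure output can be seen with a small standard-library sketch. The helper names are hypothetical, and the bare-assert case assumes plain Python, without pytest's assertion rewriting.

```python
import unittest


def assert_equal_failure_text() -> str:
    """Run one intentionally failing assertEqual and return the failure text."""

    class MismatchDemo(unittest.TestCase):
        def runTest(self):
            # Intentionally failing comparison.
            self.assertEqual([1, 2], [1, 3])

    result = unittest.TestResult()
    MismatchDemo().run(result)
    # Each entry in result.failures is a (test, formatted traceback) pair.
    return result.failures[0][1]


def bare_assert_failure_text() -> str:
    """Return the message carried by a failing bare assert."""
    try:
        assert [1, 2] == [1, 3]
    except AssertionError as exc:
        return str(exc)
    return "no failure"
```

The text from `assert_equal_failure_text()` includes both compared lists, whereas a bare `assert` in plain Python raises an `AssertionError` without a message unless one is supplied by hand.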

Signed-off-by: hjiang <hjiang@anyscale.com>
@dentiny dentiny force-pushed the hjiang/validate-uv-options branch from e826b49 to 97bdec9 Compare October 31, 2024 23:46
@jjyao jjyao added the go add ONLY when ready to merge, run all tests label Nov 1, 2024
Collaborator

@jjyao jjyao left a comment

How about we split the PRs into:

  1. "uv": ["dep1", "dep2"] work e2e
  2. "uv": "requirements.txt" work e2e
  3. "uv": {} work e2e

# TODO(hjiang): More package installation options to implement:
# 1. Allow users to pass in a local requirements.txt file, which relates to all
# packages to install;
# 2. Allow specific version of `uv` to use; as of now we only use latest version.
Collaborator

By default, should we use whatever version is currently installed?

Contributor Author

@dentiny dentiny Nov 1, 2024

There are two possibilities:

  • If uv is already installed in the env, we should use it without installing anything; this hasn't been implemented in [core] [2/N] Implement uv processor #48486, but I left a comment
  • If no uv is found in the env, we should install the default version, as you mentioned

If the user specifies a particular version, that's another story.
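
The decision described above could be sketched as follows. This is only an illustration: `plan_uv_setup` and its return strings are hypothetical, and treating a user-specified version as "always install that version" is an assumption, since that case is left open here.

```python
import shutil
from typing import Callable, Optional


def plan_uv_setup(
    requested_version: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Decide what to do about the uv binary (hypothetical helper)."""
    if requested_version is not None:
        # Assumption: an explicit version specifier always triggers an install.
        return f"install uv{requested_version}"
    if which("uv") is not None:
        # uv is already on PATH: reuse it without installing anything.
        return "use existing uv"
    # Nothing found: fall back to installing the default version.
    return "install default uv"
```

Passing `which` as a parameter keeps the lookup replaceable in a hermetic unit test, so the test does not depend on what is installed on the machine.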


The value of the input 'uv' field can be one of two cases:
1) A List[str] describing the requirements. This is passed through.
Example usage: "packages":["tensorflow", "requests"]
Collaborator

Suggested change
Example usage: "packages":["tensorflow", "requests"]
Example usage: "uv":["tensorflow", "requests"]

Contributor Author

There are three options supported in runtime env now:

  • Example: ["requests==1.0.0", "aiohttp", "ray[serve]"]
  • Example: "./requirements.txt"
  • Example: {"packages":["tensorflow", "requests"], "pip_check": False, "pip_version": "==22.0.2;python_version=='3.8.11'"}

I feel removing the key is better, just to reduce confusion between the list and dict forms; updated.

1) A List[str] describing the requirements. This is passed through.
Example usage: "packages":["tensorflow", "requests"]
2) A python dictionary that has three fields:
a) packages (required, List[str]): a list of uv packages, it same as 1).
Collaborator

what are the other 2 fields?

Contributor Author

Sorry I don't understand, I'm trying to mimic the documentation for pip.

"""Parses and validates a user-provided 'pip' option.
The value of the input 'pip' field can be one of two cases:
1) A List[str] describing the requirements. This is passed through.
2) A string pointing to a local requirements file. In this case, the
file contents will be read split into a list.
3) A python dictionary that has three fields:
a) packages (required, List[str]): a list of pip packages, it same as 1).
b) pip_check (optional, bool): whether to enable pip check at the end of pip
install, default to False.
c) pip_version (optional, str): the version of pip, ray will spell
the package name 'pip' in front of the `pip_version` to form the final
requirement string, the syntax of a requirement specifier is defined in
full in PEP 508.
The returned parsed value will be a list of pip packages. If a Ray library
(e.g. "ray[serve]") is specified, it will be deleted and replaced by its
dependencies (e.g. "uvicorn", "requests").

Could you please tell me which part I am missing here?
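
The three input forms described in the pip docstring quoted above can be sketched as below. This is a simplified illustration, not Ray's implementation: `normalize_pip_option_sketch` is a hypothetical name, it returns a dict rather than a bare package list, it drops blank lines from the requirements file as an assumption, and it omits the replacement of Ray libraries by their dependencies.

```python
from pathlib import Path
from typing import Any, Dict, List, Union


def normalize_pip_option_sketch(
    pip: Union[List[str], str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Normalize the three 'pip' input forms into one dict (illustration only)."""
    if isinstance(pip, list):
        # 1) A list of requirement strings is passed through.
        return {"packages": list(pip), "pip_check": False}
    if isinstance(pip, str):
        # 2) A path to a local requirements file: read it and split into lines.
        lines = Path(pip).read_text().splitlines()
        packages = [line.strip() for line in lines if line.strip()]
        return {"packages": packages, "pip_check": False}
    if isinstance(pip, dict):
        # 3) A dict with required 'packages' and optional 'pip_check' / 'pip_version'.
        if "packages" not in pip:
            raise ValueError("pip dict must contain a 'packages' field")
        result: Dict[str, Any] = {
            "packages": list(pip["packages"]),
            "pip_check": bool(pip.get("pip_check", False)),
        }
        if "pip_version" in pip:
            # The package name 'pip' is prepended to form a requirement string.
            result["pip_requirement"] = "pip" + pip["pip_version"]
        return result
    raise TypeError(f"pip must be a list, str, or dict, got {type(pip).__name__}")
```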

Collaborator

I mean we should support the pip_check field as well, and possibly a uv_version field.

Contributor Author

@dentiny dentiny Nov 1, 2024

About uv_version: since the functionality is not implemented yet, I already left a TODO comment at the start of the function. IMO, only implemented features should be officially documented.
I will definitely document it once the feature is implemented. Let me know if you're fine with that.

About pip_check: my concern is that uv and pip are not 100% compatible here, so in some cases it would report a failure; see https://github.com/astral-sh/uv/pull/2544/files

One concrete example: uv's pip_check warns about multiple versions of a package, while pip's version doesn't.
So I'm hesitant whether to implement it or not; but anyway I left a TODO comment at the beginning, so we don't forget to

Contributor Author

dentiny commented Nov 1, 2024

How about we split the PRs into:

  1. "uv": ["dep1", "dep2"] work e2e
  2. "uv": "requirements.txt" work e2e
  3. "uv": {} work e2e

Hi @jjyao, could you please elaborate on what you mean by e2e? Do you mean validation + download?
I considered that, but I don't think it's a good idea because the download part might be a giant PR, including

  • extract common utils out
  • pip processor which implements uv download and package installation
  • pip plugin and integration

I plan to scope my PRs in these steps:

  • Implement validation for (1) and (3), since they're basically the same logic
  • Common utils extraction and uv installation
  • Integration with other plugins
  • Other missing features

This way, we keep every PR at a manageable size and easier to review, and production is only affected once we reach the third step.
Let me know if that's acceptable to you.

Signed-off-by: hjiang <dentinyhao@gmail.com>
Signed-off-by: hjiang <dentinyhao@gmail.com>
@dentiny dentiny requested a review from jjyao November 1, 2024 19:55
Signed-off-by: hjiang <dentinyhao@gmail.com>
Collaborator

jjyao commented Nov 1, 2024

By E2E, I mean the user can use the feature.

I was trying to avoid a state where the validation passes but the underlying implementation is not there yet.

Contributor Author

dentiny commented Nov 1, 2024

By E2E, I mean the user can use the feature.
I was trying to avoid a state where the validation passes but the underlying implementation is not there yet.

Discussed offline: Hao will add the uv feature from implementation to interface, to reduce PR size and make sure we don't expose a half-done interface.

After we have a minimal working version of uv support, the later TODO features (e.g. a specific uv version) will be implemented end to end.

python/ray/_private/runtime_env/validation.py
@@ -0,0 +1,36 @@
from python.ray._private.runtime_env import validation

Collaborator

We already have test_runtime_env_validation.py, could you combine them?

Contributor Author

@dentiny dentiny Nov 2, 2024

No plan at the moment, but I could put them in the same folder if you want.

Let me explain my thoughts:

  • They serve different purposes, so they shouldn't be placed in the same test suite. At the moment, we only have integration tests for Python features, which rely on the latest version of ray and in principle test ray as a whole; what I'm writing is a unit test, whose target is a single function or class.
  • They have different runtimes and configs. Integration tests generally require external access, while unit tests must be hermetic and self-contained (bazel test puts tests in a sandbox to prevent all external access). Their runtimes usually differ as well: unit tests are small and quick (bazel by default limits a test to 300 sec).

Collaborator

If you check TestValidatePip, it's a unit test, not an integration test.

Contributor Author

@dentiny dentiny Nov 2, 2024

Yes, but the test suite and test file as a whole are an integration test, in which we import the latest version of the whole of ray.

If you're fine with it, may I move all unit tests into a separate unit test file? I guess what you care about is that related tests are kept in one place?

Collaborator

Yes. Having all unit tests in one place.

Contributor Author

@dentiny dentiny Nov 2, 2024

Sounds good! I moved a few test cases into my unit test file, and also moved it under the tests folder for aggregation.
A TODO is left for the conda-related tests due to their dependency issue.

Signed-off-by: hjiang <dentinyhao@gmail.com>
@dentiny dentiny requested a review from jjyao November 2, 2024 05:30
Signed-off-by: hjiang <dentinyhao@gmail.com>
Collaborator

@jjyao jjyao left a comment

LG



if __name__ == "__main__":
    unittest.main()
Collaborator

sys.exit(pytest.main(["-v", "-s", __file__]))? just to be consistent with other tests.

If we want to move away from pytest, it's a separate topic.

Contributor Author

Sadly, our current bazel setup cannot import pytest without pain :(

@@ -0,0 +1,121 @@
# TODO(hjiang): Move conda related unit test to this file also, after addressing the `
# yaml` third-party dependency issue.

Collaborator

Let's follow what serve does here (python/ray/serve/tests/unit) and create python/ray/tests/unit. Also check out python/ray/serve/tests/unit/BUILD. The rule is that nothing under unit should import ray.

Contributor Author

If we take a look at the so-called unit tests in python/ray/serve/tests/unit, they still import ray, so they're still integration tests, sadly :(

Contributor Author

But yeah, I could make another folder for unit test only, SGTM :)

Signed-off-by: hjiang <dentinyhao@gmail.com>
@dentiny dentiny requested a review from jjyao November 2, 2024 22:09
python/ray/tests/unit/BUILD


if __name__ == "__main__":
    unittest.main()
Collaborator

All the other tests are run via pytest; can we please use pytest as well in this PR, to be consistent with the rest of the code base? You can open another PR to change everything from pytest to unittest.

What's the pain of using pytest here?

Contributor Author

What's the pain of using pytest here?

You asked in another thread; copying the link here: #48486 (comment)

Signed-off-by: hjiang <dentinyhao@gmail.com>
# TODO(hjiang): Move conda related unit test to this file also, after addressing the `
# yaml` third-party dependency issue.

from python.ray._private.runtime_env import validation
Collaborator

Shouldn't from ray._private.runtime_env import validation work?

Contributor Author

Updated.

Contributor Author

dentiny commented Nov 4, 2024

@jjyao Discussed offline, updated the implementation to use pytest for code consistency.

Signed-off-by: hjiang <hjiang@anyscale.com>
@dentiny dentiny force-pushed the hjiang/validate-uv-options branch from 6db420b to d08af64 Compare November 4, 2024 22:37
@jjyao jjyao enabled auto-merge (squash) November 4, 2024 23:58
@jjyao jjyao merged commit 7e2ba28 into ray-project:master Nov 5, 2024
6 checks passed
Jay-ju pushed a commit to Jay-ju/ray that referenced this pull request Nov 5, 2024
Signed-off-by: hjiang <hjiang@anyscale.com>
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this pull request Nov 14, 2024
Signed-off-by: hjiang <hjiang@anyscale.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this pull request Nov 15, 2024
Signed-off-by: hjiang <hjiang@anyscale.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>