grpcio pin inside the recipe #109

Closed
1 task done
yinweisu opened this issue Aug 23, 2023 · 17 comments
Labels
bug Something isn't working

Comments

@yinweisu

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

I found a grpcio pin inside the recipe that doesn't match what is specified in ray's setup.py. This pin prevents conda users from installing the latest version of ray alongside the latest version of tensorflow, because their grpcio requirements conflict. It would be great if we could align this grpcio pin with the one in ray's setup.py.

Installed packages

NA

Environment info

NA
@yinweisu yinweisu added the bug Something isn't working label Aug 23, 2023
@richardliaw

@mattip is this something you could look into?

@yinweisu
Author

@richardliaw Thank you so much! This is very important to us, and it would be great if we can solve this ASAP

@mattip
Contributor

mattip commented Aug 23, 2023

See #98 (specifically this comment), which led to issue #109 #90 and ray-project/ray#35383. I think there may be a conflict with the grpcio 1.46.6 that ray vendors into its bazel build; that internal version apparently conflicts with the conda-provided 1.50+.

@yinweisu
Author

So if I understand correctly, the conda version of grpcio>1.50.0 is somehow different from what PyPI offers, which causes the conda version of ray to fail? And this issue is known, but no clear path forward has been decided yet.

@mattip
Contributor

mattip commented Aug 23, 2023

Yes, I think that is a fair assessment of the situation. It is not clear where to expend effort: moving ray to use a newer grpcio internally, or debugging why, on all of conda-forge, only the ray feedstock seems to need this <1.50 pin.

@yinweisu
Author

@mattip Thanks for the explanation.
@richardliaw Does the ray team have any plans regarding this? Keeping an obsolete pin is definitely not a long-term solution. I believe there are security vulnerabilities present in older versions of grpcio too. This could be a big obstacle for many service teams that want to integrate ray.

@richardliaw

richardliaw commented Aug 24, 2023 via email

@h-vetinari
Member

The proper solution would be #90, i.e. do not vendor protobuf & grpc (or at least provide an opt-out in the ray build that we could use). Unfortunately, it's just so absurdly complicated to actually point to existing artefacts with bazel that I have not mustered the energy to jump down this rabbit hole.

Regarding failures in grpc, as soon as someone reproduces this, we'll try to get it fixed. I tried to switch on the C++ testing for grpc recently, but it doesn't even compile (filed an upstream bug; will take a while to get this going I think).

@mattip
Contributor

mattip commented Aug 24, 2023

Here is what I had in PR #98 as a minimal reproducer:

$ mamba create -n throw-away1 python=3.9 ray-serve=2.4.0
$ conda activate throw-away1
$ python -c "import ray; ray.init()"
# succeeds
$ mamba install --no-deps grpcio=1.51.1 libgrpc=1.51.1 grpc-cpp=1.51.1
$ python -c "import ray; ray.init()"
# fails

$ mamba create -n throw-away2 python=3.9 ray-serve=2.4.0
$ conda activate throw-away2
$ python -c "import ray; ray.init()"
# succeeds
$ pip install grpcio==1.51.1
$ python -c "import ray; ray.init()"
# succeeds

Or maybe you were referring to something else?
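
For anyone who wants to script the succeeds/fails checks above, here is a minimal sketch. It assumes only the Python standard library; the helper name `run_snippet` is made up for illustration and is not part of ray or conda. Running the snippet in a fresh interpreter keeps a crash during `ray.init()` from taking down the calling script.

```python
import subprocess
import sys
from typing import Tuple


def run_snippet(code: str, timeout: float = 120.0) -> Tuple[bool, str]:
    """Run `code` in a fresh interpreter and return (succeeded, stderr).

    Hypothetical helper: it only reports the child's exit status and
    stderr, it does not diagnose why the snippet failed.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],  # same interpreter as the caller
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return proc.returncode == 0, proc.stderr
```

Calling `run_snippet("import ray; ray.init()")` inside each throw-away environment, before and after swapping the grpcio packages, should mirror the manual steps; the returned stderr carries the traceback when the import fails.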

@yinweisu
Author

yinweisu commented Aug 24, 2023

@richardliaw Thanks for the info. Any timeline on when 2.7 will be out?
@h-vetinari Thank you for your time. It would be great if you could take a look at @mattip's minimal reproducer.

@h-vetinari
Member

Sorry, I misspoke (it was late...) - I meant a root cause rather than a reproducer. If we can point to a concrete problem in grpc, abseil, protobuf or whatever, we can try to get this fixed in the packaging or with the respective upstream maintainers.

@yinweisu
Author

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/envs/throw-away1/lib/python3.9/site-packages/ray/__init__.py", line 109, in <module>
    import ray._raylet  # noqa: E402
  File "python/ray/_raylet.pyx", line 119, in init ray._raylet
  File "/opt/conda/envs/throw-away1/lib/python3.9/site-packages/ray/exceptions.py", line 17, in <module>
    from ray.util.annotations import DeveloperAPI, PublicAPI
  File "/opt/conda/envs/throw-away1/lib/python3.9/site-packages/ray/util/__init__.py", line 5, in <module>
    from ray._private.services import get_node_ip_address
  File "/opt/conda/envs/throw-away1/lib/python3.9/site-packages/ray/_private/services.py", line 26, in <module>
    from ray._private.gcs_utils import GcsClient
  File "/opt/conda/envs/throw-away1/lib/python3.9/site-packages/ray/_private/gcs_utils.py", line 11, in <module>
    import grpc
  File "/opt/conda/envs/throw-away1/lib/python3.9/site-packages/grpc/__init__.py", line 22, in <module>
    from grpc import _compression
  File "/opt/conda/envs/throw-away1/lib/python3.9/site-packages/grpc/_compression.py", line 20, in <module>
    from grpc._cython import cygrpc
ImportError: libabsl_status.so.2301.0.0: cannot open shared object file: No such file or directory

@h-vetinari Thanks. I would like to help get this unblocked. Is this stack trace enough for you to look into the underlying problem?
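
A quick way to check whether a shared library such as the one named in the ImportError can be opened at all is to ask the dynamic loader directly. This is only a sketch using the standard-library `ctypes` module; `can_load` is a hypothetical helper name, and the loader's search path from Python may differ from the one used when `cygrpc` is imported, so passing a full path is more reliable than a bare file name.

```python
import ctypes


def can_load(library: str) -> bool:
    """Return True if the dynamic loader can open `library`.

    `library` may be a bare file name (resolved via the loader's default
    search path) or a full path to the shared object.
    """
    try:
        ctypes.CDLL(library)
    except OSError:
        # Raised when the file is missing or one of its own
        # dependencies cannot be resolved.
        return False
    return True
```

For example, `can_load("libabsl_status.so.2301.0.0")` could be tried first, then the same file name joined onto the environment's `lib` directory (assuming the usual Linux conda layout) to see whether the file exists there but is simply not being found.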

@h-vetinari
Member

> @h-vetinari Thanks. I would like to help if we can get this unblocked. Is this stack trace enough for you to take a look on what's the problem behind?

That's enough for a start. Getting abseil libraries to be present isn't a big issue (if that is all there is to it). I rebased #87 and I also opened conda-forge/grpc-cpp-feedstock#312 (plus did some more work on conda-forge/grpc-cpp-feedstock#311).

@h-vetinari
Member

So the abseil error is not present in conda-forge/grpc-cpp-feedstock#312 nor in #87. In the latter, we're again stuck with

+ python -c 'import ray; ray.init(include_dashboard=True)'
2023-08-27 12:06:41,530	WARNING services.py:1832 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67104768 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=1.16gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2023-08-27 12:06:41,667	INFO worker.py:1621 -- Started a local Ray instance.
[2023-08-27 12:06:42,584 E 39720 39720] core_worker.cc:201: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory

This is something that was raised with ray already a while ago. I don't know what's going on there.

@yinweisu
Author

@mattip I saw you created the issue referred to by @h-vetinari. Any updates on it?

@mattip
Contributor

mattip commented Aug 28, 2023

No updates. I do not know how to debug what seems to be failures to pass grpcio messages.

@mattip
Contributor

mattip commented Oct 17, 2023

I think we can close this. Ray now pins to a higher version of grpcio:

- grpcio >=1.50,<1.56
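
To see at a glance which of the grpcio versions mentioned in this thread fall inside that range, a rough sketch follows. It assumes plain dotted numeric versions and does not implement conda's real version ordering; `parse_version` and `satisfies_pin` are hypothetical helper names.

```python
def parse_version(text: str, width: int = 3) -> tuple:
    """Turn '1.51.1' into (1, 51, 1), padding short versions with zeros.

    Assumption: the version contains only dot-separated integers.
    """
    parts = [int(piece) for piece in text.split(".")]
    parts += [0] * (width - len(parts))  # '1.50' -> (1, 50, 0)
    return tuple(parts)


def satisfies_pin(version: str, lower: str = "1.50", upper: str = "1.56") -> bool:
    """True if lower <= version < upper, i.e. the pin '>=1.50,<1.56'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)
```

With these defaults, `satisfies_pin("1.51.1")` is true while `satisfies_pin("1.46.6")` and `satisfies_pin("1.56.0")` are false. The installed version can be read with `importlib.metadata.version("grpcio")` from the standard library.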
