[Core] Use original ray path instead of using python dir for ray path #3339

Merged
Michaelvll merged 5 commits into master from use-ray-path
Mar 20, 2024

Conversation

Michaelvll
Collaborator

@Michaelvll Michaelvll commented Mar 20, 2024

Fixes #3335

The issue was caused by the ray executable in our default Kubernetes image being installed in ~/.local/bin rather than in /opt/conda/bin, the directory that contains python. This is likely due to a write permission issue on /opt/conda. Instead of assuming ray sits next to python, we now save the original ray executable path in a file and use that path.
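
A minimal sketch of the "save the executable path in a file" idea, for illustration only. The file location and the function names below are hypothetical and are not taken from this PR; the actual change lives in sky/skylet/constants.py and may be implemented differently (for example as shell commands). The sketch records where an executable is found on PATH right after installation, then reads that record back later, falling back to a fresh PATH lookup if the file is missing or empty.

```python
import os
import shutil
from typing import Optional

# Hypothetical location, chosen for this example only.
RAY_PATH_FILE = os.path.expanduser('~/.sky_example/ray_path')


def save_executable_path(executable: str = 'ray',
                         path_file: str = RAY_PATH_FILE) -> Optional[str]:
    """Record where `executable` actually lives, e.g. right after install."""
    found = shutil.which(executable)
    if found is None:
        # Nothing on PATH, so there is nothing to record.
        return None
    os.makedirs(os.path.dirname(path_file), exist_ok=True)
    with open(path_file, 'w', encoding='utf-8') as f:
        f.write(found)
    return found


def load_executable_path(executable: str = 'ray',
                         path_file: str = RAY_PATH_FILE) -> Optional[str]:
    """Return the saved path, falling back to a PATH lookup."""
    try:
        with open(path_file, encoding='utf-8') as f:
            saved = f.read().strip()
    except FileNotFoundError:
        saved = ''
    return saved or shutil.which(executable)
```

With this shape, a later command would call load_executable_path() instead of building the path from the python directory, so it keeps working whether ray was installed into /opt/conda/bin or ~/.local/bin.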

Tested (run the relevant ones):

@romilbhardwaj
Collaborator

romilbhardwaj commented Mar 20, 2024

Thanks @Michaelvll! I can confirm it works with the base image (sky launch -c myclus --cloud kubernetes -- echo hi). However, using a custom image fails:

sky launch -c myclus --image-id docker:ubuntu:latest --cloud kubernetes -- echo hi

...
D 03-20 08:16:45 provisioner.py:585]   File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/provision/instance_setup.py", line 223, in _setup_node
D 03-20 08:16:45 provisioner.py:585]     raise RuntimeError(
D 03-20 08:16:45 provisioner.py:585] RuntimeError: Failed to run setup commands on an instance. (exit code 1). Error: ===== stdout =====
D 03-20 08:16:45 provisioner.py:585]
D 03-20 08:16:45 provisioner.py:585] WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
D 03-20 08:16:45 provisioner.py:585]
D 03-20 08:16:45 provisioner.py:585] Reading package lists...
D 03-20 08:16:45 provisioner.py:585] Building dependency tree...
D 03-20 08:16:45 provisioner.py:585] Reading state information...
D 03-20 08:16:45 provisioner.py:585] gcc is already the newest version (4:11.2.0-1ubuntu1).
D 03-20 08:16:45 provisioner.py:585] patch is already the newest version (2.7.6-7build2).
D 03-20 08:16:45 provisioner.py:585] pciutils is already the newest version (1:3.7.0-6).
D 03-20 08:16:45 provisioner.py:585] fuse is already the newest version (2.9.9-5ubuntu3).
D 03-20 08:16:45 provisioner.py:585] curl is already the newest version (7.81.0-1ubuntu1.15).
D 03-20 08:16:45 provisioner.py:585] rsync is already the newest version (3.2.7-0ubuntu0.22.04.2).
D 03-20 08:16:45 provisioner.py:585] 0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
D 03-20 08:16:45 provisioner.py:585] File ‘Miniconda3-Linux-x86_64.sh’ already there; not retrieving.
D 03-20 08:16:45 provisioner.py:585] bash: conda: command not found
D 03-20 08:16:45 provisioner.py:585] /usr/bin/python3: No module named pip
D 03-20 08:16:45 provisioner.py:585] bash: status: command not found
D 03-20 08:16:45 provisioner.py:585] /usr/bin/python3: No module named pip
D 03-20 08:16:45 provisioner.py:585] /usr/bin/python3: No module named pip
D 03-20 08:16:45 provisioner.py:585] /usr/bin/python3: No module named pip
D 03-20 08:16:45 provisioner.py:585] /usr/bin/python3: No module named pip

Perhaps we also need to set up paths for pip?

EDIT - I was running sky local up on my Apple silicon machine, which has issues with custom images if the image isn't built for arm64. I tested on GKE, and using a custom image also works fine there. LGTM!

Member

@concretevitamin concretevitamin left a comment

Not sure I understand the problem & fix. Q: why did this issue show up in K8s but not on cloud VM tests?

sky/skylet/constants.py (outdated, resolved)
sky/skylet/constants.py (outdated, resolved)
Michaelvll and others added 2 commits March 20, 2024 09:08
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
@Michaelvll
Collaborator Author

Not sure I understand the problem & fix. Q: why did this issue show up in K8s but not on cloud VM tests?

Just updated the PR description: "The issue was caused by the ray package in our kubernetes default image being installed in ~/.local/bin instead of the same dir as python /opt/conda/bin which is likely due to some write permission issue."
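
To illustrate the failure mode described above, here is a small sketch (not SkyPilot code; the function names are made up for this example). It compares the path obtained by assuming ray sits in the same directory as python with the path a PATH lookup actually finds. When pip cannot write to the interpreter's site-packages it falls back to a user install, whose scripts go to ~/.local/bin, so the two can differ.

```python
import os
import shutil
import sys
from typing import Optional


def ray_next_to_python() -> str:
    """The assumption that broke: ray is in the same directory as python."""
    return os.path.join(os.path.dirname(sys.executable), 'ray')


def ray_on_path() -> Optional[str]:
    """Where a PATH lookup finds ray, or None if it is not on PATH."""
    return shutil.which('ray')


if __name__ == '__main__':
    guessed = ray_next_to_python()
    actual = ray_on_path()
    status = 'exists' if os.path.exists(guessed) else 'missing'
    print(f'guessed: {guessed} ({status})')
    print(f'actual:  {actual}')
```

On an image like the one in #3335, the guessed path would be /opt/conda/bin/ray and would be reported as missing, while the PATH lookup would find ray under ~/.local/bin, provided that directory is on PATH.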

@Michaelvll Michaelvll merged commit 890fa8c into master Mar 20, 2024
19 checks passed
@Michaelvll Michaelvll deleted the use-ray-path branch March 20, 2024 16:44
Development

Successfully merging this pull request may close these issues.

[core] sky launch fails on k8s with bash: /opt/conda/bin/ray: No such file or directory
4 participants