[Core][Provisioner] Support open ports on RunPod #3748
Conversation
Awesome @cblmemo! It is great that we can expose ports on RunPod. Left several comments below.
```diff
@@ -27,7 +27,7 @@ def do_GET(self):

 if __name__ == '__main__':
     parser = argparse.ArgumentParser(description='SkyServe HTTP Test Server')
-    parser.add_argument('--port', type=int, required=False, default=8081)
+    parser.add_argument('--port', type=int, required=False, default=8080)
```
Why do we change this port from 8081 to 8080?
The RunPod default image has an nginx server listening on port 8081 at launch:

```
(base) root@2c7c82d210d5:~# lsof -i :8081
COMMAND PID USER   FD   TYPE     DEVICE SIZE/OFF NODE NAME
nginx    40 root    9u  IPv4 3927576185      0t0  TCP *:8081 (LISTEN)
```
Can we add a comment somewhere (probably in the RunPod provisioner codebase) mentioning that port 8081 is taken by the RunPod image by default?
Added. Thanks!
sky/backends/cloud_vm_ray_backend.py
Outdated
```python
cloud = handle.launched_resources.cloud
if not (cloud.OPEN_PORTS_VERSION <=
        clouds.OpenPortsVersion.OPEN_ON_LAUNCH_ONLY):
    with rich_utils.safe_status(
            '[bold cyan]Launching - Opening new ports'):
        self._open_ports(handle)
```
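The gate above compares an ordered capability enum. A minimal sketch of how such a check can behave (hypothetical values; this is not the actual `clouds.OpenPortsVersion` definition):

```python
import enum

class OpenPortsVersion(enum.IntEnum):
    """Ordered capability levels for opening ports (illustrative values)."""
    # Ports can only be opened when the cluster is first launched.
    OPEN_ON_LAUNCH_ONLY = 1
    # Ports can also be opened on an already-running cluster.
    OPEN_AFTER_LAUNCH = 2

def should_open_new_ports(cloud_version: OpenPortsVersion) -> bool:
    # Mirrors the gate above: skip the open-ports path for clouds that
    # only support opening ports at launch.
    return not (cloud_version <= OpenPortsVersion.OPEN_ON_LAUNCH_ONLY)

assert not should_open_new_ports(OpenPortsVersion.OPEN_ON_LAUNCH_ONLY)
assert should_open_new_ports(OpenPortsVersion.OPEN_AFTER_LAUNCH)
```

Using an ordered enum (rather than a boolean) leaves room for finer-grained capability levels later without changing the comparison sites.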
How about we let `bulk_provision` open ports for both non-existing and existing clusters? If the cluster already exists and the cloud cannot open new ports on an existing cluster, we can skip the port opening in `bulk_provision` and print a warning here.
Currently, we have some logic to calculate the newly-required ports (e.g. launched with `8000-8010`, now adding `8005-8015`, we only need to add `8011-8015`). If we want to move the entire logic into `bulk_provision`, we need to move that calculation as well. I'm slightly concerned about whether that is the best place for it, but I'm open to discussion.

Also, this would be a big refactoring for clouds that create a security group (SG) and update it later - now we would need to create/update a fully configured SG in the first place. Do you think we could do this in another PR?
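The newly-required-ports calculation described here amounts to a set difference. A small illustrative sketch (a hypothetical `ports_to_add` helper, not SkyPilot's actual implementation):

```python
def parse_ports(spec: str) -> set:
    """Expand a port spec like '8000-8010' or '8080' into a set of ints."""
    if '-' in spec:
        start, end = spec.split('-')
        return set(range(int(start), int(end) + 1))
    return {int(spec)}

def ports_to_add(already_open: str, requested: str) -> set:
    """Return only the requested ports that are not already open."""
    return parse_ports(requested) - parse_ports(already_open)

# Launched with 8000-8010, now requesting 8005-8015:
# only 8011-8015 still need to be opened.
assert ports_to_add('8000-8010', '8005-8015') == set(range(8011, 8016))
```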
```diff
         controller_process.start()
         serve_state.set_service_controller_port(service_name,
                                                 controller_port)

         # TODO(tian): Support HTTPS.
-        controller_addr = f'http://localhost:{controller_port}'
+        controller_addr = f'http://{controller_host}:{controller_port}'
```
Does it work with this changed to `controller_host` for Kubernetes? cc'ing @romilbhardwaj.
k8s will still use `0.0.0.0` in this setting, so it should be fine. Just tested `examples/serve/http_server/task.yaml` on a local k8s controller and it seems to be working well.

Side note: when I tried a k8s controller + k8s replica, the replica encountered a mysterious bug where it could not detect ports opened on the k8s replica. Since a k8s controller + AWS replica works well, I think the problem is not introduced by this PR. Will investigate more.

```
W 07-30 01:58:54 backend_utils.py:2809] Port 8080 not exposed yet. If the cluster was recently started, please retry after a while. Additionally, make sure your LoadBalancer is configured correctly.
W 07-30 01:58:54 backend_utils.py:2809] To debug, run: kubectl describe service
```
Just confirmed that it is not working on master either. Filed issue #3798 to keep track of this 🫡
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
Thanks for the update @cblmemo! It looks good to me. I think we should be able to let it in once all the tests have passed.
```diff
@@ -1468,6 +1468,10 @@ def _retry_zones(
                 assert to_provision.region == region.name, (to_provision,
                                                             region)
                 num_nodes = handle.launched_nodes
+                ports_to_open_on_launch = (
```
```suggestion
# Some clouds, like RunPod, only support exposing ports during launch. For those
# clouds, we pass the ports to open in the `bulk_provision` to expose the ports
# during provisioning.
# If the `bulk_provision` is to apply on an existing cluster, it should be ignored by
# the underlying provisioner implementation.
ports_to_open_on_launch = (
```
Please double check the above comment is true. We do ignore the ports to open when it is on an existing cluster, right?
After an offline discussion, we've decided to raise the error here. Merging now 🫡
This PR adds support for opening ports on RunPod. Updating ports on an existing RunPod pod loses all the data stored on the pod (it is effectively a termination and re-creation cycle), so currently we only support opening ports at cluster creation and disable updating ports on an existing cluster.
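The "raise the error here" decision from the review thread could look like the following sketch (hypothetical helper and parameter names, not the actual SkyPilot code): a launch-only cloud rejects any genuinely new port requested on an existing cluster.

```python
def check_ports_on_existing_cluster(opened_ports: set,
                                    requested_ports: set,
                                    can_update_ports: bool) -> None:
    """Raise if new ports are requested on a cluster whose cloud cannot
    update ports after launch (e.g. RunPod, where an update would
    effectively terminate and re-create the pod)."""
    new_ports = requested_ports - opened_ports
    if new_ports and not can_update_ports:
        raise ValueError(
            f'Cannot open new ports {sorted(new_ports)} on an existing '
            'cluster: this cloud only supports opening ports at launch.')

# Ports already open: no-op. A genuinely new port would raise ValueError.
check_ports_on_existing_cluster({8000}, {8000}, can_update_ports=False)
```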
TODO:
Tested (run the relevant ones):
- Code formatting: `bash format.sh`
- `sky launch --cloud runpod -c rp-port --ports 8000`, then ran `python -m http.server` and tested it manually with `sky serve status rp-port --endpoint 8000`
- `sky launch --cloud runpod -c rp-port --ports 8001` on the above cluster, and it failed correctly
- `sky serve up examples/serve/http_server/task.yaml -n rp-svc --cloud runpod`
- All smoke tests: `pytest tests/test_smoke.py`
- Relevant individual smoke tests: `pytest tests/test_smoke.py::test_runpod_http_server_with_custom_ports`
- Backward compatibility tests: `conda deactivate; bash -i tests/backward_compatibility_tests.sh`