[Core]: Autoscaler fails to start because it can't resolve the kubernetes host. #48538

Closed
ltbringer opened this issue Nov 4, 2024 · 0 comments · Fixed by #48541
Labels
bug: Something that is supposed to be working; but isn't
core: Issues that should be addressed in Ray Core
P1: Issue that should be fixed within a few weeks

Comments

@ltbringer
Contributor

What happened + What you expected to happen

We're testing the KubeRay autoscaler, using this guide and this chart as references.

The sidecar fails with this error:

raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='kubernetes.default', port=443): Max retries exceeded with url: /apis/ray.io/v1/namespaces/ml-platform-model-v6/rayclusters/ml-platform-model-v6-raycluster-wzg8l (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f16c1646b90>: Failed to establish a new connection: [Errno -2] Name or service not known'))

It seems the Ray Autoscaler needs to call the Kubernetes API to do the equivalent of kubectl get rayclusters. The referenced part of the code assumes the Kubernetes API host is kubernetes.default, which does not resolve in our cluster.

Versions / Dependencies

Version
Ray Version 2.34.0
Python 3.10, 3.11
OS Ubuntu:22.04

Reproduction script

I have a suggested fix rather than a repro. The assumed host does not resolve in our cluster:

curl https://kubernetes.default
curl: (6) Could not resolve host: kubernetes.default

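If the autoscaler used the environment variables that Kubernetes sets in the pod (KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT), or accepted the Kubernetes host as a parameter, we wouldn't face this issue. With those variables the API server is reachable (the 401 below is expected, since the request carries no credentials):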

curl -k https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
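
A minimal sketch of the suggested fix, assuming the autoscaler builds its API URL from a host and port. The function names `kubernetes_api_base_url` and `raycluster_url` are hypothetical and are not taken from the Ray codebase; the idea is only to prefer KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT when they are set and to fall back to kubernetes.default:443 otherwise.

```python
import os

# Fallback used when the in-pod environment variables are absent.
DEFAULT_HOST = "kubernetes.default"
DEFAULT_PORT = "443"


def kubernetes_api_base_url(env=None):
    """Build the API server base URL (hypothetical helper, not Ray's API).

    Prefers KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT and falls
    back to kubernetes.default:443 when they are unset or empty.
    """
    env = os.environ if env is None else env
    host = env.get("KUBERNETES_SERVICE_HOST") or DEFAULT_HOST
    port = env.get("KUBERNETES_SERVICE_PORT") or DEFAULT_PORT
    # A bare IPv6 address must be bracketed inside a URL.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"https://{host}:{port}"


def raycluster_url(namespace, name, env=None):
    """URL of a RayCluster resource, matching the path in the error above."""
    base = kubernetes_api_base_url(env)
    return f"{base}/apis/ray.io/v1/namespaces/{namespace}/rayclusters/{name}"


if __name__ == "__main__":
    print(raycluster_url("ml-platform-model-v6", "ml-platform-model-v6-raycluster-wzg8l"))
```

The sketch only covers host resolution. An authenticated request would still need the pod's service account token, as the 401 response above indicates.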

Issue Severity

High: It blocks me from completing my task.

@ltbringer ltbringer added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Nov 4, 2024
@jcotant1 jcotant1 added the core Issues that should be addressed in Ray Core label Nov 4, 2024
@jjyao jjyao added P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Nov 25, 2024