update ray to 2.23 and kuberay to 1.1.1 #2732

Merged
merged 2 commits into from
May 27, 2024
2 changes: 1 addition & 1 deletion .github/workflows/ray_test.yaml
@@ -18,7 +18,7 @@ jobs:
 run: ./tests/gh-actions/install_kind.sh

 - name: Create KinD Cluster
-run: kind create cluster --image=kindest/node:v1.23.0
+run: kind create cluster --config tests/gh-actions/kind-cluster.yaml

 - name: Install kustomize
 run: ./tests/gh-actions/install_kustomize.sh
2 changes: 1 addition & 1 deletion contrib/ray/Makefile
@@ -1,4 +1,4 @@
-KUBERAY_RELEASE_VERSION ?= 0.4.0
+KUBERAY_RELEASE_VERSION ?= 1.1.1
 KUBERAY_HELM_CHART_REPO ?= https://ray-project.github.io/kuberay-helm/

 .PHONY: kuberay-operator/base
62 changes: 30 additions & 32 deletions contrib/ray/README.md
@@ -20,8 +20,8 @@ TODO

 # Requirements
 * Dependencies
-* `kustomize`: v3.2.0 (Kubeflow manifest is sensitive to `kustomize` version.)
-* `Kubernetes`: v1.23
+* `kustomize`: v5.2.1+ (Kubeflow manifest is sensitive to `kustomize` version.)
+* `Kubernetes`: v1.29+

* Computing resources:
* 16GB RAM
@@ -36,7 +36,7 @@ TODO
 </figure>

 ## Step 1: Install Kubeflow v1.7-branch
-* This example installs Kubeflow with the [v1.7-branch](https://github.com/kubeflow/manifests/tree/v1.7-branch).
+* This example installs Kubeflow with the [v1.9-branch](https://github.com/kubeflow/manifests/tree/v1.9-branch).

* Install all Kubeflow official components and all common services using [one command](https://github.com/kubeflow/manifests/tree/v1.7-branch#install-with-a-single-command).
* If you do not want to install all components, you can comment out **KNative**, **Katib**, **Tensorboards Controller**, **Tensorboard Web App**, **Training Operator**, and **KServe** from [example/kustomization.yaml](https://github.com/kubeflow/manifests/blob/v1.7-branch/example/kustomization.yaml).
@@ -47,10 +47,10 @@ We never ever break Kubernetes standards and do not use the "default" namespace,

```sh
 # Install a KubeRay operator and custom resource definitions.
-kustomize build kuberay-operator/base | kubectl apply --server-side -f -
+kustomize build kuberay-operator/overlays/kubeflow | kubectl apply --server-side -f -

 # Check KubeRay operator
-kubectl get pod -l app.kubernetes.io/component=kuberay-operator
+kubectl get pod -l app.kubernetes.io/component=kuberay-operator -n kubeflow
 # NAME READY STATUS RESTARTS AGE
 # kuberay-operator-5b8cd69758-rkpvh 1/1 Running 0 6m23s
```
@@ -69,29 +69,27 @@ kubectl get pod -l ray.io/cluster=kubeflow-raycluster -n $MY_KUBEFLOW_USER_NAMES
# kubeflow-raycluster-head-p6dpk 1/1 Running 0 70s
# kubeflow-raycluster-worker-small-group-l7j6c 1/1 Running 0 70s
```
-* `raycluster_example.yaml` uses `rayproject/ray:2.2.0-py38-cpu` as its OCI image. Ray is very sensitive to the Python versions and Ray versions between the server (RayCluster) and client (JupyterLab) sides. This image uses:
-* Python 3.8.13
-* Ray 2.2.0
+* `raycluster_example.yaml` uses `rayproject/ray:2.23.0-py311-cpu` as its OCI image. Ray is very sensitive to the Python versions and Ray versions between the server (RayCluster) and client (JupyterLab) sides. This image uses:
+* Python 3.11
+* Ray 2.23.0
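
The image tags quoted in this hunk (`2.2.0-py38-cpu` before, `2.23.0-py311-cpu` after) appear to encode both the Ray version and the Python version. As a minimal sketch, assuming the tag always follows that `<ray>-py<major><minor>-<variant>` shape, the expected client-side versions could be derived from the image reference. `parse_ray_image` and `ImageVersions` are hypothetical names introduced for illustration; they are not part of Ray or KubeRay.

```python
import re
from typing import NamedTuple


class ImageVersions(NamedTuple):
    ray: str                 # e.g. "2.23.0"
    python: tuple[int, int]  # (major, minor), e.g. (3, 11)


# Assumption: tags look like "2.23.0-py311-cpu". Other tag layouts are rejected.
_TAG_RE = re.compile(
    r"^(?P<ray>\d+\.\d+\.\d+)-py(?P<major>\d)(?P<minor>\d+)(?:-.*)?$"
)


def parse_ray_image(image: str) -> ImageVersions:
    """Extract the Ray and Python versions from an image reference (hypothetical helper)."""
    # Everything after the last ":" is treated as the tag.
    _, _, tag = image.rpartition(":")
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized Ray image tag: {tag!r}")
    return ImageVersions(
        match["ray"], (int(match["major"]), int(match["minor"]))
    )
```

For the image introduced by this change, the helper would yield Ray `2.23.0` and Python `(3, 11)`, which are the values the notebook side needs to match.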

## Step 4: Forward the port of Istio's Ingress-Gateway
* Follow the [instructions](https://github.com/kubeflow/manifests/tree/v1.7-branch#port-forward) to forward the port of Istio's Ingress-Gateway and log in to Kubeflow Central Dashboard.

## Step 5: Create a JupyterLab via Kubeflow Central Dashboard
* Click "Notebooks" icon in the left panel.
* Click "New Notebook"
-* Select `kubeflownotebookswg/jupyter-scipy:v1.7.0` as OCI image.
+* Select `kubeflownotebookswg/jupyter-scipy:v1.9.0` as OCI image (or any other with the same python version)
* Click "Launch"
* Click "CONNECT" to connect into the JupyterLab instance.

## Step 6: Use Ray client in the JupyterLab to connect to the RayCluster
* As I mentioned in Step 3, Ray is very sensitive to the Python versions and Ray versions between the server (RayCluster) and client (JupyterLab) sides.
```sh
-# Check Python version. The version's MAJOR and MINOR should match with RayCluster (i.e. Python 3.8)
+# Check Python version. The version's MAJOR and MINOR should match with RayCluster (i.e. Python 3.11.9)
 python --version
-# Python 3.8.10
-
-# Install Ray 2.2.0
-pip install -U ray[default]==2.2.0
+# Python 3.11.9
+pip install -U ray[default]==2.23.0
```
* Connect to RayCluster via Ray client.
```python
@@ -106,29 +104,29 @@ kubectl get pod -l ray.io/cluster=kubeflow-raycluster -n $MY_KUBEFLOW_USER_NAMES
# {'node:10.244.0.41': 1.0, 'memory': 3000000000.0, 'node:10.244.0.40': 1.0, 'object_store_memory': 805386239.0, 'CPU': 2.0}

# Try Ray task
@ray.remote
def f(x):
    return x * x

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures)) # [0, 1, 4, 9]

# Try Ray actor
@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1

    def read(self):
        return self.n

counters = [Counter.remote() for i in range(4)]
[c.increment.remote() for c in counters]
futures = [c.read.remote() for c in counters]
print(ray.get(futures)) # [1, 1, 1, 1]
```
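
Step 6 asks the reader to confirm by hand that the notebook's Python and Ray versions line up with the RayCluster image. A minimal sketch of that check follows, assuming Python must agree on MAJOR and MINOR (as the README states) and that Ray must agree on the full version string (a stricter assumption than the README spells out). `find_version_mismatches` and `installed_ray_version` are hypothetical helpers, not part of Ray.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_ray_version() -> Optional[str]:
    """Return the installed Ray version, or None when Ray is not installed."""
    try:
        return version("ray")
    except PackageNotFoundError:
        return None


def find_version_mismatches(
    client_python: tuple[int, int],
    client_ray: Optional[str],
    cluster_python: tuple[int, int],
    cluster_ray: str,
) -> list[str]:
    """Return human-readable mismatches; an empty list means the versions agree."""
    problems = []
    # The README requires Python MAJOR and MINOR to match.
    if client_python != cluster_python:
        problems.append(
            f"Python {client_python[0]}.{client_python[1]} on the client, "
            f"{cluster_python[0]}.{cluster_python[1]} on the cluster"
        )
    # Assumption: Ray is compared on the exact version string.
    if client_ray != cluster_ray:
        problems.append(
            f"Ray {client_ray} on the client, {cluster_ray} on the cluster"
        )
    return problems
```

In a notebook this could be called as `find_version_mismatches(sys.version_info[:2], installed_ray_version(), (3, 11), "2.23.0")` before connecting with the Ray client; any returned message points to the `pip install` or notebook image that needs changing.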

# Upgrading
8 changes: 2 additions & 6 deletions contrib/ray/kuberay-operator/base/aggregated-roles.yaml
@@ -7,21 +7,17 @@ metadata:
 app: kuberay-operator
 app.kubernetes.io/name: kuberay-operator
 rbac.authorization.kubeflow.org/aggregate-to-kubeflow-admin: "true"
-aggregationRule:
-clusterRoleSelectors:
-- matchLabels:
-rbac.authorization.kubeflow.org/aggregate-to-kubeflow-kuberay-admin: "true"
 rules: []
 ---
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
-name: kubeflow-kuberay-editor
+name: kubeflow-kuberay-edit
 labels:
 app: kuberay-operator
 app.kubernetes.io/name: kuberay-operator
 rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
-rbac.authorization.kubeflow.org/aggregate-to-kubeflow-kuberay-admin: "true"
+rbac.authorization.kubeflow.org/aggregate-to-kubeflow-admin: "true"
 rules:
 - apiGroups:
 - ray.io
1 change: 0 additions & 1 deletion contrib/ray/kuberay-operator/base/kustomization.yaml
@@ -16,6 +16,5 @@ patches:
 type: RuntimeDefault
 namespace: kubeflow
 resources:
-- namespace.yaml
 - resources.yaml
 - aggregated-roles.yaml