Credit: This guide draws heavily on the Google Cloud engineering blog "Building a Machine Learning Platform with Kubeflow and Ray on Google Kubernetes Engine".
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
- Dependencies:
  - kustomize: v3.2.0 (The Kubeflow manifest is sensitive to the kustomize version.)
  - Kubernetes: v1.23
- Computing resources:
  - 16GB RAM
  - 8 CPUs
# Kubeflow is sensitive to Kubernetes version and Kustomize version.
kind create cluster --image=kindest/node:v1.23.0
kustomize version --short
# 3.2.0
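Before installing anything on the cluster, a quick sanity check that Kind is up can help; a minimal sketch, assuming the default context name kind-kind for a cluster named "kind":

# Confirm the cluster is reachable and the Kind node is Ready.
kubectl cluster-info --context kind-kind
kubectl get nodes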
- This example installs Kubeflow with the v1.6-branch (see the sketch after this list).
- Install all Kubeflow official components and all common services with one command.
  - If you do not want to install all components, you can comment out KNative, Katib, Tensorboards Controller, Tensorboard Web App, Training Operator, and KServe in example/kustomization.yaml.
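A sketch of what these two steps typically look like, assuming the kubeflow/manifests repository and the single-command install documented in its README (the retry loop gives CRDs and webhooks time to become ready):

# Clone the Kubeflow manifests at the v1.6-branch.
git clone --branch v1.6-branch https://github.com/kubeflow/manifests.git
cd manifests
# Build the example overlay and apply it; retry until every resource applies cleanly.
while ! kustomize build example | kubectl apply -f -; do
  echo "Retrying to apply resources"
  sleep 10
done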
- Follow the KubeRay documentation to install the latest stable KubeRay operator via its Helm repository.
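A sketch of the Helm-based operator install, assuming the official kuberay-helm chart repository; the chart version 0.5.0 is an assumption chosen to match the ray-cluster chart version used below:

# Add the KubeRay Helm chart repository and install the operator.
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 0.5.0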
# Create a RayCluster CR, and the KubeRay operator will reconcile a Ray cluster
# with 1 head Pod and 1 worker Pod.
helm install raycluster kuberay/ray-cluster --version 0.5.0 --set image.tag=2.2.0-py38-cpu
# Check RayCluster
kubectl get pod -l ray.io/cluster=raycluster-kuberay
# NAME                                          READY   STATUS    RESTARTS   AGE
# raycluster-kuberay-head-bz77b                 1/1     Running   0          64s
# raycluster-kuberay-worker-workergroup-8gr5q   1/1     Running   0          63s
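The head Pod is exposed through a Service that the Ray client connects to later in this guide (raycluster-kuberay-head-svc, client port 10001); you can confirm it exists with, for example:

# The Ray client connects to this Service on port 10001 from JupyterLab.
kubectl get svc raycluster-kuberay-head-svc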
- This step uses rayproject/ray:2.2.0-py38-cpu as its image. Ray is very sensitive to mismatches in Python version and Ray version between the server (RayCluster) and client (JupyterLab) sides. This image uses:
  - Python 3.8.13
  - Ray 2.2.0
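If you want to verify the server side directly, you can print the versions inside the head Pod; the Pod name below is illustrative (taken from the kubectl get pod output above) and will differ in your cluster:

# Print the Python and Ray versions inside the head Pod; they should report
# Python 3.8.13 and Ray 2.2.0 for the rayproject/ray:2.2.0-py38-cpu image.
kubectl exec -it raycluster-kuberay-head-bz77b -- python -c "import sys, ray; print(sys.version); print(ray.__version__)"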
- Follow the instructions to forward the port of Istio's Ingress-Gateway and log in to the Kubeflow Central Dashboard (a port-forward sketch follows this list).
- Click "Notebooks" icon in the left panel.
- Click "New Notebook"
- Select
kubeflownotebookswg/jupyter-scipy:v1.6.1
as OCI image. - Click "Launch"
- Click "CONNECT" to connect into the JupyterLab instance.
Warning: Ray client has some known limitations and is not actively maintained.
- As mentioned in Step 4, Ray is very sensitive to mismatches in Python version and Ray version between the server (RayCluster) and client (JupyterLab) sides. Open a terminal in JupyterLab:
# Check Python version. The MAJOR and MINOR versions should match the RayCluster (i.e. Python 3.8).
python --version
# Python 3.8.10

# Install Ray 2.2.0 to match the RayCluster.
pip install -U ray[default]==2.2.0
- Connect to the RayCluster via the Ray client.
# Open a new .ipynb page.
import ray

# ray://${RAYCLUSTER_HEAD_SVC}.${NAMESPACE}.svc.cluster.local:${RAY_CLIENT_PORT}
ray.init(address="ray://raycluster-kuberay-head-svc.default.svc.cluster.local:10001")
print(ray.cluster_resources())
# {'node:10.244.0.41': 1.0, 'memory': 3000000000.0, 'node:10.244.0.40': 1.0, 'object_store_memory': 805386239.0, 'CPU': 2.0}

# Try Ray task
@ray.remote
def f(x):
    return x * x

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]

# Try Ray actor
@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1

    def read(self):
        return self.n

counters = [Counter.remote() for i in range(4)]
[c.increment.remote() for c in counters]
futures = [c.read.remote() for c in counters]
print(ray.get(futures))  # [1, 1, 1, 1]