v1.0.2

@imreddy13 released this 17 Nov 22:00

Ray Serve

  • Introduced support for Ray on Autopilot with three predefined worker groups: small (CPU only), medium (1 GPU), and large (8 GPUs) (7082b13)
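
The snippet below is a minimal sketch of how a workload's resource request maps onto these worker groups; the Ray head service address is a placeholder, and only the scheduling behavior is illustrated.

```python
import ray

# Connect to the Ray cluster on GKE via the Ray client; the service address is
# a placeholder for your cluster's head service.
ray.init(address="ray://example-cluster-kuberay-head-svc:10001")

@ray.remote(num_cpus=1)
def cpu_task():
    # A CPU-only request can be satisfied by the "small" (CPU-only) worker group.
    return "ran on a CPU worker"

@ray.remote(num_gpus=1)
def gpu_task():
    # A 1-GPU request matches the "medium" (1 GPU) worker group; Autopilot
    # provisions matching nodes on demand.
    return "ran on a GPU worker"

print(ray.get([cpu_task.remote(), gpu_task.remote()]))
```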

Ray on GKE Storage
#87 provides examples of storage solutions for Ray on GKE:

  • One-click deployment of a GCS bucket and Kuberay access control
  • Leveraging the GKE GCS Fuse CSI driver to access GCS buckets as a shared filesystem with standard file semantics, eliminating the need for specialized fsspec libraries (see the sketch below)
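
As a minimal sketch of the second point: assuming the bucket is mounted into each Ray worker pod at /data (the mount path is an assumption taken from the pod's volumeMounts), plain Python file I/O is enough.

```python
import os

MOUNT_PATH = "/data"  # assumed GCS Fuse mountPath in the Ray worker pods

# Write with standard file semantics; no fsspec/gcsfs client is required.
os.makedirs(os.path.join(MOUNT_PATH, "shared"), exist_ok=True)
with open(os.path.join(MOUNT_PATH, "shared", "checkpoint-info.txt"), "w") as f:
    f.write("visible to every worker that mounts the same bucket\n")

# Any other worker mounting the same bucket reads it back like a local file.
with open(os.path.join(MOUNT_PATH, "shared", "checkpoint-info.txt")) as f:
    print(f.read())
```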

Ray Data
The Ray Data API tutorial with a Stable Diffusion end-to-end fine-tuning example (PR) deploys a Ray training job from a Jupyter notebook to a Ray cluster on GKE and illustrates the following (a rough sketch follows the list):

  • Caching a Hugging Face Stable Diffusion model checkpoint in a GCS bucket and mounting it to the Ray workers in the Ray cluster hosted on GKE
  • Using Ray Data APIs to perform batch inference and generate the regularization images needed for fine-tuning
  • Using the Ray Train framework for distributed training with multiple GPUs in a multi-node GKE cluster
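
The following is a rough sketch of those two pieces, not the tutorial's actual code: a toy Ray Data batch-inference pass followed by a multi-GPU Ray Train job. Prompts, batch sizes, and worker counts are placeholders, and the real model-loading and training logic is elided.

```python
import numpy as np
import ray
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

ray.init()  # inside the notebook, connect to the Ray cluster on GKE

# 1. Batch inference with Ray Data to produce regularization images (the real
#    pipeline would run the cached Stable Diffusion checkpoint here).
prompts = ray.data.from_items(
    [{"prompt": f"a photo of a dog, seed {i}"} for i in range(8)]
)

def generate(batch):
    # Placeholder inference step; a real UDF would return generated images.
    batch["image"] = np.array([f"<image for: {p}>" for p in batch["prompt"]])
    return batch

regularization_images = prompts.map_batches(generate, batch_size=4).take_all()

# 2. Distributed fine-tuning with Ray Train across multiple GPU workers.
def train_loop_per_worker(config):
    for epoch in range(config["epochs"]):
        # ... per-worker forward/backward pass over its data shard ...
        train.report({"epoch": epoch})

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"epochs": 2},
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
)
result = trainer.fit()
```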

Kuberay

  • Pinned the Kuberay version to v0.6.0 and the Helm chart version to v0.6.1
  • Installed the Kuberay operator in a dedicated namespace (ray-system)

Jupyter Notebooks

  • Secure authentication via Identity-Aware Proxy (IAP) is now enabled by default for JupyterHub on both Standard and Autopilot clusters. See the sample user guide for configuring the IAP client in your JupyterHub installation. With IAP enabled, the JupyterHub endpoint is no longer exposed to the public internet.

Distributed training of PyTorch CNN

  • JobSet example for distributed training of a PyTorch CNN handwritten-digit classification model on the MNIST dataset.
  • Indexed Job example for distributed training of a PyTorch CNN handwritten-digit classification model on the MNIST dataset using NVIDIA T4 GPUs (see the sketch below).
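
Below is a minimal sketch of what each pod's entrypoint looks like for the Indexed Job variant, assuming the manifest injects MASTER_ADDR, MASTER_PORT, and WORLD_SIZE and that Kubernetes sets JOB_COMPLETION_INDEX; the model and training loop are reduced to placeholders.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Kubernetes sets JOB_COMPLETION_INDEX for Indexed Jobs; the manifest is
    # assumed to provide MASTER_ADDR, MASTER_PORT, and WORLD_SIZE.
    rank = int(os.environ["JOB_COMPLETION_INDEX"])
    world_size = int(os.environ["WORLD_SIZE"])
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    device = torch.device("cuda")
    # Placeholder for the CNN; the real example trains a convolutional model
    # on MNIST with a DistributedSampler.
    model = DDP(nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device))

    # ... build the MNIST DataLoader and run the usual DDP training loop ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```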

Inferencing using Saxml and an HTTP Server

  • Example that deploys an HTTP server to handle HTTP requests to Sax, with support for model publishing, listing, updating, unpublishing, and generating predictions. With an HTTP server, interaction with Sax is no longer limited to the VM level; for example, integration with GKE and load balancing enables requests to Sax from both inside and outside the GKE cluster.
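
As a hedged illustration of calling such a server from inside the cluster: the Service name, port, endpoint path, and payload shape below are assumptions made for the sketch, not the server's documented API.

```python
import requests

BASE_URL = "http://sax-http-server:8888"  # hypothetical in-cluster Service name and port

# Hypothetical request to a text-generation endpoint of the HTTP server.
payload = {
    "model": "/sax/test/llm",        # hypothetical published Sax model path
    "query": "What is Kubernetes?",
}
response = requests.post(f"{BASE_URL}/generate", json=payload, timeout=60)
response.raise_for_status()
print(response.json())
```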

Finetuning and Serving Llama on L4 GPUs

  • Example for fine-tuning the Llama 7B model on GKE using 8 L4 GPUs
  • Example for serving the Llama 70B model on GKE with 2 L4 GPUs

Validation of Changes to Ray on GKE Templates

  • Pull requests now trigger Cloud Build tests to detect breaking changes to the GKE platform and Kuberay solution templates.