This is a complete demo showing the capabilities of Red Hat OpenShift Data Science (RHODS) for model fine-tuning using CodeFlare (Ray, MCAD, InstaScale) and inferencing using TGIS-Caikit.
Open the OpenShift Container Platform web console. Install the NFD Operator using the Red Hat OperatorHub catalog.
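If you prefer to drive the install from the CLI instead of the web console, the OperatorHub install can be mirrored with a Subscription. This is a sketch only: the channel and catalog source are assumptions and should be verified with `oc get packagemanifests nfd` before applying.

```shell
# Hypothetical CLI equivalent of the OperatorHub install: namespace,
# OperatorGroup, and Subscription for the NFD Operator. The channel and
# catalog source below are assumed defaults; verify them on your cluster.
oc apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-nfd
  namespace: openshift-nfd
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: openshift-nfd
spec:
  name: nfd
  channel: stable
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
```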
$ oc get pods -n openshift-nfd
You should see the NFD Operator running.
Go to Operators > Installed Operators.
Click NodeFeatureDiscovery under the Provided APIs field.
Click Create NodeFeatureDiscovery.
Note: The values pre-populated by the OperatorHub are valid for the GPU Operator.
Verify the NFD Operator's functionality using the OpenShift Container Platform web console or the CLI.
Note: NVIDIA uses the PCI ID 10de.
oc describe node | egrep 'Roles|pci' | grep -v master
Roles: worker
feature.node.kubernetes.io/pci-10de.present=true
feature.node.kubernetes.io/pci-1d0f.present=true
Roles: worker
feature.node.kubernetes.io/pci-1013.present=true
feature.node.kubernetes.io/pci-8086.present=true
Roles: worker
feature.node.kubernetes.io/pci-1013.present=true
feature.node.kubernetes.io/pci-8086.present=true
Roles: worker
feature.node.kubernetes.io/pci-1013.present=true
feature.node.kubernetes.io/pci-8086.present=true
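Because NVIDIA devices report PCI vendor ID 10de, the NFD label alone is enough to pick out the GPU worker nodes. A minimal sketch: against a live cluster you would use a label selector, while here the same filter is applied offline to sample label lines mirroring the output above.

```shell
# On a live cluster, select NVIDIA GPU nodes by NFD label:
#   oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true
# Offline, the same filter applied to sample labels like those shown above:
printf '%s\n' \
  'feature.node.kubernetes.io/pci-10de.present=true' \
  'feature.node.kubernetes.io/pci-1013.present=true' \
  'feature.node.kubernetes.io/pci-8086.present=true' |
  grep 'pci-10de'
```

Only the node(s) carrying the `pci-10de` label survive the filter; those are the candidates for GPU scheduling.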
Navigate to Operators > OperatorHub and select All Projects.
Search for and install the NVIDIA GPU Operator.
Go to Operators > Installed Operators, and click NVIDIA GPU Operator.
Select the ClusterPolicy tab, then click Create ClusterPolicy.
Note: The installation might take 10-20 minutes to finish. When it succeeds, the ClusterPolicy instance reports State: ready.
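The ClusterPolicy state can also be checked from the CLI. The instance name below is an assumption (the default the console generates) and may differ on your cluster.

```shell
# Print the ClusterPolicy state; "gpu-cluster-policy" is the assumed
# default instance name created via the console and may differ.
oc get clusterpolicy gpu-cluster-policy -o jsonpath='{.status.state}{"\n"}'
# Expect "ready" once the operator has finished deploying the driver stack.
```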
From the OpenShift web console, navigate to Operators > OperatorHub, search for Red Hat OpenShift Data Science, and install it.
Execute the following commands to apply the necessary role and binding and to instantiate the CodeFlare KfDef:
oc apply -f - <<EOF
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rhods-operator-scc
rules:
  - verbs:
      - get
      - watch
      - list
      - create
      - update
      - patch
      - delete
    apiGroups:
      - security.openshift.io
    resources:
      - securitycontextconstraints
EOF
oc apply -f - <<EOF
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rhods-operator-scc
subjects:
  - kind: ServiceAccount
    name: rhods-operator
    namespace: redhat-ods-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rhods-operator-scc
EOF
oc apply -f - <<EOF
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: codeflare-stack
  namespace: redhat-ods-applications
spec:
  applications:
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: codeflare-stack
      name: codeflare-stack
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: ray/operator
      name: ray-operator
  repos:
    - name: manifests
      uri: https://github.com/red-hat-data-services/distributed-workloads/tarball/main
EOF
Execute the following command to retrieve the RHODS dashboard URL:
oc get route -n redhat-ods-applications | grep dash | awk '{print $2}'
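The pipeline above greps for the dashboard route and prints its hostname, which sits in the second column of `oc get route` output. The same grep/awk step applied to a sample line (the hostname is illustrative, not from a real cluster):

```shell
# Reproduce the grep/awk step on one sample line of `oc get route` output;
# the hostname is illustrative, not from a real cluster.
sample='rhods-dashboard   rhods-dashboard-redhat-ods-applications.apps.example.com   rhods-dashboard   8443   reencrypt/Redirect   None'
echo "$sample" | grep dash | awk '{print $2}'
# → rhods-dashboard-redhat-ods-applications.apps.example.com
```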
Start by launching the CodeFlare notebook from the Red Hat OpenShift AI dashboard and cloning this repository, which includes the notebook and the files needed for the demo. Run the notebook llamafinetune_demo.ipynb to walk through the fine-tuning job submission.
The resulting fine-tuned model is published on Hugging Face: https://huggingface.co/avijra/Llama-2-7b-chat-hf-fine-tuned