This repo explains how to use Argo and K3s to automate Machine Learning pipelines, a practice known as MLOps.
- Create a Virtual Machine on your preferred cloud provider
- Suggested size: 2 CPUs + 4 GB RAM
- Suggested OS: Ubuntu 20.04 LTS
- Check that all ports are open
- Set a static Public IP for your VM
- Have a Domain Name configured (e.g. mlops.tk)
- Point your domain to the Public IP of your VM
The following commands have to be executed inside your virtual machine:
- First, update your Ubuntu packages
sudo apt-get update
- Set a variable with your Public IP
PUBLIC_IP=YOUR_IP
- Install k3s with the next command
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--disable traefik --tls-san "$PUBLIC_IP" --node-external-ip "$PUBLIC_IP" --write-kubeconfig-mode 644" sh -s -
- Check that your single node is in Ready status, with the next command
kubectl get nodes
- Install helm with the following commands
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
- Download the kubeconfig to your local machine (create the ~/.kube folder first if it does not exist)
mkdir -p ~/.kube
ssh -i id_rsa yourUser@yourDomain cat /etc/rancher/k3s/k3s.yaml > ~/.kube/config
- Change the Kubernetes API connection from:
server: https://127.0.0.1:6443
to
server: https://yourDomain:6443
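That one-line edit can also be scripted with sed. The sketch below runs against a scratch file so nothing real is overwritten; in practice, point it at ~/.kube/config and replace yourDomain with your actual domain:

```shell
# Demo on a scratch file; target ~/.kube/config and your real domain in practice.
KUBECONFIG_FILE="kubeconfig-demo.yaml"
printf 'server: https://127.0.0.1:6443\n' > "$KUBECONFIG_FILE"   # stand-in for the downloaded kubeconfig
# Rewrite the Kubernetes API server address in place
sed -i 's|https://127.0.0.1:6443|https://yourDomain:6443|' "$KUBECONFIG_FILE"
cat "$KUBECONFIG_FILE"
# → server: https://yourDomain:6443
```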
This section installs NGINX as the ingress controller. To install it, follow the next steps:
- Create a namespace for NGINX
kubectl create ns ingress-nginx
- Add the NGINX Helm repo
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
- Install NGINX inside the ingress-nginx namespace
helm install ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx
This section installs Argo Workflows. Follow the next steps:
- Create a namespace called argo to install Argo Workflows
kubectl create ns argo
- Install Argo Workflows using kubectl
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/stable/manifests/install.yaml
- Because k3s uses containerd as the container runtime, you have to switch the Argo Workflows executor to k8sapi with the next command:
kubectl patch configmap/workflow-controller-configmap \
-n argo \
--type merge \
-p '{"data":{"containerRuntimeExecutor":"k8sapi"}}'
- Check that everything is running with the next command:
kubectl get pods -n argo
- Access your Argo Workflow Deployment with port forward:
kubectl -n argo port-forward svc/argo-server 2746:2746
- Access Argo Workflows in your browser at the next URL:
https://127.0.0.1:2746
Note: If you are using port-forward to access Argo Workflows locally, allow insecure connections from localhost in your browser. In Chrome, browse to chrome://flags/, search for "insecure" and you should see the option "Allow invalid certificates for resources loaded from localhost". Enable that option and restart your browser. Remember that by default Argo Workflows is installed with TLS.
This section is to install ArgoCD with the next commands:
- Create a namespace for ArgoCD:
kubectl create namespace argocd
- Install ArgoCD using kubectl
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
- Create the ingress by modifying the file argocd/argocd-ingress.yaml with your desired domain (check the host and hosts sections inside the file), then apply the YAML file with the next command:
kubectl apply -f argocd/argocd-ingress.yaml
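For orientation, the host and hosts fields that step refers to typically sit in an Ingress resource shaped like the sketch below. This is a generic NGINX ingress for argocd-server with SSL passthrough, not necessarily identical to the file in this repo; argocd.yourDomain is a placeholder:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd-server-ingress
  namespace: argocd
  annotations:
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
spec:
  rules:
  - host: argocd.yourDomain        # <-- the "host" section to change
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: argocd-server
            port:
              name: https
  tls:
  - hosts:
    - argocd.yourDomain            # <-- the "hosts" section to change
```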
- Set an A DNS record pointing your chosen subdomain to the IP where ArgoCD will be accessible.
Note: Because this is a single-node Kubernetes cluster, the IP of the node is the same IP the Load Balancer uses.
- To get the ArgoCD password (needed to generate a token later), get the argocd-server pod name; the pod name is the password to access ArgoCD. Execute the next line to get the argocd-server pod name:
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o name | cut -d'/' -f 2
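As a quick illustration, the cut in that one-liner just strips the pod/ prefix that `-o name` adds (the pod suffix below is a made-up example):

```shell
# `kubectl get pods -o name` prints names like "pod/argocd-server-<hash>";
# cut splits on "/" and keeps only the pod name itself.
echo "pod/argocd-server-5f7bcd6f4-abcde" | cut -d'/' -f2
# → argocd-server-5f7bcd6f4-abcde
```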
- Set a variable with the domain where ArgoCD is accessible
ARGOCD_SERVER=YourDomain
- Generate the token to access the ArgoCD API; this is necessary so Argo Workflows can call ArgoCD when it needs to
curl -sSL -k $ARGOCD_SERVER/api/v1/session -d $'{"username":"admin","password":"argocd-server-XXX-YYY"}'
Note: The password is the name of your argocd-server pod inside the argocd namespace
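The session endpoint answers with a JSON body of the form {"token":"..."}. A sketch for capturing that token into a variable without jq; RESPONSE stands in for the curl output above, and the sed pattern assumes that response shape:

```shell
# Stand-in for: RESPONSE=$(curl -sSL -k $ARGOCD_SERVER/api/v1/session -d '{...}')
RESPONSE='{"token":"eyJhbGciOi.example.jwt"}'
# Pull the token value out of the JSON (assumes the {"token":"..."} shape)
ARGOCD_TOKEN=$(printf '%s' "$RESPONSE" | sed 's/.*"token":"\([^"]*\)".*/\1/')
echo "$ARGOCD_TOKEN"
# → eyJhbGciOi.example.jwt
```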
This repo uses Google Cloud Storage for the buckets, but you can use the cloud provider of your choice. For Google Cloud Storage, follow the next steps:
- Create a bucket called "kubeconeu2021"
- Create a service account that includes the Storage permissions to upload and download data from that bucket
- Upload data/scores.csv into that bucket; this file will be used by the ETL container, which processes it and uploads the result to the bucket
This section explains how to generate custom Docker images to test this small workflow. Start by moving to the containers folder with the next command:
cd containers
The containers included are:
- argo_deploy: Deploy your model using ArgoCD
- etl: Removes unnecessary fields from the CSV and uploads the generated file (scores_processed.csv) to your bucket
- model_training: Trains a new model using the Linear Regression algorithm and uploads the model (scores.model) to your bucket
- model_serve: Creates a basic REST API to get predictions from the model
- inference: Gets predictions from the exposed model
Note: For the etl, model_serve and inference containers you need a service account JSON file called argok3s.json located inside each container folder in order to be pushed to DockerHub or the container registry of your choice.
To generate the argo_deploy container follow the next steps:
- Move to the argo_deploy folder
cd argo_deploy
- Run the build command using your ArgoCD domain or subdomain, ArgoCD token and your DockerHub user
/bin/bash build.sh ARGOCD_DOMAIN ARGOCD_TOKEN DOCKERHUB_USER
- Return to the containers folder
cd ..
Note: Use the ArgoCD token previously generated.
To generate your ETL container follow the next steps:
- Move to the etl folder
cd etl
- Run the build command using your DockerHub user
/bin/bash build.sh DOCKERHUB_USER
- Return to the containers folder
cd ..
To generate your Model Training container follow the next steps:
- Move to the model_training folder
cd model_training
- Run the build command using your DockerHub user
/bin/bash build.sh DOCKERHUB_USER
- Return to the containers folder
cd ..
To generate your Model Serve container follow the next steps:
- Move to the model_serve folder
cd model_serve
- Run the build command using your DockerHub user
/bin/bash build.sh DOCKERHUB_USER
- Return to the containers folder
cd ..
To generate your Inference container follow the next steps:
- Move to the inference folder
cd inference
- Run the build command using your DockerHub user
/bin/bash build.sh DOCKERHUB_USER
- Return to the containers folder
cd ..
cd ..
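If you prefer, the four single-argument builds above can be collapsed into one loop. A sketch, run from the containers folder, assuming each folder keeps the build.sh used above (DOCKERHUB_USER is a placeholder; argo_deploy is excluded because its build.sh takes extra arguments):

```shell
# Build etl, model_training, model_serve and inference in one pass.
# Folders without a build.sh are skipped rather than failing the loop.
DOCKERHUB_USER="${DOCKERHUB_USER:-yourUser}"
for c in etl model_training model_serve inference; do
  if [ -f "$c/build.sh" ]; then
    (cd "$c" && /bin/bash build.sh "$DOCKERHUB_USER")
  else
    echo "skipping $c (no build.sh here)"
  fi
done
```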
- To run an example Argo Workflow, execute:
argo submit -n argo --serviceaccount argo --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/hello-world.yaml
- To run a simple pipeline that includes our whole experiment, execute:
argo submit -n argo --serviceaccount argo --watch pipelines/mlops-simple-pipeline.yaml
- To send parameters using argo submit, you can use the -p flag to customize your execution:
argo submit -n argo --serviceaccount argo --watch pipelines/mlops-simple-pipeline.yaml -p annotation="Reason of Running the ML Pipeline"
- To run a model deployment execute:
argo submit -n argo --serviceaccount argo --watch pipelines/mlops-model-deploy.yaml
- To get some predictions from the model, execute:
curl --header "Content-Type: application/json" \
--request POST --data '{"data":[17,17,25]}' \
http://mlops.tk/model1/predict
Versions used in this tutorial:
- k3s, v1.20.4+k3s1
- helm, 3
- To explore the code of your container, you can override its entrypoint:
docker run -it --entrypoint /bin/sh czdev/argocd-deploy
- To check all the environment variables, execute in the terminal:
printenv
- To create and activate a virtual environment, execute:
virtualenv env1
source env1/bin/activate
- To leave the virtual environment, execute:
deactivate
Links used in this tutorial