Kubernetes is a container orchestration system used to automatically deploy, scale and operate containers.
In this lesson we will use Kubernetes along with TensorFlow Serving, a component of the TensorFlow Extended (TFX) family of technologies used to deploy models in production environments.
Note: in lesson 9 (serverless) we used TF-Lite for serving models in AWS Lambda. TensorFlow Serving is the tool actually intended for deploying models; we only used TF-Lite with Lambda because TensorFlow Serving is too big for it.
In the image above you can see the main architecture we will create during the lesson. The gateway will be created with Flask and will take care of downloading images, resizing them, preparing the input for the models and post-processing the model outputs. The model server will be used for inference only: it will receive an image in the form of a numpy array as input and will return a prediction to the gateway via gRPC.
The gateway only uses CPU computing and needs less resources than the model server, which makes use of GPU computing. By decoupling these 2 components, we can run them in separate containers and in different numbers (for example, 5 gateway containers and 2 model server containers, as shown in the image), allowing us to optimize resources and lower deployment costs.
TensorFlow Serving (`tf-serving`) requires models to be in a specific format.
import tensorflow as tf
from tensorflow import keras

# load the Keras model and save it in TensorFlow's SavedModel format
model = keras.models.load_model('original_model.h5')
tf.saved_model.save(model, 'converted_model')
This code loads a Keras model and saves it in TensorFlow's SavedModel format rather than the Keras format. SavedModel models are not single files; they are directories containing a number of files. Here's an example model directory structure:
converted_model
┣╸ assets
┣╸ saved_model.pb
┗╸ variables
   ┣╸ variables.data-00000-of-00001
   ┗╸ variables.index
The `saved_model_cli` utility allows us to inspect the contents of the TF model:
saved_model_cli show --dir converted_model --all
We're specifically interested in the signature definition of the model. The signature definition describes both the inputs and the outputs of the model. Here's an example signature definition:
signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_8'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 299, 299, 3)
        name: serving_default_input_8:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense_7'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict
WARNING: the output of `saved_model_cli` contains many more things than the signature, which can easily be mixed up. Make sure that you're looking at `signature_def['serving_default']` rather than `signature_def['__saved_model_init_op']`.
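To narrow the output down to just this signature, `saved_model_cli` also accepts `--tag_set` and `--signature_def` filters (the `serve` tag-set shown here is an assumption, but it's the usual default for models exported with `tf.saved_model.save`):

saved_model_cli show --dir converted_model --tag_set serve --signature_def serving_default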
In the signature above we can see that our example model has 1 input called `input_8` and 1 output called `dense_7`. Both of these names, as well as the name of the signature definition (in our example, `serving_default`), are necessary for the next step.
TensorFlow Serving has an official Docker image ready for deployment. By using volumes (folders in the host machine that can be mounted to containers in a manner similar to external storage) we can place our model in a volume and attach it to a container without having to build a new image. This is very convenient for development because it saves a lot of time: we don't have to rebuild images constantly just to try out different models.
We can run the official tf-serving Docker image with our model mounted in a volume with the following command:
docker run -it --rm \
-p 8500:8500 \
-v "$(pwd)/converted_model:/models/converted_model/1" \
-e MODEL_NAME="converted_model" \
tensorflow/serving:2.7.0
- `-it`: short for `-i -t`:
    - `-i`: interactive mode.
    - `-t`: open a terminal.
- `--rm`: delete the container once the interaction with it ends.
- `\`: escapes the Enter key, allowing us to write multi-line shell commands.
- `-p 8500:8500`: maps port 8500 on the host to port 8500 in the container. We use port 8500 because the official tf-serving image uses it.
- `-v "$(pwd)/converted_model:/models/converted_model/1"`:
    - `-v`: volume. Maps a folder on the host (first half of the path) to a folder within the container (second half).
    - Both folder paths need to be absolute, so relative paths like `./converted_model` won't work. We can use a convenient trick with the `pwd` command to print our current folder: `$(pwd)` will be replaced with the current folder of the host machine.
    - The path to the container folder ends with a version number. If no version number is provided, the `docker run` command will fail. You can just put `/1` all the time if you have other means of controlling model versions.
- `-e MODEL_NAME="converted_model"`: environment variable. The image makes use of an environment variable called `MODEL_NAME`, which must match the name of our model.
- `tensorflow/serving:2.7.0` is the Docker image we will be using.
With the Docker container running, we can now test it from a Jupyter Notebook.
!pip install grpcio==1.42.0 tensorflow-serving-api==2.7.0
!pip install keras-image-helper
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
host = 'localhost:8500'
channel = grpc.insecure_channel(host)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
tf-serving makes use of gRPC, a framework for connecting services in and across datacenters. gRPC uses Protocol Buffers (AKA Protobuf) for formatting data, a kind of binary encoding which is faster and more efficient than JSON.
gRPC can establish both insecure and secure communication channels. Secure channels can make use of authentication and other advanced security features, but since our model server won't be accessible from the outside, insecure channels are adequate for our use case.
The stub is our interface with tf-serving for making inference with our model. It needs a channel as a parameter to establish the communication.
from keras_image_helper import create_preprocessor
preprocessor = create_preprocessor('xception', target_size=(299, 299))
url = 'http://bit.ly/mlbookcamp-pants'
X = preprocessor.from_url(url)
This is the same kind of code seen in lesson 9 for transforming the image into a numpy array that our model can process.
def np_to_protobuf(data):
    return tf.make_tensor_proto(data, shape=data.shape)
pb_request = predict_pb2.PredictRequest()
pb_request.model_spec.name = 'converted_model'
pb_request.model_spec.signature_name = 'serving_default'
pb_request.inputs['input_8'].CopyFrom(np_to_protobuf(X))
We can now set up the request to our model by instantiating a Protobuf request object and defining its model name, the signature name that we saw before, and its input. For the input, note that we make use of the input name that we found in the signature. We also convert the numpy array of our image to Protobuf format and copy it into the request object.
pb_response = stub.Predict(pb_request, timeout=20.0)
pb_response.outputs['dense_7'].float_val
Inference is done with the stub's `Predict()` method. We pass our request object as a parameter and define a timeout as well. The `Predict()` method returns a Protobuf response object. We can access our predictions with the name of the output that we found in the signature definition. `float_val` returns the predictions as a regular Python list, so there is no need to do additional conversions.
classes = [
    'dress',
    'hat',
    #...
    'pants'
]

preds = pb_response.outputs['dense_7'].float_val
dict(zip(classes, preds))
This code is just a convenience to tag each prediction value with the class it belongs to.
Incidentally, this Jupyter Notebook code will be the basis for our gateway Flask app.
Remember that you can export a Jupyter Notebook to a Python script with the following command:
jupyter nbconvert --to script notebook.ipynb
This command will output a `.py` file with the same name as the notebook. We can rename the script to `gateway.py`, clean it up, organize the code into functions and add an `if __name__ == '__main__'` block in order to convert the script to a Flask app.
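Here's a minimal sketch of what the resulting `gateway.py` could look like (the exact function layout is an assumption; the `/predict` route and JSON format match the test script used later in the lesson):

import grpc
import tensorflow as tf
from flask import Flask, request, jsonify
from keras_image_helper import create_preprocessor
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

host = 'localhost:8500'
channel = grpc.insecure_channel(host)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

preprocessor = create_preprocessor('xception', target_size=(299, 299))

classes = [
    'dress',
    'hat',
    #...
    'pants'
]

def np_to_protobuf(data):
    return tf.make_tensor_proto(data, shape=data.shape)

def prepare_request(X):
    pb_request = predict_pb2.PredictRequest()
    pb_request.model_spec.name = 'converted_model'
    pb_request.model_spec.signature_name = 'serving_default'
    pb_request.inputs['input_8'].CopyFrom(np_to_protobuf(X))
    return pb_request

def predict(url):
    # download and preprocess the image, run inference, tag the outputs
    X = preprocessor.from_url(url)
    pb_response = stub.Predict(prepare_request(X), timeout=20.0)
    preds = pb_response.outputs['dense_7'].float_val
    return dict(zip(classes, preds))

app = Flask('gateway')

@app.route('/predict', methods=['POST'])
def predict_endpoint():
    data = request.get_json()
    result = predict(data['url'])
    return jsonify(result)

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=9696)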
Note: a Flask cheatsheet is available in this gist
There is one issue: in the notebook we defined the following function:
def np_to_protobuf(data):
    return tf.make_tensor_proto(data, shape=data.shape)
The `make_tensor_proto()` method is a TensorFlow method, and TensorFlow is a huge library, about 2GB in size. A smaller `tensorflow-cpu` library exists, but it is still over 400MB in size.
Since we only need to use that particular method, we can instead make use of a separate `tensorflow-protobuf` package which is available on pip.
!pip install tensorflow-protobuf==2.7.0
The GitHub page for `tensorflow-protobuf` contains info on how to replace the `make_tensor_proto()` method.
Since the additional code is wordy, it is convenient to define the `np_to_protobuf()` function in a separate `proto.py` script and then import it into the gateway app with `from proto import np_to_protobuf`.
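Here's a sketch of what `proto.py` could look like, adapted from the `tensorflow-protobuf` README (it assumes `float32` inputs; anything else would need extra dtype handling):

from tensorflow.core.framework import tensor_pb2, tensor_shape_pb2, types_pb2

def dtypes_as_dtype(dtype):
    if dtype == 'float32':
        return types_pb2.DT_FLOAT
    raise Exception('dtype %s is not supported' % dtype)

def make_tensor_proto(data):
    # describe the tensor shape in protobuf terms
    dims = [tensor_shape_pb2.TensorShapeProto.Dim(size=i) for i in data.shape]
    proto_shape = tensor_shape_pb2.TensorShapeProto(dim=dims)

    proto_dtype = dtypes_as_dtype(data.dtype)

    tensor_proto = tensor_pb2.TensorProto(dtype=proto_dtype, tensor_shape=proto_shape)
    # raw little-endian bytes of the numpy array
    tensor_proto.tensor_content = data.tobytes()

    return tensor_proto

def np_to_protobuf(data):
    if data.dtype != 'float32':
        data = data.astype('float32')
    return make_tensor_proto(data)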
Now that we've created our gateway, it's time to dockerize it and run both the model and gateway concurrently locally.
The `docker run` command we used for the tf-serving container is very long. If the volume and environment variable won't change often, we can create a new Dockerfile that specifies them for easier deployment (managing volumes with tools such as Kubernetes is complex, so copying the model into the image is more convenient).
FROM tensorflow/serving:2.7.0
COPY converted_model /models/converted_model/1
ENV MODEL_NAME="converted_model"
This dockerfile will copy the model into the image. Assuming we name this dockerfile `image-model.dockerfile`, we can build it with the following command:
docker build -t zoomcamp-10-model:v1 -f image-model.dockerfile .
Note: check the Docker cheatsheet for a `docker build` command refresher.
We can now run this new image like this:
docker run -it --rm \
-p 8500:8500 \
zoomcamp-10-model:v1
With this new image running, it'd be a good idea to check whether or not the gateway can connect to it by running the gateway notebook we defined previously.
The dockerfile for the gateway will have to include the dependency management files and the gateway script (as well as the `proto.py` script if you have one), and will have to define an entrypoint using a WSGI HTTP server such as `gunicorn`. Here's an example `image-gateway.dockerfile` using `pipenv` for dependency management:
FROM python:3.8.12-slim
RUN pip install pipenv
WORKDIR /app
COPY ["Pipfile", "Pipfile.lock", "./"]
RUN pipenv install --system --deploy
COPY ["gateway.py", "proto.py". "./"]
EXPOSE 9696
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "gateway:app"]
Build the dockerfile like this:
docker build -t zoomcamp-10-gateway:v1 -f image-gateway.dockerfile .
And now run it:
docker run -it --rm \
-p 9696:9696 \
zoomcamp-10-gateway:v1
With both images running, we can now test them with a simple `test.py` script:
import requests
url = 'http://localhost:9696/predict'
data = {'url': 'http://bit.ly/mlbookcamp-pants'}
result = requests.post(url, json=data).json()
print(result)
However, this script will fail. If we check the Docker logs for the gateway container, we'll see a `status = StatusCode.UNAVAILABLE` error along with `details = "failed to connect to all addresses"`. In other words: the gateway service is trying to reach the model service but fails to do so.
The test script fails because the gateway service is trying to reach the model service on port 8500 of localhost (which is actually the gateway container itself), but port 8500 inside the gateway container isn't connected to anything. Both containers are connected to the host machine, but since the containers are not connected between themselves, they are effectively isolated from each other, as the image below shows.
In order to fix this issue, both containers must be in the same network. Placing both containers in a (virtual) network will allow them to connect to each other, thus allowing our test script to run successfully.
We can create this network with the `docker` command; however, it's a very tedious process. We can use the `docker-compose` command instead, which is used to run multi-container Docker applications and simplifies connecting containers to each other.
`docker-compose` is great for developing and testing multi-container apps locally. We will later see other tools such as Kubernetes for deploying multi-container apps.
Instructions to install `docker-compose` are available here.
`docker-compose` makes use of `yaml` files to define multi-container applications. The default file that `docker-compose` looks for is called `docker-compose.yaml`, similar to how `docker` looks for the default `Dockerfile` file.
The file defines which containers we should run along with a few additional settings; `docker-compose` takes care of setting up the virtual network and connecting the containers.
However, before we define the `docker-compose.yaml` file, we need to modify the gateway slightly. Currently, the connection to the model service is hardcoded as `host = 'localhost:8500'`; we can use an environment variable instead to get the host address.
# in gateway.py
# after all the library imports
import os
host = os.getenv('TF_SERVING_HOST', 'localhost:8500')
# rest of the code below
- The `os` module exposes operating system functionality, including environment variables.
- `os.getenv()` looks for the environment variable named in the first parameter (`TF_SERVING_HOST` in our case). If the variable cannot be found, it returns the default value given in the second parameter (the previously hardcoded `localhost:8500`, useful for local testing).
We now need to rebuild the gateway container.
docker build -t zoomcamp-10-gateway:v2 -f image-gateway.dockerfile .
Notice the updated tag! This is now gateway image version 2.
Finally, here's what our `docker-compose.yaml` file would look like:
version: "3.9"
services:
model-server:
image: zoomcamp-10-model:v1
gateway:
image: zoomcamp-10-gateway:v2
environment:
- TF_SERVING_HOST=model-server:8500
ports:
- "9696:9696"
- The `version` field is required by the `docker-compose` standard in order to know which Docker engine version to use.
- Our app has 2 components: the gateway and the model. We define these 2 components inside `services`:
    - `model-server` is the name we've defined for our tf-serving container.
    - `gateway` is the name for the gateway container.
- No additional configuration is required for `model-server` other than the image name. `gateway` requires the following info:
    - `environment` is the environment variable that we define in order for the gateway to find the model.
    - `ports` is the port mapping necessary to connect to the Docker app from the outside.
- `docker-compose` makes use of the service names that we define in order to connect the containers. The gateway needs to connect to the model; we named the model container `model-server`, so we use this name as the value of the environment variable that we pass to the gateway, allowing the gateway to find the model inside the network that `docker-compose` creates.
We can now run the app like this:
docker-compose up
Important: this command will only work when run from within the same directory where the `docker-compose.yaml` file is located.
Alternatively, we can also run in detached mode:
docker-compose up -d
This will detach from the docker-compose output and return us to the terminal prompt. We can shut down the app with:
docker-compose down
With the app running, we should now be able to run the test script successfully.
As mentioned in the beginning of the lesson, Kubernetes is a technology that allows us to automatically deploy, scale and operate containers. In other words, it allows us to deploy the app we've created locally to the cloud and manage it.
A Kubernetes cluster contains nodes, and nodes contain pods. Pods are grouped in deployments along different nodes. Deployments are accessed through services. The ingress is the entry point to the cluster.
- A cluster is a Kubernetes deployment.
- A node is a server such as a local computer or an EC2 instance. Every cluster contains at least one node.
- A pod is a group of one or more containers with shared storage and network resources, as well as a specification for how to run the containers.
- A deployment is a group of pods all running the same image and config. The pods may be distributed in different nodes.
- A service is a sort of entrypoint that serves as a middleman between the user and the deployment: it receives the requests from the user and routes it to an available pod within the deployment, and then serves the reply to the user. The user may be a pod from a different deployment or an external application; therefore services may be external or internal, depending on whether the deployment needs to be accessed externally or it's an internal component of a bigger app.
- External services are called load balancers.
- Internal services are called ClusterIPs.
- The ingress is the resource that exposes HTTP/HTTPS routes from outside the cluster to services within the cluster.
Kubernetes can also manage deployments by scaling up (starting pods) and down (removing pods) as the user load increases or decreases. This is handled by the HPA (Horizontal Pod Autoscaler).
Note: We will not be dealing with either HPA or ingress in this course.
Before deploying our docker-compose app, we will deploy a simpler app with Kubernetes. We will need 2 additional tools:

- Kind (Kubernetes in Docker): this tool allows you to create a local Kubernetes cluster on your local computer.
- `kubectl` (pronounced coob-control): the Kubernetes command-line tool, used to run commands against Kubernetes clusters.
For this example we will create a very basic Flask app that returns `PONG` when receiving a GET request on the `/ping` route.
from flask import Flask

app = Flask('ping')

@app.route('/ping', methods=['GET'])
def ping():
    return "PONG"

if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=9696)
Note: your Python environment will need both `flask` and `gunicorn` to run this app.
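Since the Dockerfile in the next step copies a `Pipfile` and `Pipfile.lock`, you can generate them with `pipenv` (assuming `pipenv` is how you manage dependencies):

pipenv install flask gunicorn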
We will now create the Dockerfile for this ping app.
FROM python:3.8.12-slim
RUN pip install pipenv
WORKDIR /app
COPY ["Pipfile", "Pipfile.lock", "./"]
RUN pipenv install --system --deploy
COPY "ping.py" .
EXPOSE 9696
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "ping:app"]
Build the image:
docker build -t ping:v1 .
IMPORTANT: `docker build` will automatically apply the tag `latest` to the image when no tag is specified. In the following sections we will use Kind, a tool that won't accept the `latest` tag for images, so we need to tag our image with a tag of our liking, such as `v1` in this example.
Run the image:
docker run -it --rm -p 9696:9696 ping:v1
Finally, test it with `curl` in a terminal:
curl localhost:9696/ping
You should see `PONG` as the response from the app in the terminal.
`kubectl` is the Kubernetes command-line tool used to run commands against Kubernetes clusters.
Installation instructions for all major platforms are available in the official Kubernetes website.
Note: Docker Desktop for both Windows and MacOS already includes `kubectl`.
The goal of the lesson is to deploy our app to EKS, AWS' Kubernetes service. AWS also distributes `kubectl`; installation instructions from AWS can be found here. According to the EKS website, the Amazon-distributed `kubectl` binaries are identical to the official ones.
Kind is a tool that will allow us to create a local Kubernetes cluster on our computer for learning and development.
Installation instructions are available on the official website.
Once Kind is installed, create a cluster with the following command (it may take a few minutes):
kind create cluster
Note: you may delete a cluster with `kind delete cluster`.

Once the cluster is created, in order to access it with `kubectl` we need to provide the Kind cluster context to it. You may do so with the following command:
kubectl cluster-info --context kind-kind
You may now use `kubectl` to access the cluster. Try the following commands:
- `kubectl get service`
    - It should return a single entry named `kubernetes`.
- `kubectl get pod`
    - It should return `No resources found in default namespace.`
- `kubectl get deployment`
    - It should return `No resources found in default namespace.`
- `docker ps`
    - It should show a container running the `kindest` image, along with any other containers you might be running.

Kind makes use of Docker to simulate a cluster, which is why `docker ps` shows a container.
Deployments in Kubernetes are defined with YAML files. Here's an example `deployment.yaml` file for our ping app:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ping-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ping
  template:
    metadata:
      labels:
        app: ping
    spec:
      containers:
      - name: ping-pod
        image: ping:v1
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 9696
- The `.metadata.name` field contains the name of the deployment. In our example, `ping-deployment`.
- The `.spec.replicas` field states how many pods should be replicated in the deployment. In our example, we will only deploy 1 pod.
- The `.spec.selector` field defines how the deployment finds which pods to manage. There are a few different selection rules available.
    - In our example, the `.spec.selector.matchLabels` field selects a label defined in the pod template (`app: ping`).
- The `template` field contains a number of sub-fields:
    - The `.metadata.labels` field states how pods are labeled; in our example, `app: ping`.
    - The `.template.spec` field is the pod template's specification, which states that the pods run a single container (`ping-pod`) running the `ping:v1` image we created before.
        - The `resources` field specifies how many resources the pods should make use of. In this example, each pod will take a maximum of 128 MiB of RAM and "500 milliCPUs" of CPU time; in other words, half of a CPU core's available time.
Before we run the deployment, we need to load our image into our cluster nodes.
kind load docker-image ping:v1
We can now apply our `deployment.yaml` file to our cluster:
kubectl apply -f deployment.yaml
Note: you can check the status of your cluster by using the `kubectl get deployment` and `kubectl get pod` commands we saw before. In case of errors, you can get additional info from pods by using the `kubectl describe pod pod-name-taken-from-get-pod | less` command.
Note 2: you can remove a deployment with `kubectl delete -f deployment.yaml`.
Finally, we test that the deployment is working as intended. We can do so with port forwarding in order to access the pod.
kubectl port-forward pod-name-taken-from-get-pod 9696:9696
Note: you will need to use `kubectl get pod` to get the name of the pod for the port forwarding command.
This command will temporarily forward the port of the pod so that we can access it from outside the cluster. You can stop the port forwarding by pressing `Ctrl+C` on your keyboard.
With the port forwarding running, run the same `curl` command from before:
curl localhost:9696/ping
You should get a `PONG` response from the cluster.
We now need to offer permanent access from outside the cluster with a service. Services are also declared with YAML files.
Here's an example `service.yaml` file for our ping app:
apiVersion: v1
kind: Service
metadata:
  name: ping-service
spec:
  type: LoadBalancer
  selector:
    app: ping
  ports:
  - port: 80
    targetPort: 9696
- The service will be named `ping-service`, as stated in the `.metadata.name` field.
- The `ping-service` service will target port 9696 of any pod with the `app=ping` label.
- The service will be external, AKA of `LoadBalancer` type. If this field is not specified, the default type would be `ClusterIP` (internal service).
- We can access the service on port 80.
With the `service.yaml` file created, we can now deploy it:
kubectl apply -f service.yaml
You can now check the state of the service with `kubectl get service` and test it with the `curl` command.

- IMPORTANT: when checking the service status, you will most likely find that the LoadBalancer service we created has `<pending>` as its external IP value. We did not configure our cluster to assign IPs to LoadBalancers, so you will have to make use of port forwarding for testing:
    - `kubectl port-forward service/ping-service 8080:80`
        - We use `8080:80` because port 80 locally is reserved for HTTP and would probably require root access, so we use port 8080 as the local port.
    - `curl localhost:8080/ping`
We will now deploy our app to Kubernetes. The main idea is to re-create the `docker-compose.yaml` file from our app in new YAML files for Kubernetes.
Let's begin by creating a new `model-deployment.yaml` file.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving-model
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-serving-model
  template:
    metadata:
      labels:
        app: tf-serving-model
    spec:
      containers:
      - name: tf-serving-model
        image: zoomcamp-10-model:v1
        resources:
          limits:
            memory: "512Mi"
            cpu: "1"
        ports:
        - containerPort: 8500
- Note that we're using the same model image we created for `docker-compose`.
- We're increasing the resources because the model requires them.
Remember that we need to load the image to Kind before we can use it:
kind load docker-image zoomcamp-10-model:v1
We can now apply the deployment to our cluster:
kubectl apply -f model-deployment.yaml
Finally, we will test it by forwarding the pod's port and using the local gateway script.
kubectl port-forward tf-serving-model-name-from-get-pod 8500:8500
# call the gateway script after the command above from another terminal
After we confirm that the deployment works, we need to create the service for the deployment. Here's the `model-service.yaml` file for it:
apiVersion: v1
kind: Service
metadata:
  name: tf-serving-model
spec:
  selector:
    app: tf-serving-model
  ports:
  - port: 8500
    targetPort: 8500
- Note that we do not specify a service type. Therefore, this service will be of the default internal service type, `ClusterIP`.
Now apply it:
kubectl apply -f model-service.yaml
And test it with port-forwarding and the local gateway script:
kubectl port-forward service/tf-serving-model 8500:8500
# call the gateway script after the command above from another terminal
Note: Due to the use of gRPC, you may see some errors in the port forwarding terminal logs even though the service should work as expected.
We now repeat the same steps we did for the model, but this time for the gateway.
Here's the `gateway-deployment.yaml` file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gateway
  template:
    metadata:
      labels:
        app: gateway
    spec:
      containers:
      - name: gateway
        image: zoomcamp-10-gateway:v2
        resources:
          limits:
            memory: "128Mi"
            cpu: "100m"
        ports:
        - containerPort: 9696
        env:
        - name: TF_SERVING_HOST
          value: tf-serving-model.default.svc.cluster.local:8500
- Kubernetes makes use of namespaces to isolate groups of resources within a cluster. The usage of namespaces falls beyond the scope of this course, but you can learn more about them in this link.
    - You can get the available namespaces with `kubectl get namespace`. The default namespace name is `default`.
    - Some Kubernetes resources (such as pods, services, etc.) live in a namespace. Low-level resources such as nodes are not in any namespace.
- Remember that the gateway makes use of an environment variable to find the model service. We need to provide the URL of the service in the `TF_SERVING_HOST` environment variable.
- When a service is created, Kubernetes creates a corresponding DNS entry which follows a specific name format: `<service-name>.<namespace-name>.svc.cluster.local`.
    - The model service is called `tf-serving-model` and the namespace is `default`; thus, the correct URL for the environment variable is `tf-serving-model.default.svc.cluster.local:8500`.
- You can test that the URL of the service is correct by logging into any pod with `kubectl exec -it pod-of-your-choosing -- bash` and then running `telnet tf-serving-model.default.svc.cluster.local 8500` to check that it connects properly (you cannot use `curl` in this instance because the model service will reject the connection), as shown in the sketch below.
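Use any pod name from `kubectl get pod`; note that `telnet` may not be preinstalled in slim images, in which case you'd need to install it inside the pod first:

kubectl exec -it pod-of-your-choosing -- bash
# inside the pod:
telnet tf-serving-model.default.svc.cluster.local 8500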
Load the image:
kind load docker-image zoomcamp-10-gateway:v2
Apply the deployment:
kubectl apply -f gateway-deployment.yaml
And finally test it with port forwarding and the test script from before.
kubectl port-forward gateway-pod-name-taken-from-get-pod 9696:9696
# Now run the test script to test the gateway
We will now create the `gateway-service.yaml` service file:
apiVersion: v1
kind: Service
metadata:
  name: gateway
spec:
  type: LoadBalancer
  selector:
    app: gateway
  ports:
  - port: 80
    targetPort: 9696
- Note that we add the `LoadBalancer` type to the service file.
Apply the service:
kubectl apply -f gateway-service.yaml
And finally test it with port forwarding and the test script (check the port that you're accessing in the script!):
kubectl port-forward service/gateway 8080:80
# Now run the test script to test the gateway service
Kubernetes's default load balancing does not work with gRPC. This means that when deploying apps to Kubernetes that make use of gRPC such as our app, all the load ends up going to a single pod out of all of the pods available for the service that uses gRPC.
This article teaches you how to deal with this issue.
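As one example of a workaround (a sketch only, not necessarily the exact approach the article recommends): a headless service (`clusterIP: None`) makes the service's DNS name resolve to the individual pod IPs, so a gRPC client configured for client-side round-robin balancing can spread requests across them:

apiVersion: v1
kind: Service
metadata:
  name: tf-serving-model
spec:
  clusterIP: None   # headless: DNS returns the pod IPs instead of a single virtual IP
  selector:
    app: tf-serving-model
  ports:
  - port: 8500
    targetPort: 8500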
EKS is AWS' Kubernetes service. We will now deploy our app to EKS.
Note: in lesson 5 we covered AWS Elastic Beanstalk for deployment. For simpler single-container apps, Elastic Beanstalk is simpler to use than EKS.
We can use the AWS website or `aws-cli` to create a cluster, but we will use `eksctl` instead.
`eksctl` is a CLI tool for creating and managing clusters on EKS. You may learn more about it on its official website; download and install instructions can be found on the AWS EKS documentation pages.
`eksctl` can create clusters with default values. If you want to customize your cluster, you will need to provide a YAML configuration file.
Here's an example `eks-config.yaml` file:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: mlzoomcamp-eks
  region: eu-west-1
nodeGroups:
  - name: ng-m5-xlarge
    instanceType: m5.xlarge
    desiredCapacity: 1
- Inside the `metadata` field we specify the `name` of our cluster and the AWS `region` in which we will be creating it.
- A node group is a way to manage nodes that share similar specs. For example, our model may need nodes with GPUs but our gateway does not, so we could gather all nodes with GPUs in one node group and all nodes without GPUs in another.
- The `nodeGroups` field contains a list of node groups. We will only use a single group; in fact we only need a single node, so `nodeGroups` will have a single entry and `desiredCapacity` for that entry will be `1` (one node group with a single node inside).
    - The `instanceType` is the type of AWS EC2 instance we will populate our node group with. In this example we will use instances of type M5 (general purpose) xlarge.
    - A node group must also have a name. Using the name `ng-m5-xlarge` is an easy way to remember the instance type.
With our config file, we can now create the cluster (it may take several minutes to complete):
eksctl create cluster -f eks-config.yaml
For EKS to work we need to provide the Docker images we will use for our deployments.
We already explored how to upload a Docker image to ECR in Lesson 9. Below is a shortened explanation of how to create a repository and upload the images:
- Create an ECR repository and log in to it. We will use the name `mlzoomcamp-images` for it.
- Create the remote URIs for the model and gateway images.
    - The URI prefix is the repo URI.
    - The URI suffix will be the names of the images, but with the colons replaced by dashes:
        - `zoomcamp-10-model:v1` becomes `zoomcamp-10-model-v1`
        - `zoomcamp-10-gateway:v2` becomes `zoomcamp-10-gateway-v2`
- Tag the latest versions of your images with the remote URIs.
- Push the images to ECR (a sketch of these commands is shown below).
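The sketch below uses the `aws` CLI; the region matches the `eks-config.yaml` above, and `<account-id>` is a placeholder for your AWS account ID:

# create the repository (one-time)
aws ecr create-repository --repository-name mlzoomcamp-images

# log in to ECR with Docker
aws ecr get-login-password --region eu-west-1 \
    | docker login --username AWS --password-stdin <account-id>.dkr.ecr.eu-west-1.amazonaws.com

# build the remote URIs, then tag and push both images
PREFIX=<account-id>.dkr.ecr.eu-west-1.amazonaws.com/mlzoomcamp-images

docker tag zoomcamp-10-model:v1 ${PREFIX}:zoomcamp-10-model-v1
docker push ${PREFIX}:zoomcamp-10-model-v1

docker tag zoomcamp-10-gateway:v2 ${PREFIX}:zoomcamp-10-gateway-v2
docker push ${PREFIX}:zoomcamp-10-gateway-v2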
We now need to modify both `model-deployment.yaml` and `gateway-deployment.yaml` so that the container specs point to the images hosted on ECR.
Simply edit the files and change the `image` field (under `.spec.template.spec.containers`) to the images' remote URIs.
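For example, in `model-deployment.yaml` (again with a placeholder account ID):

      containers:
      - name: tf-serving-model
        image: <account-id>.dkr.ecr.eu-west-1.amazonaws.com/mlzoomcamp-images:zoomcamp-10-model-v1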
Once `eksctl` finishes creating the cluster, `kubectl` should already be configured to work with it. You can test it with the following command:
kubectl get nodes
Once we've checked that `kubectl` displays the nodes of our cluster (actually a single node in this example), we can start applying and testing the deployments and services.
Let's begin with the model.
kubectl apply -f model-deployment.yaml
kubectl apply -f model-service.yaml
kubectl get pod # check if the pod is active
kubectl get service # check if service is working
We can also do port forwarding to test the service:
kubectl port-forward service/tf-serving-model 8500:8500
# on another terminal, run the gateway script locally
Let's now continue with the gateway.
kubectl apply -f gateway-deployment.yaml
kubectl apply -f gateway-service.yaml
kubectl get pod # check if the pod is active
kubectl get service # check if service is working
This time, the output of `kubectl get service` should show an external IP next to the gateway service.
Let's do port forwarding to test it:
kubectl port-forward service/gateway 8080:80
# on another terminal, run the test script locally
We can also telnet to the gateway service to test it. It's likely that you won't be able to telnet to it right after applying the service because the changes need to propagate, but it should only take a few minutes. Do this after stopping the port forwarding:
kubectl get service # copy the external url of the gateway service
telnet <url-of-the-gateway-service>
Finally, use the test script. Update the test script to point to the external URL of the gateway service.
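For example, in `test.py` (the hostname is a placeholder for whatever external URL `kubectl get service` shows for the gateway; there is no port because the service listens on port 80):

url = 'http://<url-of-the-gateway-service>/predict'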
WARNING: The gateway is open to everyone who has the URL. AWS will charge you for machine uptime and requests received. Leaving it as it is may result in unwanted charges. There are ways to limit access to the gateway but it falls outside the scope of this course.
After you're done with the cluster, you may delete it to avoid additional charges.
eksctl delete cluster --name mlzoomcamp-eks
Here are additional resources for learning more:
- Kubernetes-related tools:
- minikube - Local Kubernetes, VM based.
- k3s - Lightweight Kubernetes distribution for IoT and Edge computing; it supports ARM architectures as well as x86.
- k3d - Like Kind but for k3s.
- microk8s - Another lightweight Kubernetes distribution.
- EKS Anywhere - For running local Kubernetes clusters with optional support by AWS.
- Rancher Desktop - Like Docker Desktop but for Kubernetes (uses k3s and containerd).
- Docker Desktop - Docker Desktop now includes Kubernetes support built in.
- Lens - IDE for Kubernetes; like `kubectl` but with a GUI.
- Managed Kubernetes
- There are many Kubernetes-as-a-Platform providers besides AWS, such as GCP, Azure, Digital Ocean, etc. Search for Managed Kubernetes.
- Namespaces
- Namespaces are handy for managing different apps or different teams inside a single cluster.
Previous: Serverless Deep Learning
Next: (Coming soon)