Morphling is an auto-configuration framework for machine learning model serving (inference) on Kubernetes. Check the website for details.
The Morphling paper was accepted at ACM SoCC 2021:
Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving
Morphling tunes the configurations of your ML/DL model serving deployments. It searches for the best container-level configurations (e.g., resource allocations and runtime parameters) through empirical trials, in which a few configurations are sampled and evaluated for performance.
Key benefits include:
- Automated tuning workflows hidden behind simple APIs.
- Out-of-the-box stress-test clients for ML model serving.
- Cloud-agnostic; tested on AWS, Alicloud, etc.
- ML-framework-agnostic, with general support for popular frameworks, including TensorFlow, PyTorch, etc.
- Equipped with various customizable hyper-parameter tuning algorithms.
From the git root directory, run
kubectl apply -k config/crd/bases
kubectl create namespace morphling-system
kubectl apply -k manifests/configmap
kubectl apply -k manifests/controllers
kubectl apply -k manifests/pv
kubectl apply -k manifests/mysql-db
kubectl apply -k manifests/db-manager
kubectl apply -k manifests/ui
kubectl apply -k manifests/algorithm
By default, Morphling is installed under the morphling-system namespace.
The official Morphling component images are hosted on Docker Hub.
Check if all components are running successfully:
kubectl get deployment -n morphling-system
Expected output:
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
morphling-algorithm-server   1/1     1            1           34s
morphling-controller         1/1     1            1           9m23s
morphling-db-manager         1/1     1            1           9m11s
morphling-mysql              1/1     1            1           9m15s
morphling-ui                 1/1     1            1           4m53s
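The readiness check above can also be scripted. The following is a minimal sketch, not part of Morphling: the helper name all_ready is hypothetical, and it assumes the table layout shown in the expected output, where the second column is READY in the form ready/desired.

```shell
# Sketch: succeed only if every deployment row on stdin reports READY as n/n.
# Input is the table printed by `kubectl get deployment`; line 1 is the header.
# all_ready is a hypothetical helper name, not a Morphling or kubectl command.
all_ready() {
  awk 'NR > 1 {
         split($2, r, "/")            # READY column looks like "1/1"
         if (r[1] != r[2]) { print "not ready: " $1; bad = 1 }
       }
       END { exit bad + 0 }'          # exit status 1 if any row was not ready
}

# Usage (requires access to the cluster):
#   kubectl get deployment -n morphling-system | all_ready && echo "all components ready"
```

The function only reads text, so it can be tried on a saved copy of the output before being pointed at a live cluster.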
To uninstall the Morphling components, run
bash script/undeploy.sh
Then delete the Morphling CRDs:
kubectl get crd | grep morphling.kubedl.io | cut -d ' ' -f 1 | xargs kubectl delete crd
Helm is a package manager for Kubernetes. A demo installation on macOS:
brew install helm
Check the helm website for more details.
From the root directory, run
helm install morphling ./helm/morphling --create-namespace -n morphling-system
You can override the default values defined in values.yaml with the --set flag.
For example, to set custom CPU and memory requests:
helm install morphling ./helm/morphling --create-namespace -n morphling-system --set resources.requests.cpu=1024m --set resources.requests.memory=2Gi
Helm will install the CRDs and other Morphling components under the morphling-system namespace.
To uninstall the Helm release, run
helm uninstall morphling -n morphling-system
Then delete the Morphling CRDs:
kubectl get crd | grep morphling.kubedl.io | cut -d ' ' -f 1 | xargs kubectl delete crd
Morphling UI is built upon Ant Design.
If you are installing Morphling with YAML files, from the root directory, run
kubectl apply -k manifests/ui
If you are installing Morphling with the Helm chart, the Morphling UI is deployed automatically.
Check if the Morphling UI is running successfully:
kubectl -n morphling-system get svc morphling-ui
Expected output:
NAME           TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
morphling-ui   NodePort   10.96.63.162   <none>        9091:30680/TCP   44m
If you are using minikube, you can access the UI with port-forward:
kubectl -n morphling-system port-forward --address 0.0.0.0 svc/morphling-ui 30263:9091
Then open the UI at http://localhost:30263/.
For a detailed UI deployment and development guide, see UI.md.
This example demonstrates how to tune the configuration for a mobilenet model deployed with TensorFlow Serving under Morphling.
For demonstration, we choose two configurations to tune: the first is the number of CPU cores (resource allocation), and the second is the maximum serving batch size (runtime parameter). We use grid search for configuration sampling.
kubectl -n morphling-system apply -f https://raw.githubusercontent.com/alibaba/morphling/main/examples/experiment/experiment-mobilenet-grid.yaml
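Grid search simply evaluates every combination of the candidate values. The sketch below illustrates the idea only: the function name grid_points and the candidate lists are hypothetical and are not taken from the example YAML, and Morphling's own sampling runs inside the cluster rather than in a script like this.

```shell
# Sketch of grid search sampling: enumerate the Cartesian product of the
# candidate values for the two tuned configurations.
# grid_points and the candidate lists are illustrative assumptions.
grid_points() {
  cpus=$1          # space-separated candidate CPU core counts
  batch_sizes=$2   # space-separated candidate maximum batch sizes
  for cpu in $cpus; do
    for batch in $batch_sizes; do
      echo "cpu=$cpu BATCH_SIZE=$batch"   # one trial configuration per line
    done
  done
}

# Three CPU candidates times three batch sizes gives nine trial configurations.
grid_points "1 2 4" "1 8 32"
```

The number of trials grows multiplicatively with each added parameter, which is worth keeping in mind when choosing candidate lists.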
To start a multi-framework tuning experiment:
kubectl -n morphling-system apply -f examples/experiment/experiment-grid.yaml
You can specify the model name in the file examples/experiment/experiment-grid.yaml. Note that with INFERENCE_FRAMEWORK=vllm and DTYPE=int8, bitsandbytes only supports LLMs with the Llama architecture (LlamaForCausalLM). So far, tuning is only supported between the float16/bfloat16 and int8 data types. Make sure there are enough resources for LLM serving.
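Monitor the status of the tuning experiment:
kubectl get -n morphling-system pe
kubectl describe -n morphling-system pe
Monitor the sampling trials:
kubectl -n morphling-system get trial
Get the searched optimal configuration:
kubectl -n morphling-system get pe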
Expected output:
NAME                        STATE       AGE   OBJECT NAME   OPTIMAL OBJECT VALUE   OPTIMAL PARAMETERS
mobilenet-experiment-grid   Succeeded   12m   qps           32                     [map[category:resource name:cpu value:4] map[category:env name:BATCH_SIZE value:32]]
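The OPTIMAL PARAMETERS column is printed as a list of maps, which is awkward to read or reuse. As a rough sketch, the hypothetical helper below (optimal_params is not a Morphling command) turns that column into name=value lines. It assumes the output keeps the exact "name:X value:Y" layout shown above, which may differ across versions.

```shell
# Sketch: extract name=value pairs from the OPTIMAL PARAMETERS column.
# Assumes each parameter is printed as "... name:<name> value:<value>]".
# optimal_params is a hypothetical helper name.
optimal_params() {
  grep -o 'name:[^ ]* value:[^] ]*' |
    sed -E 's/name:([^ ]*) value:(.*)/\1=\2/'
}

# Usage (requires access to the cluster):
#   kubectl -n morphling-system get pe | optimal_params
```

For the row above, the helper prints one parameter per line in the form name=value.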
To delete all tuning experiments, run
kubectl -n morphling-system delete pe --all
See Morphling Workflow to check how Morphling tunes ML serving configurations automatically in a Kubernetes-native way.
To build the controller manager, run the tests, and regenerate the manifests, run
make manager
make test
make manifests
Download the right version of the vllm .whl file to the pkg/server directory (see the download guidance) before building the image.
For example, if the CUDA version is 11.8 and you want vllm version 0.6.1.post1, download vllm-0.6.1.post1+cu118-cp310-cp310-manylinux1_x86_64.whl to the pkg/server directory. Note that the Python version in this image is 3.10.
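The wheel file name encodes the vllm version, the CUDA version, and the Python version. The sketch below composes the name from those three values. The helper name vllm_wheel_name is hypothetical, and the naming pattern is inferred only from the single example above, so check it against the actual release assets before relying on it.

```shell
# Sketch: build the expected vllm wheel file name from its three components.
# The pattern is inferred from the one example in this document; other
# releases may use a different platform tag. vllm_wheel_name is hypothetical.
vllm_wheel_name() {
  vllm_version=$1                                   # e.g. 0.6.1.post1
  cu=$(printf '%s' "$2" | tr -d '.')                # CUDA 11.8  -> 118
  py=$(printf '%s' "$3" | tr -d '.')                # Python 3.10 -> 310
  echo "vllm-${vllm_version}+cu${cu}-cp${py}-cp${py}-manylinux1_x86_64.whl"
}

vllm_wheel_name 0.6.1.post1 11.8 3.10
```

The resulting name is the file that should be present in pkg/server before the image is built.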
Then modify the arguments CUDA_VERSION and VLLM_FILE in script/docker_build.sh, and build the image.
make docker-build
make docker-push
To develop/debug Morphling controller manager locally, please check the debug guide.
If you have any questions or want to contribute, GitHub issues or pull requests are warmly welcome.