Status: Experimental 🧪
- Deploy an API from a template:

  ```bash
  $ kuda deploy -f https://github.com/cyrildiagne/kuda/releases/download/v0.4.0-preview/example-hello-gpu.yaml
  ```

- Call it!

  ```bash
  $ curl https://hello-gpu.default.$your_domain

  Hello GPU!

  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
  |-------------------------------+----------------------+----------------------+
  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
  |===============================+======================+======================|
  |   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
  | N/A   37C    P8    10W /  70W |      0MiB / 15079MiB |      0%      Default |
  +-------------------------------+----------------------+----------------------+

  +-----------------------------------------------------------------------------+
  | Processes:                                                       GPU Memory |
  |  GPU       PID   Type   Process name                             Usage      |
  |=============================================================================|
  |  No running processes found                                                 |
  +-----------------------------------------------------------------------------+
  ```
Kuda builds on Knative to allocate cloud GPUs only when there is traffic.
This is ideal when you want to share ML projects online without keeping expensive GPUs allocated all the time.
Kuda deploys APIs as containers, so you can use any language, any framework, and there is no library to import in your code.
All you need is a Dockerfile.
Here's a minimal example that just prints the result of `nvidia-smi` using Flask:
`main.py`:

```python
import os

import flask

app = flask.Flask(__name__)


@app.route('/')
def hello():
    # Return the output of nvidia-smi to confirm the GPU is visible.
    return 'Hello GPU:\n' + os.popen('nvidia-smi').read()
```
`Dockerfile`:

```dockerfile
FROM nvidia/cuda:10.1-base

# Install Python and the API's dependencies.
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install setuptools Flask gunicorn

COPY main.py ./main.py

# Serve the Flask app on port 80 with gunicorn.
CMD exec gunicorn --bind :80 --workers 1 --threads 8 main:app
```
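If you want to sanity-check the image locally before deploying, a rough sketch (assuming Docker 19.03+ with the NVIDIA Container Toolkit installed; the `hello-gpu` tag is just a local name chosen here):

```bash
# Build the image from the Dockerfile above.
$ docker build -t hello-gpu .

# Run it with GPU access, mapping container port 80 to 8080 on the host.
$ docker run --rm --gpus all -p 8080:80 hello-gpu

# In another terminal, the endpoint should return the nvidia-smi output.
$ curl http://localhost:8080
```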
`kuda.yaml`:

```yaml
name: hello-gpu
deploy:
  dockerfile: ./Dockerfile
```
Running `kuda deploy` in this example would build and deploy the API to a URL such as https://hello-gpu.my-namespace.example.com.
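For instance, a session from the directory containing these three files might look like the sketch below (the hostname is a placeholder; the real one depends on your namespace and domain):

```bash
# Build the container and deploy it using the kuda.yaml in the current directory.
$ kuda deploy

# Call the generated endpoint (placeholder URL); the response should be the
# greeting from main.py followed by the nvidia-smi report.
$ curl https://hello-gpu.my-namespace.example.com
Hello GPU:
...
```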
Check out the full example with annotations in `examples/hello-gpu-flask`.
- Provision GPUs & scale based on traffic (from zero to N)
- Interactive development on cloud GPUs from any workstation
- Protect & control access to your APIs using API Keys
- HTTPS with TLS termination & automatic certificate management