GitHub - cyrildiagne/kuda: Serverless APIs on remote GPUs

Status: Experimental 🧪

Easily deploy GPU models as serverless APIs

Deploy an API from a template

$ kuda deploy -f https://github.com/cyrildiagne/kuda/releases/download/v0.4.0-preview/example-hello-gpu.yaml

Call it!

$ curl https://hello-gpu.default.$your_domain

Hello GPU!

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8    10W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
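The same request can be made from Python. This is a minimal sketch that uses only the standard library's urllib. `call_api` is a hypothetical helper and is not part of Kuda. The long default timeout is also an assumption: it leaves room for a cold start, because a service that has scaled to zero has to wait for a GPU before it can answer.

```python
import urllib.request


def call_api(url: str, timeout: float = 120.0) -> str:
    """Send a GET request to a deployed API and return the body as text.

    The generous default timeout is an assumption: when the service has
    scaled to zero, the first request waits for a GPU to be provisioned.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (hypothetical domain; substitute your own):
# print(call_api("https://hello-gpu.default.example.com"))
```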

Serverless GPU inference

Kuda builds on Knative to allocate cloud GPUs only when there is traffic.

This is ideal when you want to share ML projects online without keeping expensive GPUs allocated all the time.
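The scaling decision behind this can be reduced to a small formula. Knative's autoscaler targets a number of concurrent requests per replica. It runs ceil(observed concurrency / target) replicas and drops to zero when there is no traffic. The sketch below is a simplified illustration of that rule. The parameter names and defaults are hypothetical, and it is not Kuda's or Knative's code: the real autoscaler also averages load over time windows and waits for a grace period before scaling to zero.

```python
import math


def desired_replicas(concurrent_requests: float,
                     target_per_replica: float = 1.0,
                     max_replicas: int = 4) -> int:
    """Return how many GPU-backed replicas to run for the current load.

    Simplified Knative-style rule (illustrative only):
    no traffic -> 0 replicas, so no GPU stays allocated;
    otherwise ceil(load / target), capped at max_replicas.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if concurrent_requests <= 0:
        return 0  # scale to zero: release the GPU
    wanted = math.ceil(concurrent_requests / target_per_replica)
    return min(wanted, max_replicas)


for load in (0, 1, 3, 10):
    print(load, "->", desired_replicas(load, target_per_replica=2))
# 0 -> 0
# 1 -> 1
# 3 -> 2
# 10 -> 4
```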

Turn any model into a serverless API

Kuda deploys APIs as containers, so you can use any language, any framework, and there is no library to import in your code.

All you need is a Dockerfile.

Here's a minimal example that uses Flask to return the output of nvidia-smi:

main.py

import os
import flask

app = flask.Flask(__name__)

@app.route('/')
def hello():
    return 'Hello GPU:\n' + os.popen('nvidia-smi').read()

Dockerfile

FROM nvidia/cuda:10.1-base

# Refresh the package lists first: the base image does not include them.
RUN apt-get update && apt-get install -y python3 python3-pip

RUN pip3 install setuptools Flask gunicorn

COPY main.py ./main.py

CMD exec gunicorn --bind :80 --workers 1 --threads 8 main:app

kuda.yaml

name: hello-gpu
deploy:
  dockerfile: ./Dockerfile

Running kuda deploy in this directory builds the image and deploys the API to a URL such as https://hello-gpu.my-namespace.example.com.
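The host name combines the name field from kuda.yaml with a namespace and a domain: <name>.<namespace>.<domain>. The sketch below shows how such a URL could be derived. `service_url` is a hypothetical helper, not part of Kuda. It also assumes the name has to be a valid RFC 1035 DNS label, because Kubernetes enforces that rule for Service names. Under that assumption, a name like hello_gpu would be rejected.

```python
import re

# RFC 1035 label: lowercase letters, digits and '-', starting with a
# letter, ending with a letter or digit, at most 63 characters.
_DNS_LABEL = re.compile(r"[a-z]([-a-z0-9]{0,61}[a-z0-9])?")


def service_url(name: str, namespace: str, domain: str) -> str:
    """Build the https://<name>.<namespace>.<domain> URL for an API."""
    if not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"not a valid DNS label: {name!r}")
    return f"https://{name}.{namespace}.{domain}"


print(service_url("hello-gpu", "my-namespace", "example.com"))
# https://hello-gpu.my-namespace.example.com
```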

Check out the full, annotated example in examples/hello-gpu-flask.
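Kuda only needs a container that serves HTTP, so main.py does not have to use Flask. gunicorn can serve any WSGI callable, which means the same main:app argument in the Dockerfile still works. The sketch below is the same endpoint written with only the standard library. The fallback message for machines without nvidia-smi is an addition for local testing and is not part of the original example.

```python
import subprocess


def gpu_report() -> str:
    """Return the output of nvidia-smi, or a note if it cannot run."""
    try:
        # stdout/stderr=PIPE and universal_newlines keep this compatible
        # with older Python 3 releases than capture_output/text.
        result = subprocess.run(["nvidia-smi"], stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True, timeout=30)
        return result.stdout or result.stderr
    except (OSError, subprocess.SubprocessError):
        # e.g. running locally on a machine without an NVIDIA driver
        return "nvidia-smi is not available on this machine\n"


def app(environ, start_response):
    """Plain WSGI callable: GET / returns the nvidia-smi report."""
    if environ.get("PATH_INFO", "/") != "/":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found\n"]
    body = ("Hello GPU:\n" + gpu_report()).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To check it locally without gunicorn, wsgiref.simple_server.make_server("", 8000, app).serve_forever() serves it on port 8000.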

Features

Provision GPUs & scale based on traffic (from zero to N)
Interactive development on cloud GPUs from any workstation
Protect & control access to your APIs using API Keys
HTTPS with TLS termination & automatic certificate management
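Kuda provides API-key access control as a built-in feature. The sketch below only illustrates the general idea and does not show how Kuda implements it. It wraps a WSGI app so that each request must carry a known key. The X-API-Key header name and the in-memory key set are hypothetical. The sketch uses hmac.compare_digest so that comparison timing does not leak key contents.

```python
import hmac


def require_api_key(app, valid_keys):
    """Wrap a WSGI app so that requests must carry a known API key.

    Hypothetical illustration: the X-API-Key header and the in-memory
    key set are assumptions, not Kuda's implementation.
    """
    keys = [k.encode("utf-8") for k in valid_keys]

    def wrapped(environ, start_response):
        # WSGI exposes an "X-API-Key" request header as HTTP_X_API_KEY.
        supplied = environ.get("HTTP_X_API_KEY", "").encode("utf-8")
        # Compare against every key with compare_digest so timing does
        # not reveal how much of a key matched.
        ok = False
        for key in keys:
            ok |= hmac.compare_digest(supplied, key)
        if not ok:
            start_response("401 Unauthorized",
                           [("Content-Type", "text/plain")])
            return [b"Unauthorized\n"]
        return app(environ, start_response)

    return wrapped


def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello GPU!\n"]


# Usage: serve `protected` instead of `hello`.
protected = require_api_key(hello, {"example-key-123"})
```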
