This example illustrates how to use BudgetML to launch an API endpoint that runs on a preemptible Google Cloud Platform VM.
The endpoint is issued an SSL certificate via LetsEncrypt. The server automatically spins back up whenever GCP shuts down the instance. The endpoint is a full-fledged FastAPI server, complete with interactive Swagger docs.
This particular example uses the awesome HuggingFace library to deploy a simple sentiment analysis model based on BERT. It uses the HuggingFace pipeline convenience function to achieve this. Here is the simple Predictor class required for this (see predictor.py):
class Predictor:
    def load(self):
        from transformers import pipeline
        self.model = pipeline(task="sentiment-analysis")

    async def predict(self, request):
        # We know we are going to use the `predict_dict` method, so we use
        # the request.payload pattern
        req = request.payload
        return self.model(req["text"])[0]
Just overriding two functions and the API is ready. No frills.
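If you want to check the load/predict flow before building any container, it can be exercised in plain Python. The following is only a minimal sketch and not part of BudgetML: `FakeSentimentModel` and `run_once` are hypothetical names, the fake model stands in for the HuggingFace pipeline so nothing is downloaded, and the request is assumed to need nothing more than a `payload` attribute, as in the snippet above.

```python
import asyncio
from types import SimpleNamespace


class FakeSentimentModel:
    """Hypothetical stand-in for the HuggingFace pipeline (no download needed)."""

    def __call__(self, text):
        # Mimic the pipeline's output shape: a list with one dict per input.
        label = "POSITIVE" if "love" in text.lower() else "NEGATIVE"
        return [{"label": label, "score": 1.0}]


class Predictor:
    def load(self):
        # The real example builds pipeline(task="sentiment-analysis") here.
        self.model = FakeSentimentModel()

    async def predict(self, request):
        req = request.payload
        return self.model(req["text"])[0]


def run_once(text):
    """Load the predictor and run one prediction outside the API server."""
    predictor = Predictor()
    predictor.load()
    # Assumption: the request only needs a `payload` attribute holding a dict.
    request = SimpleNamespace(payload={"text": text})
    return asyncio.run(predictor.predict(request))


if __name__ == "__main__":
    print(run_once("BudgetML is so awesome. I love it!"))
```

Swapping `FakeSentimentModel()` back for the real `pipeline(task="sentiment-analysis")` call should give the same shape of result, with the model's actual label and score.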
Here is how you can try it for yourself.
Create a Google Cloud Project, and additionally follow the prerequisite steps:
- Enable billing.
- Enable APIs:
  - Cloud Storage
  - Compute Engine API
  - Google Cloud Pub/Sub API
  - Cloud Functions API
  - Cloud Build API
  - Cloud Scheduler API
- Create a service account with the following role: roles/editor.

For ease, install the gcloud CLI and use it as follows:
export GCP_PROJECT=<enter your Google Cloud project ID>
export SA_NAME="sa-name"
export SA_PATH="$(pwd)/sa.json"
# creating the actual Service Account
gcloud iam service-accounts create ${SA_NAME}
# creating a json-key for the new Service Account
gcloud iam service-accounts keys create ${SA_PATH} \
    --iam-account ${SA_NAME}@${GCP_PROJECT}.iam.gserviceaccount.com
# give permissions to the new Service Account
gcloud projects add-iam-policy-binding ${GCP_PROJECT} \
    --member=serviceAccount:${SA_NAME}@${GCP_PROJECT}.iam.gserviceaccount.com \
    --role "roles/editor"
export DOMAIN='DOMAIN'
export SUBDOMAIN='SUBDOMAIN'
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
export GCP_PROJECT='GCP_PROJECT'
export GCP_MACHINE_TYPE='GCP_MACHINE_TYPE' # change to whatever machine type you like
python create_ip.py
This will create a static IP address and print it on the terminal. Set it in the environment:
export IP_ADDRESS='IP_ADDRESS' # change to static IP generated from `create_ip.py`
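Since the exports above use placeholder values such as `'DOMAIN'`, it is easy to leave one unchanged. Here is a small sketch of a pre-flight check; `missing_env_vars` is a hypothetical helper, and the assumption that the deploy scripts need exactly these six variables is taken only from the export lines in this guide.

```python
import os

# Variable names taken from the export lines in this guide.
REQUIRED_VARS = (
    "DOMAIN",
    "SUBDOMAIN",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "GCP_PROJECT",
    "GCP_MACHINE_TYPE",
    "IP_ADDRESS",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names that are unset, empty, or still equal to their placeholder."""
    env = os.environ if env is None else env
    # A value identical to the variable name (e.g. DOMAIN='DOMAIN') is
    # treated as a placeholder that was never replaced.
    return [name for name in required if env.get(name) in (None, "", name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Please set: " + ", ".join(missing))
    else:
        print("All variables are set.")
```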
To issue the SSL certificate via LetsEncrypt, you need to create an A record that binds the static IP address to the subdomain and domain provided. This is simple with most popular domain providers. Here are guides for Hostgator, Namecheap and GoDaddy. These are the values to be set:
Name: $SUBDOMAIN.$DOMAIN (e.g. model.mywebsite.com)
Value: $IP_ADDRESS (e.g. 35.137.10.45)
This might take a few minutes or even hours to propagate, so it is good to pause here for a bit (or try the local deploy while you wait).
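One way to tell whether the record has propagated is to resolve the hostname and compare it with the static IP. This is a rough sketch using only the standard library; `a_record_matches` is a hypothetical helper, and because it asks your machine's own resolver, a cached answer there may lag behind (or run ahead of) what the LetsEncrypt servers see.

```python
import socket


def a_record_matches(hostname, expected_ip, resolve=socket.gethostbyname):
    """Return True if `hostname` currently resolves to `expected_ip`."""
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        # socket.gaierror (name not found yet) is a subclass of OSError.
        return False


if __name__ == "__main__":
    import os

    # Assumes SUBDOMAIN, DOMAIN and IP_ADDRESS are exported as described above.
    host = "{}.{}".format(os.environ.get("SUBDOMAIN", ""), os.environ.get("DOMAIN", ""))
    print(host, a_record_matches(host, os.environ.get("IP_ADDRESS", "")))
```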
It is often a good idea to deploy locally to make sure that the Predictor class is created properly.
Note that a Docker installation is required for this.
python deploy_local.py
This will pull the base API image, add your requirements (if any), and create a local container with the same logic as the main launch function. After it has completed, a USERNAME and PASSWORD will be shown, and the API will be accessible at 0.0.0.0:8000/docs.
Once tested locally, you are ready to deploy to the cloud. Use:
python deploy.py
There will be a lot of output, but at the end a USERNAME and PASSWORD should show up (to be used in the next step). Here is what happened:
- A Google Cloud Bucket is created in GCP_PROJECT, where the Predictor class is uploaded.
- A (preemptible) VM is launched with a startup script that runs the docker-swag image and a custom API image based on the base budgetml image.
- A Cloud Scheduler job is created that tries to start the server every minute, to ensure minimum downtime.

The above three combined ensure the cheapest possible API endpoint with the lowest possible downtime.
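Because the VM can be shut down and restarted, a client may see short windows where the endpoint is unreachable. One way to cope on the client side is a plain retry loop. This is a sketch only and not something BudgetML ships: `call_with_retry` is a hypothetical helper, and the default of 5 attempts 30 seconds apart is an arbitrary choice, not a measured recovery time.

```python
import time


def call_with_retry(func, attempts=5, delay_seconds=30.0, sleep=time.sleep):
    """Call `func()` until it succeeds or `attempts` tries have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except OSError as error:
            # ConnectionError and urllib.error.URLError are OSError subclasses.
            # Note that urllib's HTTPError is one too, so HTTP error responses
            # are retried as well; narrow the except clause if that is unwanted.
            last_error = error
            if attempt < attempts - 1:
                sleep(delay_seconds)
    raise last_error
```

Pass in any zero-argument callable that performs the request, for example a `lambda` wrapping your HTTP call.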
Navigating to http://0.0.0.0:8000/docs (after deploy_local.py) or https://$SUBDOMAIN.$DOMAIN/docs (after deploy.py) will show a Swagger docs page generated by FastAPI. To log in, press the lock icon and enter the username and password generated above. Then you can play with the predict_dict/ endpoint by making a POST request with the following JSON body:
{
    "payload": {"text": "BudgetML is so awesome. I love it!"}
}
and the response should be something like:
{
    "label": "POSITIVE",
    "score": 0.999879777431488
}
Alternatively, you can also use curl. First, get the token by using your USERNAME and PASSWORD generated above.
curl -X POST "ENDPOINT/token" -H "accept: application/json" -H "Content-Type: application/x-www-form-urlencoded" -d "grant_type=&username=USERNAME&password=PASSWORD&scope=&client_id=&client_secret="
Then use the token generated from the above command to hit the API.
curl -X POST "ENDPOINT/predict_dict/" -H "accept: application/json" -H "Authorization: Bearer TOKEN" -H "Content-Type: application/json" -d "{\"payload\":{\"text\":\"BudgetML is so awesome. I love it!\"}}"
Again, ENDPOINT will be http://0.0.0.0:8000 for local and https://$SUBDOMAIN.$DOMAIN for actual deployment.
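The same two calls can be made from Python with only the standard library. The sketch below mirrors the curl commands above; the function names are hypothetical, and reading the token from an `access_token` field is an assumption based on the usual FastAPI OAuth2 password flow, so check the actual response of `/token` before relying on it.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(endpoint, username, password):
    """Build the POST to /token with the same form fields as the curl call."""
    form = urllib.parse.urlencode({
        "grant_type": "",
        "username": username,
        "password": password,
        "scope": "",
        "client_id": "",
        "client_secret": "",
    }).encode()
    return urllib.request.Request(
        endpoint.rstrip("/") + "/token",
        data=form,
        method="POST",
        headers={
            "accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def build_predict_request(endpoint, token, text):
    """Build the POST to /predict_dict/ carrying the bearer token."""
    body = json.dumps({"payload": {"text": text}}).encode()
    return urllib.request.Request(
        endpoint.rstrip("/") + "/predict_dict/",
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_prediction(endpoint, username, password, text):
    """Get a token, then call predict_dict/ with it."""
    token_response = send(build_token_request(endpoint, username, password))
    # Assumption: the token is returned under "access_token".
    token = token_response["access_token"]
    return send(build_predict_request(endpoint, token, text))
```

For a local deployment, `fetch_prediction("http://0.0.0.0:8000", USERNAME, PASSWORD, "BudgetML is so awesome. I love it!")` would be the equivalent of the two curl commands.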
Here is a screenshot of what to expect in the Swagger docs!
If you have questions, please open an issue on GitHub. That's the easiest way to consolidate all requests for the maintainer team.