This repo describes how to deploy a scikit-learn model to Azure Functions, allowing you to run the model 'serverlessly' on a pay-per-use basis. Alternatively, you can deploy your scikit-learn model along the lines of scikit-learn-model-deployment-on-azure-ml.
- Fork this repository.
- Enable Actions in the Actions tab of your repository.
- Train an sklearn classifier, save it with joblib as `model.joblib`, and place it in `/src/SklearnModelFunction/`. Update `/src/requirements.txt` to match your model's dependencies. More info here. A minimal training sketch follows this list; a full walkthrough is in the model creation example below.
- Create a resource group on Azure; note the subscription ID and the resource group name.
- Create a storage account in that resource group. This will be used for the Terraform state files; keep track of the storage account name. Create a private container in the storage account and name it `state`.
- Generate and save secrets to GitHub by following these steps.
- Add the names you noted down to `.github/workflows/deploy-to-azure.yaml` in:

  ```yaml
  env:
    AZURE_RESOURCE_GROUP_NAME: {name of your resource group}
    TERRAFORM_BACKEND_STORAGEACCOUNT: {name of your storage account}
    ENVIRONMENT_NAME: {e.g. sklearnmodel}
    ENVIRONMENT_TYPE: {e.g. dev, prod}
    TERRAFORM_BACKEND_RESOURCEGROUP: {name of your resource group}
  ```
- Commit the changes to your fork. On push, a GitHub Actions pipeline is triggered: `.github/workflows/deploy-to-azure.yaml`. This creates your Azure Functions app, as well as the Python function in the app. Alternatively, you can trigger the pipeline manually from the Actions tab.
- Test your function endpoint, e.g. using Postman. See here.
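For the 'train and save' step above, a minimal sketch could look like this. The iris toy dataset is only a stand-in for your own training data, and the output path assumes you run the script from the repository root:

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train any sklearn classifier on your own data; iris is just a placeholder.
X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier()
clf.fit(X, y)

# Save the fitted model where the Azure function expects it.
joblib.dump(clf, "src/SklearnModelFunction/model.joblib")
```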
Coming soon...
The notebook `./randomforest_example/model_creation.ipynb` shows how to create a simple random forest model which predicts whether or not flights will be delayed by 15 minutes or more. The most important steps are as follows:
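The snippets below assume imports along these lines; the `RF` alias for `RandomForestClassifier` is inferred from the code shown:

```python
import joblib
from sklearn.ensemble import RandomForestClassifier as RF
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
```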
We split the dataset into training and test sets so that we can verify the performance on unseen data after model training:

```python
x, y = data.iloc[:, :-1], data.iloc[:, -1]
xtrain, xtest, ytrain, ytest = train_test_split(x, y)
```
A random forest classifier object is created and fit with the training data and labels:

```python
clf = RF()
clf.fit(xtrain, ytrain)
```
`xtest` contains rows of data the model has not seen before, so we use the predicted output labels `y_pred_test` together with `ytest` to check the performance of the model:

```python
y_pred_test = clf.predict(xtest)
print(accuracy_score(ytest, y_pred_test))
>> 0.9204
```
Using the joblib library, we dump the model into `model.joblib`:

```python
joblib.dump(clf, "./model/model.joblib")
```

**Note:** the model is now saved in the `./randomforest_example/model` folder and should be (manually) copied into the `/src/SklearnModelFunction/` folder to deploy it in the later steps.
The Azure function can be found in `/src/SklearnModelFunction/`. Most importantly, this folder contains the model, `/src/SklearnModelFunction/model.joblib`, and the logic for loading the model and running a prediction, `/src/SklearnModelFunction/__init__.py`. The file `/src/requirements.txt` should contain all the requirements you import in your functions. You can find the general documentation on Azure Functions + Python here.
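For reference, a minimal sketch of what such an `__init__.py` can look like under the Azure Functions Python programming model (with the HTTP trigger bindings defined in `function.json`); the actual file in this folder may differ in details such as input validation or preprocessing:

```python
import json
import pathlib

import azure.functions as func
import joblib
import pandas as pd

# Load the model once at import time so that warm invocations reuse it.
MODEL_PATH = pathlib.Path(__file__).parent / "model.joblib"
model = joblib.load(MODEL_PATH)


def main(req: func.HttpRequest) -> func.HttpResponse:
    # The request body is expected to be a list of dicts, one dict per row.
    rows = req.get_json()
    data = pd.DataFrame(rows)

    predictions = model.predict(data)
    return func.HttpResponse(
        json.dumps(predictions.tolist()),
        mimetype="application/json",
    )
```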
Navigate to your Cloud Shell in the Azure portal (or use the CLI) and generate your credentials (for PowerShell, remove the `""`):

```bash
az ad sp create-for-rbac --name "sp_sklearnmodel" --role contributor \
    --scopes /subscriptions/{your_subscription_id}/resourceGroups/{your_rg} \
    --sdk-auth
```
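The `--sdk-auth` flag makes the command print the credentials as a JSON object, roughly of this shape (values redacted, additional endpoint URLs omitted); these fields map onto the GitHub secrets listed below:

```json
{
  "clientId": "<AZURE_CLIENT_ID>",
  "clientSecret": "<AZURE_CLIENT_SECRET>",
  "subscriptionId": "<AZURE_SUBSCRIPTION_ID>",
  "tenantId": "<AZURE_TENANT_ID>"
}
```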
Copy the credentials and use them to populate your GitHub secrets, under Settings -> Secrets -> Actions. Manually add the following secrets from the credentials: `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_SUBSCRIPTION_ID`, and `AZURE_TENANT_ID`.
Navigate to your function app -> Functions, and grab the URL of the endpoint. As we set it up, we can test our function with a simple HTTP POST containing a list of dicts in the body:
```json
[
  {
    "Year": 2013,
    "Month": 4,
    "DayofMonth": 19,
    "DayOfWeek": 5,
    "OriginAirportID": 11433,
    "DestAirportID": 13303,
    "CRSDepTime": 837,
    "DepDelay": "-3.0",
    "DepDel15": 0.0,
    "CRSArrTime": 1138,
    "ArrDelay": 0.0,
    "Cancelled": 0.0
  }
]
```
It returns the prediction (a binary variable) in the form of a list of predictions, one for each entry in the body of your POST request.
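If you prefer scripting the test instead of using Postman, a minimal sketch with `requests` could look like this; the URL and function key are placeholders you need to copy from your own function app (the route typically follows `/api/<function name>` unless overridden in `function.json`):

```python
import requests

# Placeholder: copy the real invoke URL (including the ?code=... function key)
# from the Azure portal under your function app -> Functions -> SklearnModelFunction.
url = "https://<your-function-app>.azurewebsites.net/api/SklearnModelFunction?code=<function-key>"

payload = [
    {
        "Year": 2013, "Month": 4, "DayofMonth": 19, "DayOfWeek": 5,
        "OriginAirportID": 11433, "DestAirportID": 13303, "CRSDepTime": 837,
        "DepDelay": "-3.0", "DepDel15": 0.0, "CRSArrTime": 1138,
        "ArrDelay": 0.0, "Cancelled": 0.0,
    }
]

response = requests.post(url, json=payload)
print(response.json())  # one prediction per entry, e.g. [0]
```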
Note: in our model example, we drop two columns, `carrier` and `ArrDelay`, when training the model. Consequently, before sending the data to our function endpoint, we should drop these columns as well. We recommend performing such preprocessing before sending the data to your function (see the sketch below). Alternatively, you can handle this in your function's `__init__.py`, but that is not recommended.
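For example, with pandas such preprocessing can be a one-liner, using the column names as referenced above (adjust them to your own dataset):

```python
import pandas as pd

# df holds the raw rows you want scored; "flights.csv" is a placeholder data source.
df = pd.read_csv("flights.csv")

# Drop the columns the model was not trained on before posting the rows.
payload = df.drop(columns=["carrier", "ArrDelay"]).to_dict(orient="records")
```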
This repository is under the MIT License.