This repository illustrates a CI/CD pipeline to automate the deployment of machine learning models to Modzy.
The objective of this repository is to provide a reference implementation of a GitHub Actions workflow that lets data scientists train their model(s) in their preferred workspace (e.g., this repository), configure a single JSON file (model_info.json), and commit their changes to the main branch. Doing so triggers the CI/CD workflow, which takes resources from this repository and executes Chassis Python code to automatically containerize and deploy the trained model to Modzy. The GitHub Actions workflow in this repository can be modified and set up in several different ways, so it is a good starting point if you are interested in creating your own CI/CD pipeline.
The layout of this repository is strictly an example of how a data scientist might construct a model training repository. Feel free to set up your own repository however you wish.
- .github/: Folder that contains the GitHub Actions workflow definition in workflows/ci.yml
- data/: Folder to hold any training data, sample test data, or additional model dependencies
- weights/: Folder to hold any saved model weights
- train.py: Python script that trains a scikit-learn logistic regression model and saves the weights locally
- package.py: Python script that leverages Chassis code to containerize a model and save the container locally in the GitHub Docker registry
- deploy.py: Python script that leverages Modzy APIs to automatically deploy the container built in package.py to Modzy
- model_info.json: Single JSON file used to define model information. The GitHub Actions workflow references this file to complete the execution of the CI/CD workflow.
- requirements.txt: Contains the list of Python packages required to execute any script in this repository
This repository is structured so that data scientists only need to update the weights file and model_info.json whenever they update a model. The workflow runs on every commit to the main branch and automatically executes Chassis code to containerize and deploy the model.
As a data scientist...
- Train your model and save your weights file according to your preference (weights/model_latest.pkl, for example)
- Fill in the following information in model_info.json:
  - name: Desired name for your model when it is deployed to Modzy
  - version: Version of your model to deploy. Note: You can deploy as many versions of the same model to Modzy as you wish.
  - weightsFilePath: File path in this repository to your updated weights file
  - sampleDataFilePath: File path in this repository to a sample data file that can be used to test your model during the CI/CD process
Example model_info.json:
{
  "name": "GitHub Actions Sklearn Logistic Regression",
  "version": "0.0.1",
  "weightsFilePath": "weights/model_latest.pkl",
  "sampleDataFilePath": "data/digits_sample.json"
}
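Because the workflow depends on these four fields, it can help to check the file before committing. The following is a minimal sketch of such a check, not part of this repository: the function name validate_model_info and its parameters are hypothetical, and it assumes every field is a non-empty string and that both paths are relative to the repository root.

```python
import json
from pathlib import Path

# The four fields described above.
REQUIRED_KEYS = ("name", "version", "weightsFilePath", "sampleDataFilePath")
PATH_KEYS = ("weightsFilePath", "sampleDataFilePath")


def validate_model_info(info_path="model_info.json", repo_root="."):
    """Return the parsed config, or raise ValueError listing every problem.

    Hypothetical helper: assumes all fields are non-empty strings and that
    the two path fields are relative to repo_root.
    """
    info = json.loads(Path(info_path).read_text(encoding="utf-8"))
    if not isinstance(info, dict):
        raise ValueError("model_info.json must contain a JSON object")

    problems = []
    for key in REQUIRED_KEYS:
        value = info.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")

    # Only check paths that passed the string check above.
    for key in PATH_KEYS:
        value = info.get(key)
        if isinstance(value, str) and value.strip():
            if not (Path(repo_root) / value).is_file():
                problems.append(f"{key} points to a missing file: {value}")

    if problems:
        raise ValueError("; ".join(problems))
    return info
```

Calling validate_model_info() from the repository root before pushing to main would surface a mistyped path locally instead of partway through the workflow run.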
As a DevOps or machine learning engineer...
- Ensure the Chassis code in the .github/workflows/ci.yml file aligns with the model the data scientist is building. Specifically, the process method must read in the sample data properly, use the loaded model to make predictions, and return the results in the data scientist's desired format.
- Navigate to the Settings tab within this repository and click on Secrets --> Actions. Set the following Secrets (to be accessed in the GitHub Actions workflow):
  - MODZY_URL: Valid Modzy instance URL
  - MODZY_API_KEY: Valid Modzy API key associated with the MODZY_URL instance. Note: this API key must be associated with a user that has the "Data Scientist" role.
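The two steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the code in ci.yml: it assumes the sample data is a JSON array of feature rows, that the model exposes a scikit-learn style predict() returning integer class labels, and that the workflow exposes the two secrets as environment variables with the same names. The helper names make_process and load_modzy_settings are hypothetical, and no Chassis or Modzy SDK calls are shown.

```python
import json
import os


def make_process(model):
    """Build a process function around an already-loaded model.

    Assumptions: the input is a JSON array of feature rows, and the model
    has a scikit-learn style predict() that returns integer labels.
    """
    def process(input_bytes):
        rows = json.loads(input_bytes.decode("utf-8"))
        predictions = model.predict(rows)
        # Convert to plain ints so the result is JSON-serializable.
        return {"predictions": [int(p) for p in predictions]}

    return process


def load_modzy_settings(env=None):
    """Read MODZY_URL and MODZY_API_KEY, failing early if either is unset.

    Assumption: the workflow maps the repository secrets to environment
    variables of the same names.
    """
    env = os.environ if env is None else env
    names = ("MODZY_URL", "MODZY_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return env["MODZY_URL"], env["MODZY_API_KEY"]
```

Wrapping the model in a closure keeps the process function to a single bytes argument, and failing early on a missing secret gives a clearer error than a rejected API request later in the run. The output shape ({"predictions": [...]}) is only one possibility and should follow whatever format the data scientist wants.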
We are happy to receive contributions from all of our users. Check out our contributing file to learn more.