This guide walks you through configuring this project in GitHub Actions.
The automated pipelines in this repository interact with Azure Machine Learning resources located in an Azure subscription. To allow that, we need to create a secret in GitHub that lets the pipelines connect using the service principal created before.
To do that:
- Go to Settings in your repository.
- Click on Secrets.
- Select New repository secret.
- Create the GitHub secret used to access your Azure environment. All the jobs in this repository pull the credentials to access Azure from a secret called AZURE_CREDENTIALS. Create this secret and populate it with the information of the service principal that you want to use for deployment. If you don't have a service principal yet, you can create one by following these steps. The secret has to be stored in JSON format. Check this guide to learn how.
- Create another secret named AUTOMATION_OBJECT_ID with the object ID of the service principal used for AZURE_CREDENTIALS. To get the object ID of a service principal, see Find service principal object ID, or run:

      az ad sp show --id XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX --query objectId
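As a rough sketch, the two secrets above could also be created from a terminal. This assumes the GitHub CLI (gh) is installed and authenticated, which this guide does not otherwise require. It also assumes the workflows expect AZURE_CREDENTIALS in the four-field JSON layout accepted by the azure/login action; check the linked guide for the exact format. The function name and the shell variables in the usage comments are hypothetical.

```shell
# Build the JSON document to store in the AZURE_CREDENTIALS secret.
# Assumption: the four-field layout accepted by the azure/login action.
# Values are inserted verbatim, so they must not contain double quotes
# or backslashes.
build_azure_credentials() {
  # $1 = client ID, $2 = client secret, $3 = subscription ID, $4 = tenant ID
  printf '{"clientId":"%s","clientSecret":"%s","subscriptionId":"%s","tenantId":"%s"}\n' \
    "$1" "$2" "$3" "$4"
}

# Usage (placeholder variables; needs authenticated gh and az CLIs):
#   build_azure_credentials "$CLIENT_ID" "$CLIENT_SECRET" "$SUBSCRIPTION_ID" "$TENANT_ID" \
#     | gh secret set AZURE_CREDENTIALS
#   az ad sp show --id "$CLIENT_ID" --query objectId --output tsv \
#     | gh secret set AUTOMATION_OBJECT_ID
# Note: newer Azure CLI releases that use Microsoft Graph expose the object
# ID as "id" instead of "objectId"; adjust --query if the result is empty.
```

Piping the values through stdin keeps them out of the command line that gh is started with.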
The automated pipelines in this repository use a set of variables to configure how and where the deployment is done. Those variables are located in a YAML file at .github/variables.yaml.
Note: Placing configuration inside a YAML file in the repository may not be the best choice in an enterprise setting. It is done in this repository to reduce the amount of configuration needed.
Open the file and review the values of the following variables. Then save the file and commit the changes to the repo.
- WORKSPACENAME: The name of the Azure Machine Learning workspace you want to use.
- RESOURCEGROUPNAME: The name of the resource group where the resources are located.
- STORAGEACCOUNTNAME: The name of the storage account where you want datasets to be placed. This storage account has to have a container named trusted and be configured as a data source in Azure Machine Learning with the name trusted.
- LOCATION: The location where resources are deployed by IaC.
- KEYVAULTNAME: The name of the Key Vault where elements are stored. Leave the default, kv-trunkbased-dev, since it is created by IaC.
The following variables are also present, but are specific to the project you are working on.
- ENVPREFIX: The name of the environment. By default it is dev, referring to development. Other possible values are qa, stg and prd.
- MODELNAME: The name of the model you are building.
- DESCRIPTION: A description of the model you are building.
- CONDAENVNAME: The name of the environment that the model uses for training. It should match one of the environments available under the folder environments.
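Before committing, a quick check along these lines could confirm that every variable listed above has a value. It is only a sketch: it assumes each variable sits on its own line as NAME: value (possibly indented), which may not match the real layout of .github/variables.yaml, and the function name is hypothetical.

```shell
# Print the expected variables that are absent or empty in a variables file.
# Assumption: each variable is written on its own line as "NAME: value".
# Returns 1 if anything is missing, 0 otherwise.
missing_variables() {
  local file name missing
  file="$1"
  missing=0
  for name in WORKSPACENAME RESOURCEGROUPNAME STORAGEACCOUNTNAME LOCATION \
              KEYVAULTNAME ENVPREFIX MODELNAME DESCRIPTION CONDAENVNAME; do
    # Require the key followed by a value that is not just a comment.
    if ! grep -Eq "^[[:space:]]*${name}:[[:space:]]*[^[:space:]#]" "$file"; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   missing_variables .github/variables.yaml || echo "Fill in the names listed above"
```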
Infrastructure is deployed automatically by IaC pipelines. However, some secrets are required and need to be provided as GitHub Actions secrets. To configure them:
- Go to Settings in your repository.
- Click on Secrets.
- Select New repository secret.
- Create and configure the following secrets:
  - DATASETSCLIENTID: The client ID of the service principal created before.
  - DATASETSCLIENTSECRET: The client secret of the service principal created before.
  - COMPUTEADMINUSERNAME: The user name for the compute instances you want to use. For instance, mladmin.
  - COMPUTEADMINUSERPASSWORD: The password used for the compute instances. For instance, Pass@word1.
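As an alternative to the web UI, these four secrets could be uploaded with the GitHub CLI. The sketch below assumes gh is installed and authenticated and that each value has been exported as an environment variable with the same name as the secret. The function names are hypothetical.

```shell
# Secret names taken from the list above.
REQUIRED_SECRETS="DATASETSCLIENTID DATASETSCLIENTSECRET COMPUTEADMINUSERNAME COMPUTEADMINUSERPASSWORD"

# Print the name of every required secret that has no exported, non-empty value.
unset_secrets() {
  local name
  for name in $REQUIRED_SECRETS; do
    [ -n "$(printenv "$name")" ] || echo "$name"
  done
}

# Upload all secrets, or refuse if any value is missing.
# Each value is piped through stdin rather than passed as an argument to gh.
upload_secrets() {
  local name missing
  missing="$(unset_secrets)"
  if [ -n "$missing" ]; then
    echo "Missing values for:" $missing >&2
    return 1
  fi
  for name in $REQUIRED_SECRETS; do
    printf '%s' "$(printenv "$name")" | gh secret set "$name"
  done
}

# Usage: export the four values in the current shell, then run
#   upload_secrets
```

Checking for missing values first avoids ending up with only some of the secrets created.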
Once created, the secrets should all be listed on the repository's Secrets page.
All pipelines will be created automatically by GitHub.
- On the navigation bar at the top of the page, select Settings.
- Click on Branches.
- In the section Branch protection rules, click on Add rule.
- Prevent direct access to main. Optionally, require approvers.
- Save the changes.
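The same rule could be applied through GitHub's REST endpoint for branch protection instead of the web UI. This is a sketch under assumptions: requiring pull requests is used as the way to prevent direct access to main, the rule is also enforced for administrators, and OWNER/REPO stands in for your repository. The function name is hypothetical.

```shell
# Print a request body for GitHub's "update branch protection" REST endpoint.
# $1 = number of approving reviews required before a pull request can merge
#      (0 still forces changes to go through a pull request).
# Assumptions: no required status checks, no push restrictions, and the rule
# also applies to administrators.
branch_protection_payload() {
  printf '{"required_status_checks":null,"enforce_admins":true,"required_pull_request_reviews":{"required_approving_review_count":%d},"restrictions":null}\n' "$1"
}

# Usage (needs an authenticated gh CLI with admin rights on the repository):
#   branch_protection_payload 1 \
#     | gh api --method PUT repos/OWNER/REPO/branches/main/protection --input -
```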
Certain actions in the pipeline, like a deployment, will require approval. To configure how approval works:
- On the navigation bar at the top of the page, select Settings and then Environments.
- You will see environments automatically added, named amlworkspace_dev and dev. Click on amlworkspace_dev.
- Check the option Required reviewers and add the required approvers. Stages like model registration and model deployment will go through this gate.
- Save the protection rules.
Run pipelines in the following order:
- Workspace-CD
  - This will ensure the infrastructure is deployed and the datasets are created.
- Environment-CD
  - This will ensure the environments needed to run training jobs are available.
- Model-CT
  - This will start the training of a model. Once a model is registered, the pipeline Model-CD will run automatically.
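The order above could be scripted roughly as follows. This sketch assumes the GitHub CLI (gh) is installed and authenticated, that the workflows are named exactly as listed, and that each one can be triggered manually (a workflow_dispatch trigger). None of that is confirmed by this guide. The function names and POLL_DELAY are hypothetical.

```shell
# Trigger one workflow and wait for its newest run to finish.
# Returns non-zero if the run cannot be started or if it fails.
run_pipeline() {
  local workflow run_id
  workflow="$1"
  gh workflow run "$workflow" || return 1
  # Give GitHub a moment to register the new run before looking it up.
  sleep "${POLL_DELAY:-10}"
  run_id="$(gh run list --workflow "$workflow" --limit 1 \
    --json databaseId --jq '.[0].databaseId')" || return 1
  gh run watch "$run_id" --exit-status
}

# Run the pipelines in the documented order, stopping at the first failure.
# Model-CD is not listed because it starts on its own after registration.
run_all_pipelines() {
  local workflow
  for workflow in Workspace-CD Environment-CD Model-CT; do
    run_pipeline "$workflow" || { echo "Stopped: $workflow failed" >&2; return 1; }
  done
}

# Usage:
#   run_all_pipelines
```

Looking up the newest run after a short delay is a simplification; if other runs of the same workflow start at the same moment, the wrong run could be watched.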