Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove alpha from pipeline parameters #169

Merged
merged 11 commits into from
Feb 1, 2020
12 changes: 1 addition & 11 deletions .pipelines/diabetes_regression-ci-build-train.yml
Original file line number Diff line number Diff line change
Expand Up @@ -62,30 +62,20 @@ stages:
echo "##vso[task.setvariable variable=AMLPIPELINEID;isOutput=true]$AMLPIPELINEID"
name: 'getpipelineid'
displayName: 'Get Pipeline ID'
- bash: |
# Generate a hyperparameter value as a random number between 0 and 1.
# A random value is used here to make the Azure ML dashboards "interesting" when testing
# the solution sample.
alpha=$(printf "0.%03d\n" $((($RANDOM*1000)/32767)))
echo "Alpha: $alpha"
echo "##vso[task.setvariable variable=ALPHA;isOutput=true]$alpha"
name: 'getalpha'
displayName: 'Generate random value for hyperparameter alpha'
- job: "Run_ML_Pipeline"
dependsOn: "Get_Pipeline_ID"
displayName: "Trigger ML Training Pipeline"
pool: server
variables:
AMLPIPELINE_ID: $[ dependencies.Get_Pipeline_ID.outputs['getpipelineid.AMLPIPELINEID'] ]
ALPHA: $[ dependencies.Get_Pipeline_ID.outputs['getalpha.ALPHA'] ]
steps:
- task: ms-air-aiagility.vss-services-azureml.azureml-restApi-task.MLPublishedPipelineRestAPITask@0
displayName: 'Invoke ML pipeline'
inputs:
azureSubscription: '$(WORKSPACE_SVC_CONNECTION)'
PipelineId: '$(AMLPIPELINE_ID)'
ExperimentName: '$(EXPERIMENT_NAME)'
PipelineParameters: '"ParameterAssignments": {"model_name": "$(MODEL_NAME)", "hyperparameter_alpha": "$(ALPHA)"}'
PipelineParameters: '"ParameterAssignments": {"model_name": "$(MODEL_NAME)"}'
- job: "Training_Run_Report"
dependsOn: "Run_ML_Pipeline"
condition: always()
Expand Down
14 changes: 14 additions & 0 deletions diabetes_regression/config.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
"training":
{
"alpha": 0.4
},
"evaluation":
{

},
"scoring":
{

}
}
21 changes: 12 additions & 9 deletions diabetes_regression/training/train.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.externals import joblib
import json


def train_model(run, data, alpha):
Expand Down Expand Up @@ -62,13 +63,6 @@ def main():
help="Name of the Model",
default="sklearn_regression_model.pkl",
)
parser.add_argument(
"--alpha",
type=float,
default=0.5,
help=("Ridge regression regularization strength hyperparameter; "
"must be a positive float.")
)

parser.add_argument(
"--dataset_name",
Expand All @@ -79,14 +73,23 @@ def main():

print("Argument [build_id]: %s" % args.build_id)
print("Argument [model_name]: %s" % args.model_name)
print("Argument [alpha]: %s" % args.alpha)
print("Argument [dataset_name]: %s" % args.dataset_name)

model_name = args.model_name
build_id = args.build_id
alpha = args.alpha
dataset_name = args.dataset_name

print("Getting training parameters")

with open("config.json") as f:
pars = json.load(f)
try:
alpha = pars["training"]["alpha"]
except KeyError:
alpha = 0.5

print("Parameter alpha: %s" % alpha)

run = Run.get_context()
ws = run.experiment.workspace

Expand Down
7 changes: 4 additions & 3 deletions docs/getting_started.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,8 @@ For instructions on how to set up a local development environment, refer to the

For using Azure DevOps Pipelines all other variables are stored in the file `.pipelines/diabetes_regression-variables.yml`. Using the default values as a starting point, adjust the variables to suit your requirements.

**Note:** In `diabetes_regression` folder you can find `config.json` file that we would recommend to use in order to provide parameters for training, evaluation and scoring scripts. An example of a such parameter is a hyperparameter of a training algorithm: in our case it's the ridge regression [*alpha* hyperparameter](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html). We don't provide any special serializers for this config file. So, it's up to you which template to support there.

Up until now you should have:

* Forked (or cloned) the repo
Expand Down Expand Up @@ -120,7 +122,7 @@ Check out the newly created resources in the [Azure Portal](portal.azure.com):
(Optional) To remove the resources created for this project you can use the [/environment_setup/iac-remove-environment.yml](../environment_setup/iac-remove-environment.yml) definition or you can just delete the resource group in the [Azure Portal](portal.azure.com).

**Note:** The training ML pipeline uses a [sample diabetes dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) as training data. If you want to use your own dataset, you need to [create and register a datastore](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data#azure-machine-learning-studio) in your ML workspace and upload the datafile (e.g. [diabetes.csv](./data/diabetes.csv)) to the corresponding blob container. You can also define a datastore in the ML Workspace with [az cli](https://docs.microsoft.com/en-us/cli/azure/ext/azure-cli-ml/ml/datastore?view=azure-cli-latest#ext-azure-cli-ml-az-ml-datastore-attach-blob).
You'll also need to configure DATASTORE_NAME and DATAFILE_NAME variables in ***devopsforai-aml-vg*** variable group.
You'll also need to configure DATASTORE_NAME and DATAFILE_NAME variables in ***devopsforai-aml-vg*** variable group.


## Create an Azure DevOps Azure ML Workspace Service Connection
Expand Down Expand Up @@ -187,7 +189,7 @@ specified).
**Note:** If the model evaluation determines that the new model does not perform better than the previous one then the new model will not be registered and the pipeline will be cancelled.

* The third stage of the pipeline, **Deploy to ACI**, deploys the model to the QA environment in [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/). It then runs a *smoke test* to validate the deployment, i.e. sends a sample query to the scoring web service and verifies that it returns a response in the expected format.

Wait until the pipeline finishes and verify that there is a new model in the **ML Workspace**:

![trained model](./images/trained-model.png)
Expand Down Expand Up @@ -247,7 +249,6 @@ Make sure your webapp has the credentials to pull the image from the Azure Conta

* The provided pipeline definition YAML file is a sample starting point, which you should tailor to your processes and environment.
* You should edit the pipeline definition to remove unused stages. For example, if you are deploying to ACI and AKS, you should delete the unused `Deploy_Webapp` stage.
* The sample pipeline generates a random value for a model hyperparameter (ridge regression [*alpha*](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)) to generate 'interesting' charts when testing the sample. In a real application you should use fixed hyperparameter values. You can [tune hyperparameter values using Azure ML](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters), and manage their values in Azure DevOps Variable Groups.
* You may wish to enable [manual approvals](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/approvals) before the deployment stages.
* You can install additional Conda or pip packages by modifying the YAML environment configurations under the `diabetes_regression` directory. Make sure to use fixed version numbers for all packages to ensure reproducibility, and use the same versions across environments.
* You can explore aspects of model observability in the solution, such as:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -44,8 +44,6 @@ def main():
name="model_name", default_value=e.model_name)
build_id_param = PipelineParameter(
name="build_id", default_value=e.build_id)
hyperparameter_alpha_param = PipelineParameter(
name="hyperparameter_alpha", default_value=0.5)

dataset_name = ""
if (e.datastore_name is not None and e.datafile_name is not None):
Expand All @@ -66,7 +64,6 @@ def main():
arguments=[
"--build_id", build_id_param,
"--model_name", model_name_param,
"--alpha", hyperparameter_alpha_param,
"--dataset_name", dataset_name,
],
runconfig=run_config,
Expand Down