Manage environments in conda YAML files #158

Merged
merged 18 commits into from
Jan 31, 2020
Changes from 17 commits
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,7 @@ ENV/
env.bak/
venv.bak/
*.vscode
condaenv.*

# Spyder project settings
.spyderproject
Expand Down
28 changes: 28 additions & 0 deletions diabetes_regression/ci_dependencies.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
name: mlopspython_ci

dependencies:

# The python interpreter version.
- python=3.7.5

- r=3.6.0
- r-essentials=3.6.0
- numpy=1.18.1
- pandas=1.0.0
- scikit-learn=0.22.1

- pip=20.0.2
- pip:

# dependencies shared with other environment .yml files.
- azureml-sdk==1.0.79

# Additional pip dependencies for the CI environment.
- pytest==5.3.1
- pytest-cov==2.8.1
- requests==2.22.0
- python-dotenv==0.10.3
- flake8==3.7.9
- flake8_formatter_junit_xml==0.0.6
- azure-cli==2.0.77
- tox==3.14.3
4 changes: 2 additions & 2 deletions diabetes_regression/scoring/inference_config.yml
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
entryScript: score.py
runtime: python
condaFile: conda_dependencies.yml
condaFile: ../scoring_dependencies.yml
extraDockerfileSteps:
schemaFile:
sourceDirectory:
enableGpu: False
baseImage:
baseImageRegistry:
baseImageRegistry:
Original file line number Diff line number Diff line change
Expand Up @@ -14,24 +14,23 @@
# This directive is stored in a comment to preserve the Conda file structure.
# [AzureMlVersion] = 2

name: project_environment
name: diabetes_scoring

dependencies:

# The python interpreter version.
# Currently Azure ML Workbench only supports 3.5.2 and later.
- python=3.7.5

# Required by azureml-defaults, installed separately through Conda to
# get a prebuilt version and not require build tools for the install.
- psutil=5.6 #latest

- numpy=1.18.1
- pandas=1.0.0
- scikit-learn=0.22.1

- pip=20.0.2
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-model-management-sdk==1.0.1b6.post1
- azureml-sdk==1.0.74
- scipy==1.3.1
- scikit-learn==0.22
- pandas==0.25.3
- numpy==1.17.3
- joblib==0.14.0
- gunicorn==19.9.0
- flask==1.1.1
- inference-schema[numpy-support]
# You must list azureml-defaults as a pip dependency
- azureml-defaults==1.0.85
- inference-schema[numpy-support]==1.0.1
17 changes: 17 additions & 0 deletions diabetes_regression/training_dependencies.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
name: diabetes_training

dependencies:

# The python interpreter version.
- python=3.7.5

- numpy=1.18.1
- pandas=1.0.0
- scikit-learn=0.22.1
#- r-essentials
#- tensorflow
#- keras

- pip=20.0.2
- pip:
- azureml-core==1.0.79
11 changes: 7 additions & 4 deletions docs/code_description.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,7 @@

### Environment Setup

- `environment_setup/requirements.txt` : It consists of a list of python packages which are needed by the train.py to run successfully on host agent (locally).

- `environment_setup/install_requirements.sh` : This script prepares the python environment i.e. install the Azure ML SDK and the packages specified in requirements.txt
- `environment_setup/install_requirements.sh` : This script prepares a local conda environment, i.e. it installs the Azure ML SDK and the packages specified in the environment definitions.

- `environment_setup/iac-*.yml, arm-templates` : Infrastructure as Code pipelines to create and delete required resources along with corresponding arm-templates.

Expand All @@ -27,6 +25,12 @@
- `ml_service/pipelines/diabetes_regression_verify_train_pipeline.py` : determines whether the evaluate_model.py step of the training pipeline registered a new model.
- `ml_service/util` : contains common utility functions used to build and publish an ML training pipeline.

### Environment Definitions

- `diabetes_regression/training_dependencies.yml` : Conda environment definition for the training environment (Docker image in which train.py is run).
- `diabetes_regression/scoring_dependencies.yml` : Conda environment definition for the scoring environment (Docker image in which score.py is run).
- `diabetes_regression/ci_dependencies.yml` : Conda environment definition for the CI environment.

### Code

- `diabetes_regression/training/train.py` : a training step of an ML training pipeline.
Expand All @@ -39,5 +43,4 @@

### Scoring
- `diabetes_regression/scoring/score.py` : a scoring script which is about to be packed into a Docker Image along with a model while being deployed to QA/Prod environment.
- `diabetes_regression/scoring/conda_dependencies.yml` : contains a list of dependencies required by score.py to be installed in a deployable Docker Image
- `diabetes_regression/scoring/inference_config.yml`, deployment_config_aci.yml, deployment_config_aks.yml : configuration files for the [AML Model Deploy](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.private-vss-services-azureml&ssr=false#overview) pipeline task for ACI and AKS deployment targets.
9 changes: 8 additions & 1 deletion docs/getting_started.md
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,13 @@ Great, you now have the build pipeline set up which automatically triggers every

**Note:** The build pipeline also supports building and publishing ML
pipelines using R to train a model. This is enabled
by changing the `build-train-script` pipeline variable to either `diabetes_regression_build_train_pipeline_with_r.py`, or `diabetes_regression_build_train_pipeline_with_r_on_dbricks.py`. For pipeline training a model with R on Databricks you'll need
by changing the `build-train-script` pipeline variable to either of:
* `diabetes_regression_build_train_pipeline_with_r.py` to train a model
with R on Azure ML Compute. You will also need to add the
`r-essentials` Conda packages into `diabetes_regression/scoring_dependencies.yml`
and `diabetes_regression/training_dependencies.yml`.
* `diabetes_regression_build_train_pipeline_with_r_on_dbricks.py`
to train a model with R on Databricks. You will need
to manually create a Databricks cluster and attach it to the ML Workspace as a
compute (the DB_CLUSTER_ID and DATABRICKS_COMPUTE_NAME variables should be
specified).
Expand Down Expand Up @@ -243,6 +249,7 @@ Make sure your webapp has the credentials to pull the image from the Azure Conta
* You should edit the pipeline definition to remove unused stages. For example, if you are deploying to ACI and AKS, you should delete the unused `Deploy_Webapp` stage.
* The sample pipeline generates a random value for a model hyperparameter (ridge regression [*alpha*](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)) to generate 'interesting' charts when testing the sample. In a real application you should use fixed hyperparameter values. You can [tune hyperparameter values using Azure ML](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters), and manage their values in Azure DevOps Variable Groups.
* You may wish to enable [manual approvals](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/approvals) before the deployment stages.
* You can install additional Conda or pip packages by modifying the YAML environment configurations under the `diabetes_regression` directory. Make sure to use fixed version numbers for all packages to ensure reproducibility, and use the same versions across environments.
* You can explore aspects of model observability in the solution, such as:
* **Logging**: navigate to the Application Insights instance linked to the Azure ML Portal,
then to the Logs (Analytics) pane. The following sample query correlates HTTP requests with custom logs
Expand Down
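
The guidance above about fixed, consistent versions could be checked automatically. What follows is a minimal sketch, not part of this PR. It scans conda YAML text line by line with a regular expression (a simplification that avoids a YAML parser and stays within the standard library), lists entries that have no version, and lists packages whose versions differ between environments. The function names and the `DRIFTED_SCORING` sample are hypothetical; the `TRAINING` sample mirrors `diabetes_regression/training_dependencies.yml` as added in this PR.

```python
import re

# Matches "- name", "- name=1.2" (conda) and "- name==1.2" (pip).
# Lines such as "- pip:", "name: ..." and "#- keras" do not match.
DEP_LINE = re.compile(
    r"^\s*-\s*(?P<name>[A-Za-z0-9_.\[\]-]+)"
    r"\s*(?:(?:==|=)\s*(?P<version>[^\s#]+))?"
    r"\s*(?:#.*)?$"
)


def parse_dependencies(text):
    """Return {package: version or None} for each dependency entry.

    Simplification: conda and pip entries share one dictionary.
    """
    deps = {}
    for line in text.splitlines():
        match = DEP_LINE.match(line)
        if match:
            deps[match.group("name")] = match.group("version")
    return deps


def unpinned(deps):
    """Return the sorted names of packages listed without a version."""
    return sorted(name for name, version in deps.items() if version is None)


def conflicts(envs):
    """Given {env_name: deps}, return {package: {env_name: version}}
    for packages whose version differs between environments."""
    seen = {}
    for env_name, deps in envs.items():
        for name, version in deps.items():
            seen.setdefault(name, {})[env_name] = version
    return {
        name: by_env
        for name, by_env in seen.items()
        if len(set(by_env.values())) > 1
    }


# Mirrors training_dependencies.yml from this PR.
TRAINING = """
name: diabetes_training
dependencies:
  - python=3.7.5
  - numpy=1.18.1
  - pandas=1.0.0
  - scikit-learn=0.22.1
  #- r-essentials
  - pip=20.0.2
  - pip:
    - azureml-core==1.0.79
"""

# Hypothetical file that has drifted, for illustration only.
DRIFTED_SCORING = """
name: diabetes_scoring
dependencies:
  - python=3.7.5
  - numpy=1.18.1
  - pandas=0.25.3
  - pip=20.0.2
  - pip:
    - inference-schema[numpy-support]
"""

if __name__ == "__main__":
    scoring = parse_dependencies(DRIFTED_SCORING)
    print("unpinned:", unpinned(scoring))
    print("conflicts:", conflicts(
        {"training": parse_dependencies(TRAINING), "scoring": scoring}))
```

A check like this could run in CI against the three YAML files under `diabetes_regression`, reading each with `open(path).read()`.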
17 changes: 11 additions & 6 deletions environment_setup/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,16 @@ LABEL org.label-schema.vendor = "Microsoft" \
org.label-schema.url = "https://hub.docker.com/r/microsoft/mlopspython" \
org.label-schema.vcs-url = "https://github.com/microsoft/MLOpsPython"

COPY diabetes_regression/ci_dependencies.yml /setup/

COPY environment_setup/requirements.txt /setup/

RUN apt-get update && apt-get install gcc -y && pip install --upgrade -r /setup/requirements.txt && \
conda install -c r r-essentials
RUN conda env create -f /setup/ci_dependencies.yml

CMD ["python"]
# activate environment
ENV PATH /usr/local/envs/mlopspython_ci/bin:$PATH
RUN /bin/bash -c "source activate mlopspython_ci"

# Verify conda installation.
# This serves as workaround for https://github.com/conda/conda/issues/8537 (conda env create doesn't fail
# if pip installation fails, for example due to a wrong package version).
# The `az` command is not available if pip has not run (and installed azure-cli).
RUN az --version
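
The `az --version` step above uses one pip-installed tool as a proxy for the whole pip section. A more direct check, sketched here under assumptions and not part of this PR, compares the installed distributions against the pinned versions. The names `find_problems` and `main` are hypothetical, and the pins in `main` are copied from `ci_dependencies.yml`. `importlib.metadata` needs Python 3.8 or later; since the CI environment pins Python 3.7.5, the sketch falls back to the `importlib_metadata` backport, which is assumed to be installed in that case.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: assumes the backport is installed
    import importlib_metadata as metadata


def find_problems(pins):
    """Return one message per pinned package that is missing
    or installed at a different version."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, wanted {wanted}")
    return problems


def main():
    """Return a process exit code: 0 if all pins are satisfied, else 1."""
    # A subset of the pip pins from ci_dependencies.yml.
    pins = {"azure-cli": "2.0.77", "pytest": "5.3.1", "flake8": "3.7.9"}
    problems = find_problems(pins)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

In a Dockerfile, a script like this could be invoked in a `RUN` step with `sys.exit(main())`, so that the image build fails when the pip step was silently skipped.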
6 changes: 4 additions & 2 deletions environment_setup/install_requirements.sh
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@
# ARISING IN ANY WAY OUT OF THE USE OF THE SOFTWARE CODE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

set -eux

python --version
pip install -r requirements.txt
conda env create -f diabetes_regression/ci_dependencies.yml

conda activate mlopspython_ci
12 changes: 0 additions & 12 deletions environment_setup/requirements.txt

This file was deleted.

12 changes: 4 additions & 8 deletions ml_service/pipelines/diabetes_regression_build_train_pipeline.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,14 +28,10 @@ def main():
print("aml_compute:")
print(aml_compute)

run_config = RunConfiguration(conda_dependencies=CondaDependencies.create(
conda_packages=['numpy', 'pandas',
'scikit-learn', 'tensorflow', 'keras'],
pip_packages=['azure', 'azureml-core',
'azure-storage',
'azure-storage-blob',
'azureml-dataprep'])
)
# Create a run configuration environment
conda_deps_file = "diabetes_regression/training_dependencies.yml"
conda_deps = CondaDependencies(conda_deps_file)
run_config = RunConfiguration(conda_dependencies=conda_deps)
run_config.environment.docker.enabled = True
config_envvar = {}
if (e.collection_uri is not None and e.teamproject_name is not None):
Expand Down
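
The hunk above loads the training environment from a path relative to the current working directory. The sketch below, which is not the PR's code, wraps the same three calls in a helper that resolves the path against an explicit root and fails early if the file is missing. The helper name `build_run_config` and its `repo_root` parameter are hypothetical. The `CondaDependencies` and `RunConfiguration` calls follow the azureml-core 1.0.x API used in the diff, and the SDK is imported lazily so the path check works even where it is not installed.

```python
from pathlib import Path


def build_run_config(conda_file, repo_root="."):
    """Build a Docker-enabled RunConfiguration from a conda YAML file.

    Raises FileNotFoundError before touching the Azure ML SDK if the
    environment definition cannot be found.
    """
    conda_path = Path(repo_root) / conda_file
    if not conda_path.is_file():
        raise FileNotFoundError(
            f"Conda environment file not found: {conda_path}")

    # Imported lazily: requires azureml-core (pinned to 1.0.79 in this PR).
    from azureml.core.conda_dependencies import CondaDependencies
    from azureml.core.runconfig import RunConfiguration

    conda_deps = CondaDependencies(
        conda_dependencies_file_path=str(conda_path))
    run_config = RunConfiguration(conda_dependencies=conda_deps)
    run_config.environment.docker.enabled = True
    return run_config
```

A call such as `build_run_config("diabetes_regression/training_dependencies.yml")` from the repository root would then be equivalent to the lines added in the diff.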
Original file line number Diff line number Diff line change
Expand Up @@ -26,15 +26,11 @@ def main():
print("aml_compute:")
print(aml_compute)

run_config = RunConfiguration(conda_dependencies=CondaDependencies.create(
conda_packages=['numpy', 'pandas',
'scikit-learn', 'tensorflow', 'keras'],
pip_packages=['azure', 'azureml-core',
'azure-storage',
'azure-storage-blob'])
)
# Create a run configuration environment
conda_deps_file = "diabetes_regression/training_dependencies.yml"
conda_deps = CondaDependencies(conda_deps_file)
run_config = RunConfiguration(conda_dependencies=conda_deps)
run_config.environment.docker.enabled = True
run_config.environment.docker.base_image = "mcr.microsoft.com/mlops/python"
Contributor

We need this container with r-essentials.

Contributor

We had it essentially to demonstrate the use of the container for training.

Contributor Author

I've added to the doc instead:

You will also need to add the
 `r-essentials` Conda packages into `diabetes_regression/scoring_dependencies.yml`
 and `diabetes_regression/training_dependencies.yml`.

I think it's a much more robust solution, and guides R users to the right process for adding the additional packages they will usually need.

Contributor Author

Tested, training seems to run fine:

Starting the daemon thread to refresh tokens in background for process with pid = 137
Entering Run History Context Manager.
[1] "R version 3.6.1 (2019-07-05)"
[1] "Reading file from weight_data.csv"
   height weight
1      79    174
2      63    250
3      75    223
4      75    130
5      70    120
6      76    239
7      63    129
8      64    185
9      59    246
10     80    241
11     79    217
12     65    212
13     74    242
14     71    223
15     61    167
16     78    148
17     75    229
18     75    116
19     75    182
20     72    237
21     72    160
22     79    169
23     67    219
24     61    202
25     65    168
26     79    181
27     81    214
28     78    216
29     59    245
       1        2 
173.6420 222.3347 

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
   232.5858      -0.5126  

[1] "Completed"
-rwxrwxrwx 1 root root 1740 Jan 31 20:10 model.rds


The experiment completed successfully. Finalizing run...
Cleaning up all outstanding Run operations, waiting 300.0 seconds
1 items cleaning up...
Cleanup took 0.0007724761962890625 seconds
Starting the daemon thread to refresh tokens in background for process with pid = 137

Contributor

Agree with you, and that's what we showcased in the Python training pipeline. For R, we wanted to demonstrate that one can bring in their own base image for training as well :)

Contributor Author

If we want to showcase it, I think it's better to do that in a doc than buried in a script.


train_step = PythonScriptStep(
name="Train Model",
Expand Down
2 changes: 1 addition & 1 deletion tests/unit/code_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ def test_train_model():
run = Mock(Run)
reg = train_model(run, data, alpha=1.2)

run.log.assert_called_with("mse", 0.029843893480256872,
run.log.assert_called_with("mse", 0.029843893480257067,
sudivate marked this conversation as resolved.
description='Mean squared error metric')

preds = reg.predict([[1], [2]])
Expand Down
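
The expected `mse` value in the hunk above changed only around the thirteenth significant digit, which suggests that an exact float comparison is sensitive to the versions of the numeric libraries. One possible alternative, sketched here as a suggestion and not as what the PR does, is to compare the logged value with a tolerance. The helper `assert_logged_close` is hypothetical; it uses only `unittest.mock` and `math.isclose` from the standard library.

```python
import math
from unittest.mock import Mock


def assert_logged_close(log_mock, name, expected, rel_tol=1e-9):
    """Assert that the last call to log_mock was (name, value, ...)
    with value within rel_tol of expected."""
    args, _kwargs = log_mock.call_args
    assert args[0] == name, f"expected metric {name!r}, got {args[0]!r}"
    assert math.isclose(args[1], expected, rel_tol=rel_tol), (
        f"{name}: {args[1]!r} is not close to {expected!r}")


# Stand-in for the mocked Run in tests/unit/code_test.py.
run = Mock()
run.log("mse", 0.029843893480257067,
        description='Mean squared error metric')

# Passes for both the old and the new expected value from the diff.
assert_logged_close(run.log, "mse", 0.029843893480256872)
assert_logged_close(run.log, "mse", 0.029843893480257067)
```

The trade-off is that a tolerant comparison no longer detects tiny numerical changes, which may or may not be desirable for this sample.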