Manage environments in conda YAML files #158

Merged: 18 commits merged on Jan 31, 2020 (the diff below shows changes from 15 commits).
1 change: 1 addition & 0 deletions .gitignore
@@ -93,6 +93,7 @@ ENV/
env.bak/
venv.bak/
*.vscode
condaenv.*

# Spyder project settings
.spyderproject
28 changes: 28 additions & 0 deletions diabetes_regression/ci_dependencies.yml
@@ -0,0 +1,28 @@
name: mlopspython_ci

dependencies:

# The python interpreter version.
- python=3.7.5

- r=3.6.0
- r-essentials=3.6.0
- numpy=1.18.1
- pandas=1.0.0
- scikit-learn=0.22.1

- pip=20.0.2
- pip:

# dependencies shared with other environment .yml files.
- azureml-sdk==1.0.79

# Additional pip dependencies for the CI environment.
- pytest==5.3.1
- pytest-cov==2.8.1
- requests==2.22.0
- python-dotenv==0.10.3
- flake8==3.7.9
- flake8_formatter_junit_xml==0.0.6
- azure-cli==2.0.77
- tox==3.14.3
4 changes: 2 additions & 2 deletions diabetes_regression/scoring/inference_config.yml
@@ -1,9 +1,9 @@
entryScript: score.py
runtime: python
condaFile: conda_dependencies.yml
condaFile: ../scoring_dependencies.yml
extraDockerfileSteps:
schemaFile:
sourceDirectory:
enableGpu: False
baseImage:
baseImageRegistry:
baseImageRegistry:
@@ -14,24 +14,23 @@
# This directive is stored in a comment to preserve the Conda file structure.
# [AzureMlVersion] = 2

name: project_environment
name: diabetes_scoring

dependencies:

# The python interpreter version.
# Currently Azure ML Workbench only supports 3.5.2 and later.
- python=3.7.5

# Required by azureml-defaults, installed separately through Conda to
# get a prebuilt version and not require build tools for the install.
- psutil=5.6 #latest

- numpy=1.18.1
- pandas=1.0.0
- scikit-learn=0.22.1

- pip=20.0.2
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-model-management-sdk==1.0.1b6.post1
- azureml-sdk==1.0.74
- scipy==1.3.1
- scikit-learn==0.22
- pandas==0.25.3
- numpy==1.17.3
- joblib==0.14.0
- gunicorn==19.9.0
- flask==1.1.1
- inference-schema[numpy-support]
# You must list azureml-defaults as a pip dependency
- azureml-defaults==1.0.85
- inference-schema[numpy-support]==1.0.1
18 changes: 18 additions & 0 deletions diabetes_regression/training_dependencies.yml
@@ -0,0 +1,18 @@
name: diabetes_training

dependencies:

# The python interpreter version.
- python=3.7.5

- numpy=1.18.1
- pandas=1.0.0
- scikit-learn=0.22.1
#- r
#- r-essentials
#- tensorflow
#- keras

- pip=20.0.2
- pip:
- azureml-core==1.0.79
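With three environment files now pinning overlapping packages (`python`, `numpy`, `pandas`, `scikit-learn`, `pip`), the pins can drift apart over time. Below is a minimal sketch of a consistency check, assuming each file has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`, not shown). `pinned_versions` and `find_conflicts` are hypothetical helpers, and only exact pins (`name=version` for conda, `name==version` for pip) are recognized.

```python
import re
from collections import defaultdict

# Exact pins only: conda uses a single "=", pip uses "==".
CONDA_PIN = re.compile(r"^([A-Za-z0-9_.\-]+)=([^=\s]+)$")
PIP_PIN = re.compile(r"^([A-Za-z0-9_.\-\[\]]+)==(\S+)$")


def pinned_versions(env):
    """Return {package name: pinned version} for one parsed environment dict."""
    pins = {}
    for dep in env.get("dependencies", []):
        if isinstance(dep, dict):
            # The nested "- pip:" section: a list of pip requirement strings.
            specs, pattern = dep.get("pip", []), PIP_PIN
        else:
            specs, pattern = [dep], CONDA_PIN
        for spec in specs:
            match = pattern.match(spec)
            if match:
                pins[match.group(1).lower()] = match.group(2)
    return pins


def find_conflicts(envs):
    """Return {package: {file: version}} for packages pinned differently."""
    seen = defaultdict(dict)
    for filename, env in envs.items():
        for name, version in pinned_versions(env).items():
            seen[name][filename] = version
    return {name: by_file for name, by_file in seen.items()
            if len(set(by_file.values())) > 1}


# Abridged contents of the three files in this PR, written out by hand.
envs = {
    "ci_dependencies.yml": {"dependencies": [
        "python=3.7.5", "numpy=1.18.1", "pandas=1.0.0", "scikit-learn=0.22.1",
        "pip=20.0.2", {"pip": ["azureml-sdk==1.0.79", "pytest==5.3.1"]}]},
    "training_dependencies.yml": {"dependencies": [
        "python=3.7.5", "numpy=1.18.1", "pandas=1.0.0", "scikit-learn=0.22.1",
        "pip=20.0.2", {"pip": ["azureml-core==1.0.79"]}]},
    "scoring_dependencies.yml": {"dependencies": [
        "python=3.7.5", "psutil=5.6", "numpy=1.18.1", "pandas=1.0.0",
        "scikit-learn=0.22.1", "pip=20.0.2",
        {"pip": ["azureml-defaults==1.0.85",
                 "inference-schema[numpy-support]==1.0.1"]}]},
}

print(find_conflicts(envs))  # prints {}
```

An empty result means every package pinned in more than one file uses the same version. Entries without an exact pin (such as the unversioned `inference-schema[numpy-support]` in the old scoring file) are simply ignored.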
7 changes: 4 additions & 3 deletions docs/code_description.md
@@ -2,9 +2,9 @@

### Environment Setup

- `environment_setup/requirements.txt` : It consists of a list of python packages which are needed by the train.py to run successfully on host agent (locally).
- `environment_setup/ci_environment.yml` : Conda environment definition for the CI environment.
sudivate marked this conversation as resolved.

- `environment_setup/install_requirements.sh` : This script prepares the python environment i.e. install the Azure ML SDK and the packages specified in requirements.txt
- `environment_setup/install_requirements.sh` : This script prepares a local conda environment i.e. install the Azure ML SDK and the packages specified in environment definitions.

- `environment_setup/iac-*.yml, arm-templates` : Infrastructure as Code piplines to create and delete required resources along with corresponding arm-templates.

@@ -32,12 +32,13 @@
- `diabetes_regression/training/train.py` : a training step of an ML training pipeline.
- `diabetes_regression/evaluate/evaluate_model.py` : an evaluating step of an ML training pipeline which registers a new trained model if evaluation shows the new model is more performant than the previous one.
- `diabetes_regression/evaluate/register_model.py` : (LEGACY) registers a new trained model if evaluation shows the new model is more performant than the previous one.
- `diabetes_regression/training/training_dependencies.yml` : contains a list of dependencies required by train.py to be installed in a deployable Docker Image
sudivate marked this conversation as resolved.
- `diabetes_regression/training/R/r_train.r` : training a model with R basing on a sample dataset (weight_data.csv).
- `diabetes_regression/training/R/train_with_r.py` : a python wrapper (ML Pipeline Step) invoking R training script on ML Compute
- `diabetes_regression/training/R/train_with_r_on_databricks.py` : a python wrapper (ML Pipeline Step) invoking R training script on Databricks Compute
- `diabetes_regression/training/R/weight_data.csv` : a sample dataset used by R script (r_train.r) to train a model

### Scoring
- `diabetes_regression/scoring/score.py` : a scoring script which is about to be packed into a Docker Image along with a model while being deployed to QA/Prod environment.
- `diabetes_regression/scoring/conda_dependencies.yml` : contains a list of dependencies required by score.py to be installed in a deployable Docker Image
- `diabetes_regression/scoring/scoring_dependencies.yml` : contains a list of dependencies required by score.py to be installed in a deployable Docker Image
sudivate marked this conversation as resolved.
- `diabetes_regression/scoring/inference_config.yml`, deployment_config_aci.yml, deployment_config_aks.yml : configuration files for the [AML Model Deploy](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.private-vss-services-azureml&ssr=false#overview) pipeline task for ACI and AKS deployment targets.
17 changes: 11 additions & 6 deletions environment_setup/Dockerfile
@@ -4,11 +4,16 @@ LABEL org.label-schema.vendor = "Microsoft" \
org.label-schema.url = "https://hub.docker.com/r/microsoft/mlopspython" \
org.label-schema.vcs-url = "https://github.com/microsoft/MLOpsPython"

COPY diabetes_regression/ci_dependencies.yml /setup/

COPY environment_setup/requirements.txt /setup/

RUN apt-get update && apt-get install gcc -y && pip install --upgrade -r /setup/requirements.txt && \
conda install -c r r-essentials
RUN conda env create -f /setup/ci_dependencies.yml

CMD ["python"]
# activate environment
ENV PATH /usr/local/envs/mlopspython_ci/bin:$PATH
RUN /bin/bash -c "source activate mlopspython_ci"

# Verify conda installation.
# This serves as workaround for https://github.com/conda/conda/issues/8537 (conda env create doesn't fail
# if pip installation fails, for example due to a wrong package version).
# The `az` command is not available if pip has not run (and installed azure-cli).
RUN az --version
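The `RUN az --version` step works because `az` is a console script installed by the `azure-cli` pip package, so it only exists if the pip section of `ci_dependencies.yml` was actually installed. The same idea extends to any list of expected commands. Below is a minimal Python sketch of that check; `missing_commands` and `verify_environment` are hypothetical helpers, not part of this repository.

```python
import shutil
import sys


def missing_commands(commands):
    """Return the commands that cannot be resolved on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def verify_environment(commands):
    """Exit with a non-zero status if any expected command is missing.

    A console script such as `az` is only present if the pip part of the
    conda environment file installed successfully, so its absence is a
    signal that `conda env create` silently skipped a failed pip install.
    """
    missing = missing_commands(commands)
    if missing:
        sys.exit("Environment check failed; missing: " + ", ".join(missing))
```

For example, calling `verify_environment(["az", "pytest", "flake8"])` inside the activated `mlopspython_ci` environment would exit non-zero if any of those console scripts were absent. The command list is an assumption based on the pip packages in `ci_dependencies.yml`.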
6 changes: 4 additions & 2 deletions environment_setup/install_requirements.sh
100644 → 100755
@@ -24,6 +24,8 @@
# ARISING IN ANY WAY OUT OF THE USE OF THE SOFTWARE CODE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

set -eux

python --version
pip install -r requirements.txt
conda env create -f diabetes_regression/ci_dependencies.yml

conda activate mlopspython_ci
12 changes: 0 additions & 12 deletions environment_setup/requirements.txt

This file was deleted.

12 changes: 4 additions & 8 deletions ml_service/pipelines/diabetes_regression_build_train_pipeline.py
@@ -28,14 +28,10 @@ def main():
print("aml_compute:")
print(aml_compute)

run_config = RunConfiguration(conda_dependencies=CondaDependencies.create(
conda_packages=['numpy', 'pandas',
'scikit-learn', 'tensorflow', 'keras'],
pip_packages=['azure', 'azureml-core',
'azure-storage',
'azure-storage-blob',
'azureml-dataprep'])
)
# Create a run configuration environment
conda_deps_file = "diabetes_regression/training_dependencies.yml"
conda_deps = CondaDependencies(conda_deps_file)
run_config = RunConfiguration(conda_dependencies=conda_deps)
run_config.environment.docker.enabled = True
config_envvar = {}
if (e.collection_uri is not None and e.teamproject_name is not None):
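The pipeline now reads its environment from `diabetes_regression/training_dependencies.yml` instead of passing inline `conda_packages` and `pip_packages` lists to `CondaDependencies.create`. To show how the file maps onto those two keyword arguments, here is a minimal pure-Python sketch that splits an already-parsed environment dict into the two lists. `split_dependencies` is a hypothetical helper, not part of the Azure ML SDK, and the dict stands in for the YAML file after parsing.

```python
def split_dependencies(env):
    """Split a parsed conda environment dict into conda and pip spec lists.

    The returned keys mirror the `conda_packages` and `pip_packages`
    arguments used by the removed CondaDependencies.create(...) call.
    """
    conda_packages, pip_packages = [], []
    for dep in env.get("dependencies", []):
        if isinstance(dep, dict):
            # The nested "- pip:" entry holds the pip requirement strings.
            pip_packages.extend(dep.get("pip", []))
        else:
            conda_packages.append(dep)
    return {"conda_packages": conda_packages, "pip_packages": pip_packages}


# training_dependencies.yml from this PR as a YAML parser would load it
# (commented-out entries such as "#- r" are dropped by the parser).
training_env = {
    "name": "diabetes_training",
    "dependencies": [
        "python=3.7.5",
        "numpy=1.18.1",
        "pandas=1.0.0",
        "scikit-learn=0.22.1",
        "pip=20.0.2",
        {"pip": ["azureml-core==1.0.79"]},
    ],
}

print(split_dependencies(training_env))
```

One visible consequence of the change: the file pins exact versions, whereas the removed inline lists were unpinned and also pulled in `tensorflow` and `keras`, which the new file only mentions as commented-out entries.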
@@ -26,15 +26,11 @@ def main():
print("aml_compute:")
print(aml_compute)

run_config = RunConfiguration(conda_dependencies=CondaDependencies.create(
conda_packages=['numpy', 'pandas',
'scikit-learn', 'tensorflow', 'keras'],
pip_packages=['azure', 'azureml-core',
'azure-storage',
'azure-storage-blob'])
)
# Create a run configuration environment
conda_deps_file = "diabetes_regression/training_dependencies.yml"
conda_deps = CondaDependencies(conda_deps_file)
run_config = RunConfiguration(conda_dependencies=conda_deps)
run_config.environment.docker.enabled = True
run_config.environment.docker.base_image = "mcr.microsoft.com/mlops/python"
Contributor:

We need this container with `r-essentials`.

Contributor:

We had it essentially to demonstrate the use of the container for training.

Contributor Author:

I've added this to the doc instead:

You will also need to add the `r-essentials` Conda packages into `diabetes_regression/scoring_dependencies.yml` and `diabetes_regression/training_dependencies.yml`.

I think it's a much more robust solution, and guides R users to the right process for adding the additional packages they will usually need.
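The doc change asks R users to add `r-essentials` to both the scoring and training environment files by hand. If that edit ever needs to be scripted (for example in a project template), a minimal sketch over already-parsed environment dicts could look like the following. `add_conda_packages` is a hypothetical helper; writing the result back to YAML is left out, and note that a round trip through PyYAML would drop the comments these files rely on.

```python
def add_conda_packages(env, packages):
    """Add conda specs to a parsed environment dict, ahead of the pip section.

    Hypothetical helper. Specs already present verbatim are skipped.
    """
    deps = env.setdefault("dependencies", [])
    # Position of the nested "- pip:" mapping, or the end of the list if absent.
    pip_index = next(
        (i for i, dep in enumerate(deps) if isinstance(dep, dict)), len(deps)
    )
    new_specs = [spec for spec in packages if spec not in deps]
    deps[pip_index:pip_index] = new_specs
    return env


training_env = {
    "name": "diabetes_training",
    "dependencies": ["python=3.7.5", "numpy=1.18.1", "pip=20.0.2",
                     {"pip": ["azureml-core==1.0.79"]}],
}
scoring_env = {
    "name": "diabetes_scoring",
    "dependencies": ["python=3.7.5", "psutil=5.6", "pip=20.0.2",
                     {"pip": ["azureml-defaults==1.0.85"]}],
}

for env in (training_env, scoring_env):
    add_conda_packages(env, ["r-essentials"])

print(training_env["dependencies"])
# ['python=3.7.5', 'numpy=1.18.1', 'pip=20.0.2', 'r-essentials', {'pip': ['azureml-core==1.0.79']}]
```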

Contributor Author:

Tested; training seems to run fine:

Starting the daemon thread to refresh tokens in background for process with pid = 137
Entering Run History Context Manager.
[1] "R version 3.6.1 (2019-07-05)"
[1] "Reading file from weight_data.csv"
   height weight
1      79    174
2      63    250
3      75    223
4      75    130
5      70    120
6      76    239
7      63    129
8      64    185
9      59    246
10     80    241
11     79    217
12     65    212
13     74    242
14     71    223
15     61    167
16     78    148
17     75    229
18     75    116
19     75    182
20     72    237
21     72    160
22     79    169
23     67    219
24     61    202
25     65    168
26     79    181
27     81    214
28     78    216
29     59    245
       1        2 
173.6420 222.3347 

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
   232.5858      -0.5126  

[1] "Completed"
-rwxrwxrwx 1 root root 1740 Jan 31 20:10 model.rds


The experiment completed successfully. Finalizing run...
Cleaning up all outstanding Run operations, waiting 300.0 seconds
1 items cleaning up...
Cleanup took 0.0007724761962890625 seconds
Starting the daemon thread to refresh tokens in background for process with pid = 137
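The `Coefficients` block in this log is an ordinary least-squares fit from `lm(formula = y ~ x)`. Assuming `r_train.r` regresses weight (`y`) on height (`x`) over all 29 printed rows (an assumption, since the R script is not part of this diff), the closed-form estimates slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x) reproduce the logged values. A minimal pure-Python sketch:

```python
# Data points as printed in the training log: (height, weight).
DATA = [
    (79, 174), (63, 250), (75, 223), (75, 130), (70, 120), (76, 239),
    (63, 129), (64, 185), (59, 246), (80, 241), (79, 217), (65, 212),
    (74, 242), (71, 223), (61, 167), (78, 148), (75, 229), (75, 116),
    (75, 182), (72, 237), (72, 160), (79, 169), (67, 219), (61, 202),
    (65, 168), (79, 181), (81, 214), (78, 216), (59, 245),
]


def fit_simple_ols(points):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_ols(DATA)
print(f"(Intercept) {intercept:.4f}  x {slope:.4f}")
# prints: (Intercept) 232.5858  x -0.5126
```

Matching coefficients is a quick sanity check that the R step trained on the data it logged.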

Contributor:

Agree with you, and that's what we showcased in the Python training pipeline. For R, we wanted to demonstrate that one can bring their own base image for training as well :)

Contributor Author:

If we want to showcase it, I think it's better to do that in a doc than buried in a script.


train_step = PythonScriptStep(
name="Train Model",
2 changes: 1 addition & 1 deletion tests/unit/code_test.py
@@ -15,7 +15,7 @@ def test_train_model():
run = Mock(Run)
reg = train_model(run, data, alpha=1.2)

run.log.assert_called_with("mse", 0.029843893480256872,
run.log.assert_called_with("mse", 0.029843893480257067,
description='Mean squared error metric')

preds = reg.predict([[1], [2]])
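The updated expected `mse` differs from the old one only around the 15th significant digit, which looks like floating-point noise from a change in the numeric stack (an inference; the PR does not state the cause). `assert_called_with` compares floats exactly, so a tolerance-based check is one way to keep such a test stable. Below is a minimal sketch using only the standard library; `assert_logged_close` is a hypothetical helper, and pytest's `approx` could serve the same purpose.

```python
import math
from unittest.mock import Mock


def assert_logged_close(mock_log, name, expected, rel_tol=1e-9, **kwargs):
    """Check the last call to a mocked `run.log` with a float tolerance."""
    args, call_kwargs = mock_log.call_args  # (positional args, keyword args)
    assert args[0] == name, f"expected metric {name!r}, got {args[0]!r}"
    assert math.isclose(args[1], expected, rel_tol=rel_tol), (
        f"{name}={args[1]!r} not within rel_tol={rel_tol} of {expected!r}"
    )
    assert call_kwargs == kwargs


# Usage with a plain Mock standing in for the Run object.
run = Mock()
run.log("mse", 0.029843893480257067, description="Mean squared error metric")
# Passes: the old and new expected values differ by roughly 2e-16.
assert_logged_close(run.log, "mse", 0.029843893480256872,
                    description="Mean squared error metric")
```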