Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manage AML environments from conda YAML files #168

Merged
merged 38 commits into from
Feb 4, 2020

Conversation

algattik
Copy link
Contributor

@algattik algattik commented Jan 31, 2020

Builds on top of PR #158

Overview:

  • Automatically manage Azure ML environments for training and scoring, in a manner robust to changes in Conda YAML files, as well as concurrent branches.
  • Replaced the AMLDeploy task with Python SDK service deployment.

Detailed description:

Leverage Azure ML environments in order to avoid re-creating container images for the training and scoring containers (which are usually identical from one run to the next).

For DevOps scenarios, it's important to have reproducible outcomes, so that environments should be automatically updated whenever the Conda YAML file is modified. It's also important that the pipeline can run in multiple branches in parallel without interference. To enable that, this function automatically creates an environment name from a base name with a checksum of the Conda YAML file appended. If that environment already exists, it is retrieved, otherwise it is created from the Conda file.

As the environment name is now variable, the inference config is now dynamic. While we could have generated an inference config JSON file to feed the AMLDeploy task, it was simpler to just deploy the service using the Python SDK instead. This has the side benefit of not requiring the installation of the Az CLI ML extension (77 MB as it packs its own python libraries including AML SDK). (The AMLDeploy checks for the presence of the Az CLI ML and installs it if absent, which causes a slight delay. The AMLDeploy task is just a wrapper for the Az CLI ML task, its only potential benefit is in graphical DevOps pipelines).

@algattik algattik marked this pull request as ready for review January 31, 2020 18:40
# Create a run configuration environment
conda_deps_file = "diabetes_regression/training_dependencies.yml"
conda_deps = CondaDependencies(conda_deps_file)
run_config = RunConfiguration(conda_dependencies=conda_deps)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leverage the environments similar to diabetes_regression_build_train_pipeline.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sudivate sudivate requested review from eedorenko and dtzar January 31, 2020 19:11
@@ -1,16 +0,0 @@
computeType: AKS
autoScaler:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this AKS config deleted ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Environment definition is moved to Python script.
AML Python SDK does not provide a way to define an environment in a YAML file, only in code (I have checked - the YAML parsing functionality is implemented directly in the AZ CLI ML extension - would be a good thing to propose to add in the SDK!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change this to CLI and config files instead of SDK to deploy model. We want to demonstrate the ease of CLI to deploy models

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would still not have YML files anymore, and need an SDK script to generate the environment, and put the name in a json. Then we would be able to call the CLI. We would essentially remove a line from the existing script, but then have a delay while the extension gets installed every time for every build. Having a program generate input for a CLI would not seem to be a meaningful option to users.

Ideally the service itself should support such declarative environment versioning, rather than the imperative (additive) version behavior available today which is hard to use with DevOps.

docs/code_description.md Outdated Show resolved Hide resolved
docs/code_description.md Outdated Show resolved Hide resolved
docs/code_description.md Outdated Show resolved Hide resolved
Copy link
Member

@dtzar dtzar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the lint issues causing the build to fail and then LGTM assuming it passes a test CI deploy

@algattik
Copy link
Contributor Author

Fix the lint issues causing the build to fail and then LGTM assuming it passes a test CI deploy

Linting & UT fixed.
Passes PR build, as well as a test CI/CD deploy https://dev.azure.com/algattik/MLOpsPython/_build/results?buildId=338&view=results

@eedorenko
Copy link
Contributor

IMO it's overcomplicated.

  1. Instead of cleaning/simplifying the repo we are getting it more complex. The most common complain on the repo that I I've heard is "it's very complicated". This auto-generation of environment name is too much for the purpose of this repo.
  2. We are demonstrating/forcing people to use only one (not the best) deployment approach with AML SDK, completely hiding the most common scenarios with deployment tasks and CLI.

Copy link
Contributor

@eedorenko eedorenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment

Copy link
Member

@dtzar dtzar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon a closer look I agree changing up the deployment aspects makes things more complicated. Furthermore it gets away from the title of the PR "Manage AML environments from conda YAML files". I suggest you remove the deployment modifications for now and just stick to environments for PR. Let's work together on a plan for desired deployment modifications.

@@ -0,0 +1,39 @@
{
"name": "diabetes_regression_sklearn",
"version": null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this file used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Environment.load_from_directory(). Added comment to script for clarity fe35f83

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

azureml_environment.json is not pointing to any conda specification file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the file as generated by az ml environment scaffold.

I assume your concern is about the null values in this section of the file:

    "python": {
        "userManagedDependencies": false,
        "interpreterPath": "python",
        "condaDependenciesFile": null,
        "baseCondaEnvironment": null
    },

If you look at the AML SDK source code, load_from_directory() actually fills the conda_dependencies from a file named conda_dependencies.yml is one exists in the passed directory:

    def load_from_directory(path):
        """Load an environment definition from the files in a directory.

        :param path: Path to the source directory.
        :type path: str
        """
        definition_path = os.path.join(path, _DEFINITION_FILE_NAME)
        if not os.path.isfile(definition_path):
            raise FileNotFoundError(definition_path)

        with open(definition_path, "r") as definition:
            environment_dict = json.load(definition)
        env = Environment._deserialize_and_add_to_object(environment_dict)

        conda_file_path = os.path.join(path, _CONDA_DEPENDENCIES_FILE_NAME)
        if os.path.isfile(conda_file_path):
            env.python = env.python or PythonSection(_skip_defaults=True)
            env.python.conda_dependencies = CondaDependencies(conda_file_path)

        base_dockerfile_path = os.path.join(path, _BASE_DOCKERFILE_FILE_NAME)
        if os.path.isfile(base_dockerfile_path):
            with open(base_dockerfile_path, "r") as base_dockerfile:
                env.docker = env.docker or DockerSection(_skip_defaults=True)
                env.docker.base_dockerfile = base_dockerfile.read()

I've tested that things works as expected, as when I uncomment r-essentials from the conda file and run ml_service/pipelines/diabetes_regression_build_train_pipeline_with_r.py, training succeeds as shown by R output in the logs.

@sudivate sudivate self-requested a review February 4, 2020 19:35
@sudivate sudivate merged commit 8174d98 into microsoft:master Feb 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants