diff --git a/README.md b/README.md
index c3a70d61201..91bb93ea407 100644
--- a/README.md
+++ b/README.md
@@ -89,7 +89,7 @@
Meet the Team
- 🎉 Version 0.46.0 is out. Check out the release notes
+ 🎉 Version 0.46.1 is out. Check out the release notes
here.
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
index 28a2043a94a..d641f0d4ceb 100644
--- a/RELEASE_NOTES.md
+++ b/RELEASE_NOTES.md
@@ -1,4 +1,38 @@
+# 0.46.1
+
+The 0.46.1 release introduces support for Service Accounts and API Keys that
+can be used to authenticate with the ZenML server from environments that do
+not support the web login flow, such as CI/CD environments.
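A minimal sketch of how the new feature might be used from CI. The exact CLI shape (`zenml service-account create`, the `--api-key` flag, and the `ZENML_STORE_*` environment variables) is an assumption based on the release notes; check `zenml --help` on 0.46.1+ and the server URL is a placeholder:

```shell
# Create a service account (assumed command; prints an API key once).
zenml service-account create ci_bot

# In CI, authenticate non-interactively instead of the web login flow:
zenml connect --url https://zenml.example.com --api-key "$ZENML_API_KEY"

# Or export credentials so subsequent zenml/pipeline commands pick them up:
export ZENML_STORE_URL=https://zenml.example.com
export ZENML_STORE_API_KEY="$ZENML_API_KEY"
```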
+
+Also included in this release are some documentation updates and bug fixes.
+Notably, the database migration logic deployed with the Helm chart has moved
+out of the init containers and into a Kubernetes Job, which makes it possible
+to scale out ZenML server deployments without risking database migration
+conflicts.
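To illustrate the design choice: an init container re-runs the migration in every replica's pod, so two replicas starting at once can race on the schema; a Job runs it exactly once per upgrade. A hypothetical sketch of such a Job (illustrative names and image only, not the chart's actual manifest):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: zenml-db-migration        # illustrative name
  annotations:
    "helm.sh/hook": pre-upgrade   # run once before new replicas roll out
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: zenmldocker/zenml-server   # placeholder image/args
          args: ["migrate"]
```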
+
+## What's Changed
+* Small improvements to Hub docs page by @strickvl in https://github.com/zenml-io/zenml/pull/2015
+* Pin OpenAI integration to `<1.0.0` by @strickvl in https://github.com/zenml-io/zenml/pull/2027
+* Make error message nicer for when two artifacts that share a prefix are found by @strickvl in https://github.com/zenml-io/zenml/pull/2023
+* Move db-migration to `job` instead of `init-container` to allow replicas by @safoinme in https://github.com/zenml-io/zenml/pull/2021
+* Fix stuck/broken CI by @strickvl in https://github.com/zenml-io/zenml/pull/2032
+* Increase `step.source_code` Cut-Off Limit by @fa9r in https://github.com/zenml-io/zenml/pull/2025
+* Improve artifact linkage logging in MCP by @avishniakov in https://github.com/zenml-io/zenml/pull/2016
+* Upgrade feast so apidocs don't fail no mo by @AlexejPenner in https://github.com/zenml-io/zenml/pull/2028
+* Remove NumPy Visualizations for 2D Arrays by @fa9r in https://github.com/zenml-io/zenml/pull/2033
+* Fix user activation bug by @stefannica in https://github.com/zenml-io/zenml/pull/2037
+* Remove `create_new_model_version` arg of `ModelConfig` by @avishniakov in https://github.com/zenml-io/zenml/pull/2030
+* Extend the wait period in between PyPi package publication and Docker image building for releases by @strickvl in https://github.com/zenml-io/zenml/pull/2029
+* Make `zenml up` prefill username when launching dashboard by @strickvl in https://github.com/zenml-io/zenml/pull/2024
+* Add warning when artifact store cannot be loaded by @strickvl in https://github.com/zenml-io/zenml/pull/2011
+* Add extra config to `Kaniko` docs by @safoinme in https://github.com/zenml-io/zenml/pull/2019
+* ZenML API Keys and Service Accounts by @stefannica in https://github.com/zenml-io/zenml/pull/1840
+
+
+**Full Changelog**: https://github.com/zenml-io/zenml/compare/0.46.0..0.46.1
+
+
# 0.46.0
This release brings some upgrades, documentation updates and bug fixes. Notably,
diff --git a/examples/e2e/.assets/00_pipelines_composition.png b/examples/e2e/.assets/00_pipelines_composition.png
index 3de10920306..a01f0679270 100644
Binary files a/examples/e2e/.assets/00_pipelines_composition.png and b/examples/e2e/.assets/00_pipelines_composition.png differ
diff --git a/examples/e2e/README.md b/examples/e2e/README.md
index 2238038cce0..79c20af57ff 100644
--- a/examples/e2e/README.md
+++ b/examples/e2e/README.md
@@ -81,12 +81,12 @@ This template uses
to demonstrate how to perform major critical steps for Continuous Training (CT)
and Continuous Delivery (CD).
-It consists of two pipelines with the following high-level setup:
+It consists of three pipelines with the following high-level setup:
- +
-Both pipelines are inside a shared Model Control Plane model context - training pipeline creates and promotes new Model Control Plane version and inference pipeline is reading from inference Model Control Plane version. This makes those pipelines closely connected, while ensuring that only quality-assured Model Control Plane versions are used to produce predictions delivered to stakeholders.
+All pipelines leverage the Model Control Plane to bring all parts together: the training pipeline creates and promotes a new Model Control Plane version with a trained model object in it, the deployment pipeline uses the inference Model Control Plane version (the one promoted during training) to create a deployment service, and the batch inference pipeline uses the deployment service from the inference Model Control Plane version and stores a new set of predictions back as a versioned data artifact for future use. This makes those pipelines closely connected, while ensuring that only quality-assured Model Control Plane versions are used to produce predictions delivered to stakeholders.
 * [CT] Training
   * Load, split, and preprocess the training dataset
   * Search for an optimal model object architecture and tune its hyperparameters
@@ -94,6 +94,8 @@ Both pipelines are inside a shared Model Control Plane model context - training
   * Compare a recently trained model object with one promoted earlier
   * If a recently trained model object performs better - stage it as a new inference model object in model registry
   * On success of the current model object - stage newly created Model Control Plane version as the one used for inference
+* [CD] Deployment
+  * Deploy a new prediction service based on the model object connected to the inference Model Control Plane version.
 * [CD] Batch Inference
   * Load the inference dataset and preprocess it reusing object fitted during training
   * Perform data drift analysis reusing training dataset of the inference Model Control Plane version as a reference
@@ -119,23 +121,27 @@ The project loosely follows [the recommended ZenML project structure](https://do
 ```
 .
-├── pipelines                  # `zenml.pipeline` implementations
-│   ├── batch_inference.py     # [CD] Batch Inference pipeline
-│   └── training.py            # [CT] Training Pipeline
-├── steps                      # logically grouped `zenml.steps` implementations
-│   ├── alerts                 # alert developer on pipeline status
-│   ├── data_quality           # quality gates built on top of drift report
-│   ├── etl                    # ETL logic for dataset
-│   ├── hp_tuning              # tune hyperparameters and model architectures
-│   ├── inference              # inference on top of the model from the registry
-│   ├── promotion              # find if a newly trained model will be new inference
-│   └── training               # train and evaluate model
-├── utils                      # helper functions
+├── configs                    # pipelines configuration files
+│   ├── deployer_config.yaml   # the configuration of the deployment pipeline
+│   ├── inference_config.yaml  # the configuration of the batch inference pipeline
+│   └── train_config.yaml      # the configuration of the training pipeline
+├── pipelines                  # `zenml.pipeline` implementations
+│   ├── batch_inference.py     # [CD] Batch Inference pipeline
+│   ├── deployment.py          # [CD] Deployment pipeline
+│   └── training.py            # [CT] Training Pipeline
+├── steps                      # logically grouped `zenml.steps` implementations
+│   ├── alerts                 # alert developer on pipeline status
+│   ├── deployment             # deploy trained model objects
+│   ├── data_quality           # quality gates built on top of drift report
+│   ├── etl                    # ETL logic for dataset
+│   ├── hp_tuning              # tune hyperparameters and model architectures
+│   ├── inference              # inference on top of the model from the registry
+│   ├── promotion              # find if a newly trained model will be new inference
+│   └── training               # train and evaluate model
+├── utils                      # helper functions
 ├── .dockerignore
-├── inference_config.yaml      # the configuration of the batch inference pipeline
-├── Makefile                   # helper scripts for quick start with integrations
-├── README.md                  # this file
-├── requirements.txt           # extra Python dependencies
-├── run.py                     # CLI tool to run pipelines on ZenML Stack
-└── train_config.yaml          # the configuration of the training pipeline
+├── Makefile                   # helper scripts for quick start with integrations
+├── README.md                  # this file
+├── requirements.txt           # extra Python dependencies
+└── run.py                     # CLI tool to run pipelines on ZenML Stack
 ```
diff --git a/examples/e2e/configs/deployer_config.yaml b/examples/e2e/configs/deployer_config.yaml
new file mode 100644
index 00000000000..d32a25cc4c1
--- /dev/null
+++ b/examples/e2e/configs/deployer_config.yaml
@@ -0,0 +1,44 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2023. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# environment configuration
+settings:
+  docker:
+    required_integrations:
+      - aws
+      - evidently
+      - kubeflow
+      - kubernetes
+      - mlflow
+      - sklearn
+      - slack
+
+# configuration of steps
+steps:
+  notify_on_success:
+    parameters:
+      notify_on_success: False
+
+# configuration of the Model Control Plane
+model_config:
+  name: e2e_use_case
+  version: staging
+
+# pipeline level extra configurations
+extra:
+  notify_on_failure: True
+
diff --git a/examples/e2e/configs/inference_config.yaml b/examples/e2e/configs/inference_config.yaml
new file mode 100644
index 00000000000..d32a25cc4c1
--- /dev/null
+++ b/examples/e2e/configs/inference_config.yaml
@@ -0,0 +1,44 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2023. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# environment configuration
+settings:
+  docker:
+    required_integrations:
+      - aws
+      - evidently
+      - kubeflow
+      - kubernetes
+      - mlflow
+      - sklearn
+      - slack
+
+# configuration of steps
+steps:
+  notify_on_success:
+    parameters:
+      notify_on_success: False
+
+# configuration of the Model Control Plane
+model_config:
+  name: e2e_use_case
+  version: staging
+
+# pipeline level extra configurations
+extra:
+  notify_on_failure: True
+
diff --git a/examples/e2e/configs/train_config.yaml b/examples/e2e/configs/train_config.yaml
new file mode 100644
index 00000000000..b1cb5b70931
--- /dev/null
+++ b/examples/e2e/configs/train_config.yaml
@@ -0,0 +1,112 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2023. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# environment configuration
+settings:
+  docker:
+    required_integrations:
+      - aws
+      - evidently
+      - kubeflow
+      - kubernetes
+      - mlflow
+      - sklearn
+      - slack
+
+# configuration of steps
+steps:
+  model_trainer:
+    parameters:
+      name: e2e_use_case
+  compute_performance_metrics_on_current_data:
+    parameters:
+      target_env: staging
+  promote_with_metric_compare:
+    parameters:
+      mlflow_model_name: e2e_use_case
+      target_env: staging
+  notify_on_success:
+    parameters:
+      notify_on_success: False
+
+# configuration of the Model Control Plane
+model_config:
+  name: e2e_use_case
+  license: apache
+  description: e2e_use_case E2E Batch Use Case
+  audience: All ZenML users
+  use_cases: |
+    The ZenML E2E project demonstrates how the most important steps of
+    the ML Production Lifecycle can be implemented in a reusable way remaining
+    agnostic to the underlying infrastructure, and shows how to integrate them together
+    into pipelines for Training and Batch Inference purposes.
+  ethics: No impact.
+  tags:
+    - e2e
+    - batch
+    - sklearn
+    - from template
+    - ZenML delivered
+  create_new_model_version: true
+
+# pipeline level extra configurations
+extra:
+  notify_on_failure: True
+  # This set contains all the model configurations that you want
+  # to evaluate during hyperparameter tuning stage.
+  model_search_space:
+    random_forest:
+      model_package: sklearn.ensemble
+      model_class: RandomForestClassifier
+      search_grid:
+        criterion:
+          - gini
+          - entropy
+        max_depth:
+          - 2
+          - 4
+          - 6
+          - 8
+          - 10
+          - 12
+        min_samples_leaf:
+          range:
+            start: 1
+            end: 10
+        n_estimators:
+          range:
+            start: 50
+            end: 500
+            step: 25
+    decision_tree:
+      model_package: sklearn.tree
+      model_class: DecisionTreeClassifier
+      search_grid:
+        criterion:
+          - gini
+          - entropy
+        max_depth:
+          - 2
+          - 4
+          - 6
+          - 8
+          - 10
+          - 12
+        min_samples_leaf:
+          range:
+            start: 1
+            end: 10
\ No newline at end of file
diff --git a/examples/e2e/pipelines/__init__.py b/examples/e2e/pipelines/__init__.py
index f0a6c70c2b7..1dc5144563c 100644
--- a/examples/e2e/pipelines/__init__.py
+++ b/examples/e2e/pipelines/__init__.py
@@ -18,3 +18,4 @@
 from .batch_inference import e2e_use_case_batch_inference
 from .training import e2e_use_case_training
+from .deployment import e2e_use_case_deployment
diff --git a/examples/e2e/pipelines/batch_inference.py b/examples/e2e/pipelines/batch_inference.py
index 9a6fff8daff..ac93ae9a182 100644
--- a/examples/e2e/pipelines/batch_inference.py
+++ b/examples/e2e/pipelines/batch_inference.py
@@ -15,7 +15,6 @@
 # limitations under the License.
 #
-
 from steps import (
     data_loader,
     drift_quality_gate,
@@ -25,13 +24,10 @@
     notify_on_success,
 )
-from zenml import get_pipeline_context, pipeline
+from zenml import pipeline
 from zenml.artifacts.external_artifact import ExternalArtifact
 from zenml.integrations.evidently.metrics import EvidentlyMetricConfig
 from zenml.integrations.evidently.steps import evidently_report_step
-from zenml.integrations.mlflow.steps.mlflow_deployer import (
-    mlflow_model_registry_deployer_step,
-)
 from zenml.logger import get_logger
 
 logger = get_logger(__name__)
@@ -49,7 +45,13 @@ def e2e_use_case_batch_inference():
     # Link all the steps together by calling them and passing the output
     # of one step as the input of the next step.
     ########## ETL stage ##########
-    df_inference, target = data_loader(is_inference=True)
+    df_inference, target, _ = data_loader(
+        random_state=ExternalArtifact(
+            model_artifact_pipeline_name="e2e_use_case_training",
+            model_artifact_name="random_state",
+        ),
+        is_inference=True,
+    )
     df_inference = inference_data_preprocessor(
         dataset_inf=df_inference,
         preprocess_pipeline=ExternalArtifact(
@@ -70,15 +72,7 @@
     )
     drift_quality_gate(report)
     ########## Inference stage ##########
-    deployment_service = mlflow_model_registry_deployer_step(
-        registry_model_name=get_pipeline_context().extra["mlflow_model_name"],
-        registry_model_version=ExternalArtifact(
-            model_artifact_name="promoted_version",
-        ),
-        replace_existing=True,
-    )
     inference_predict(
-        deployment_service=deployment_service,
         dataset_inf=df_inference,
         after=["drift_quality_gate"],
     )
diff --git a/examples/e2e/pipelines/deployment.py b/examples/e2e/pipelines/deployment.py
new file mode 100644
index 00000000000..0d55a06d20b
--- /dev/null
+++ b/examples/e2e/pipelines/deployment.py
@@ -0,0 +1,37 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2023. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from steps import deployment_deploy, notify_on_failure, notify_on_success
+
+from zenml import pipeline
+
+
+@pipeline(on_failure=notify_on_failure)
+def e2e_use_case_deployment():
+    """
+    Model deployment pipeline.
+
+    This pipeline deploys a trained model for future inference.
+    """
+    ### ADD YOUR OWN CODE HERE - THIS IS JUST AN EXAMPLE ###
+    # Link all the steps together by calling them and passing the output
+    # of one step as the input of the next step.
+    ########## Deployment stage ##########
+    deployment_deploy()
+
+    notify_on_success(after=["deployment_deploy"])
+    ### YOUR CODE ENDS HERE ###
diff --git a/examples/e2e/pipelines/training.py b/examples/e2e/pipelines/training.py
index 69b46cb8f45..5f7b88a7ca6 100644
--- a/examples/e2e/pipelines/training.py
+++ b/examples/e2e/pipelines/training.py
@@ -16,9 +16,11 @@
 #
 
+import random
 from typing import List, Optional
 
 from steps import (
+    compute_performance_metrics_on_current_data,
     data_loader,
     hp_tuning_select_best_model,
     hp_tuning_single_search,
@@ -26,21 +28,12 @@
     model_trainer,
     notify_on_failure,
     notify_on_success,
-    promote_get_metric,
-    promote_get_versions,
-    promote_metric_compare_promoter_in_model_registry,
-    promote_model_version_in_model_control_plane,
+    promote_with_metric_compare,
     train_data_preprocessor,
     train_data_splitter,
 )
 from zenml import get_pipeline_context, pipeline
-from zenml.integrations.mlflow.steps.mlflow_deployer import (
-    mlflow_model_registry_deployer_step,
-)
-from zenml.integrations.mlflow.steps.mlflow_registry import (
-    mlflow_register_model_step,
-)
 from zenml.logger import get_logger
 
 logger = get_logger(__name__)
@@ -79,7 +72,7 @@
     # of one step as the input of the next step.
     pipeline_extra = get_pipeline_context().extra
     ########## ETL stage ##########
-    raw_data, target = data_loader()
+    raw_data, target, _ = data_loader(random_state=random.randint(0, 100))
     dataset_trn, dataset_tst = train_data_splitter(
         dataset=raw_data,
         test_size=test_size,
@@ -108,9 +101,7 @@
             target=target,
         )
         after.append(step_name)
-    best_model = hp_tuning_select_best_model(
-        search_steps_prefix=search_steps_prefix, after=after
-    )
+    best_model = hp_tuning_select_best_model(after=after)
 
     ########## Training stage ##########
     model = model_trainer(
@@ -127,50 +118,19 @@
         fail_on_accuracy_quality_gates=fail_on_accuracy_quality_gates,
         target=target,
     )
-    mlflow_register_model_step(
-        model,
-        name=pipeline_extra["mlflow_model_name"],
-    )
-
-    ########## Promotion stage ##########
-    latest_version, current_version = promote_get_versions(
-        after=["mlflow_register_model_step"],
-    )
-    latest_deployment = mlflow_model_registry_deployer_step(
-        id="deploy_latest_model_version",
-        registry_model_name=pipeline_extra["mlflow_model_name"],
-        registry_model_version=latest_version,
-        replace_existing=True,
-    )
-    latest_metric = promote_get_metric(
-        id="get_metrics_latest_model_version",
-        dataset_tst=dataset_tst,
-        deployment_service=latest_deployment,
-    )
-
-    current_deployment = mlflow_model_registry_deployer_step(
-        id="deploy_current_model_version",
-        registry_model_name=pipeline_extra["mlflow_model_name"],
-        registry_model_version=current_version,
-        replace_existing=True,
-        after=["get_metrics_latest_model_version"],
-    )
-    current_metric = promote_get_metric(
-        id="get_metrics_current_model_version",
-        dataset_tst=dataset_tst,
-        deployment_service=current_deployment,
+    (
+        latest_metric,
+        current_metric,
+    ) = compute_performance_metrics_on_current_data(
+        dataset_tst=dataset_tst, after=["model_evaluator"]
     )
 
-    (
-        was_promoted,
-        promoted_version,
-    ) = promote_metric_compare_promoter_in_model_registry(
+    promote_with_metric_compare(
         latest_metric=latest_metric,
         current_metric=current_metric,
-        latest_version=latest_version,
-        current_version=current_version,
     )
-    promote_model_version_in_model_control_plane(was_promoted)
+    last_step = "promote_with_metric_compare"
 
-    notify_on_success(after=["promote_model_version_in_model_control_plane"])
+    notify_on_success(after=[last_step])
     ### YOUR CODE ENDS HERE ###
diff --git a/examples/e2e/run.py b/examples/e2e/run.py
index 92fbfcc2521..02d22a3c4fa 100644
--- a/examples/e2e/run.py
+++ b/examples/e2e/run.py
@@ -20,7 +20,11 @@
 from typing import Optional
 
 import click
-from pipelines import e2e_use_case_batch_inference, e2e_use_case_training
+from pipelines import (
+    e2e_use_case_batch_inference,
+    e2e_use_case_deployment,
+    e2e_use_case_training,
+)
 
 from zenml.artifacts.external_artifact import ExternalArtifact
 from zenml.logger import get_logger
@@ -174,6 +178,7 @@
     pipeline_args["config_path"] = os.path.join(
         os.path.dirname(os.path.realpath(__file__)),
+        "configs",
         "train_config.yaml",
     )
     pipeline_args[
@@ -182,10 +187,23 @@
     e2e_use_case_training.with_options(**pipeline_args)(**run_args_train)
     logger.info("Training pipeline finished successfully!")
 
+    # Execute Deployment Pipeline
+    run_args_inference = {}
+    pipeline_args["config_path"] = os.path.join(
+        os.path.dirname(os.path.realpath(__file__)),
+        "configs",
+        "deployer_config.yaml",
+    )
+    pipeline_args[
+        "run_name"
+    ] = f"e2e_use_case_deployment_run_{dt.now().strftime('%Y_%m_%d_%H_%M_%S')}"
+    e2e_use_case_deployment.with_options(**pipeline_args)(**run_args_inference)
+
     # Execute Batch Inference Pipeline
     run_args_inference = {}
     pipeline_args["config_path"] = os.path.join(
         os.path.dirname(os.path.realpath(__file__)),
+        "configs",
         "inference_config.yaml",
     )
     pipeline_args[
diff --git a/examples/e2e/steps/__init__.py b/examples/e2e/steps/__init__.py
index a4775089b72..2a05a074ebc 100644
--- a/examples/e2e/steps/__init__.py
+++ b/examples/e2e/steps/__init__.py
@@ -27,9 +27,8 @@
 from .hp_tuning import hp_tuning_select_best_model, hp_tuning_single_search
 from .inference import inference_predict
 from .promotion import (
-    promote_get_metric,
-    promote_metric_compare_promoter_in_model_registry,
-    promote_get_versions,
-    promote_model_version_in_model_control_plane,
+    compute_performance_metrics_on_current_data,
+    promote_with_metric_compare,
 )
 from .training import model_evaluator, model_trainer
+from .deployment import deployment_deploy
diff --git a/examples/e2e/steps/alerts/notify_on.py b/examples/e2e/steps/alerts/notify_on.py
index d873f9aee2d..15593ed34ef 100644
--- a/examples/e2e/steps/alerts/notify_on.py
+++ b/examples/e2e/steps/alerts/notify_on.py
@@ -50,8 +50,7 @@
 @step(enable_cache=False)
-def notify_on_success() -> None:
+def notify_on_success(notify_on_success: bool) -> None:
     """Notifies user on pipeline success."""
-    step_context = get_step_context()
-    if alerter and step_context.pipeline_run.config.extra["notify_on_success"]:
+    if alerter and notify_on_success:
         alerter.post(message=build_message(status="succeeded"))
diff --git a/examples/e2e/steps/deployment/__init__.py b/examples/e2e/steps/deployment/__init__.py
new file mode 100644
index 00000000000..8e56754ab43
--- /dev/null
+++ b/examples/e2e/steps/deployment/__init__.py
@@ -0,0 +1,19 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2023. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+
+from .deployment_deploy import deployment_deploy
diff --git a/examples/e2e/steps/deployment/deployment_deploy.py b/examples/e2e/steps/deployment/deployment_deploy.py
new file mode 100644
index 00000000000..c7d570c9f99
--- /dev/null
+++ b/examples/e2e/steps/deployment/deployment_deploy.py
@@ -0,0 +1,78 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2023. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+
+from typing import Optional
+
+from typing_extensions import Annotated
+from utils import get_model_registry_version
+
+from zenml import get_step_context, step
+from zenml.client import Client
+from zenml.integrations.mlflow.services.mlflow_deployment import (
+    MLFlowDeploymentService,
+)
+from zenml.integrations.mlflow.steps.mlflow_deployer import (
+    mlflow_model_registry_deployer_step,
+)
+from zenml.logger import get_logger
+from zenml.model import DeploymentArtifactConfig
+
+logger = get_logger(__name__)
+
+
+@step
+def deployment_deploy() -> (
+    Annotated[
+        Optional[MLFlowDeploymentService],
+        "mlflow_deployment",
+        DeploymentArtifactConfig(),
+    ]
+):
+    """Deployment step.
+
+    This is an example of a deployment step that creates a prediction
+    service for the trained model.
+
+    This step is parameterized, which allows you to configure the step
+    independently of the step code, before running it in a pipeline.
+    See the documentation for more information:
+
+    https://docs.zenml.io/user-guide/advanced-guide/configure-steps-pipelines
+
+    Returns:
+        The deployment service, or `None` if the orchestrator is not local.
+    """
+    ### ADD YOUR OWN CODE HERE - THIS IS JUST AN EXAMPLE ###
+    if Client().active_stack.orchestrator.flavor == "local":
+        model_version = get_step_context().model_config._get_model_version()
+
+        # deploy predictor service
+        deployment_service = mlflow_model_registry_deployer_step.entrypoint(
+            registry_model_name=model_version.model.name,
+            registry_model_version=get_model_registry_version(model_version),
+            replace_existing=True,
+        )
+    else:
+        logger.warning("Skipping deployment as the orchestrator is not local.")
+        deployment_service = None
+    ### YOUR CODE ENDS HERE ###
+    return deployment_service
diff --git a/examples/e2e/steps/etl/data_loader.py b/examples/e2e/steps/etl/data_loader.py
index 731523ed738..4e2e0100044 100644
--- a/examples/e2e/steps/etl/data_loader.py
+++ b/examples/e2e/steps/etl/data_loader.py
@@ -23,18 +23,19 @@
 from typing_extensions import Annotated
 
 from zenml import step
-from zenml.client import Client
 from zenml.logger import get_logger
 
 logger = get_logger(__name__)
 
-artifact_store = Client().active_stack.artifact_store
-
 
 @step
 def data_loader(
-    is_inference: bool = False,
-) -> Tuple[Annotated[pd.DataFrame, "dataset"], Annotated[str, "target"],]:
+    random_state: int, is_inference: bool = False
+) -> Tuple[
+    Annotated[pd.DataFrame, "dataset"],
+    Annotated[str, "target"],
+    Annotated[int, "random_state"],
+]:
     """Dataset reader step.
 
     This is an example of a dataset reader step that load Breast Cancer dataset.
@@ -49,6 +50,7 @@
     Args:
         is_inference: If `True` subset will be returned and target column
             will be removed from dataset.
+        random_state: Random state for sampling
 
     Returns:
         The dataset artifact as Pandas DataFrame and name of target column.
@@ -58,7 +60,9 @@
     inference_size = int(len(dataset.target) * 0.05)
     target = "target"
     dataset: pd.DataFrame = dataset.frame
-    inference_subset = dataset.sample(inference_size, random_state=42)
+    inference_subset = dataset.sample(
+        inference_size, random_state=random_state
+    )
     if is_inference:
         dataset = inference_subset
         dataset.drop(columns=target, inplace=True)
@@ -67,4 +71,4 @@
     dataset.reset_index(drop=True, inplace=True)
     logger.info(f"Dataset with {len(dataset)} records loaded!")
     ### YOUR CODE ENDS HERE ###
-    return dataset, target
+    return dataset, target, random_state
diff --git a/examples/e2e/steps/hp_tuning/hp_tuning_select_best_model.py b/examples/e2e/steps/hp_tuning/hp_tuning_select_best_model.py
index f79e254c001..62c3edc011a 100644
--- a/examples/e2e/steps/hp_tuning/hp_tuning_select_best_model.py
+++ b/examples/e2e/steps/hp_tuning/hp_tuning_select_best_model.py
@@ -20,40 +20,36 @@
 from typing_extensions import Annotated
 
 from zenml import get_step_context, step
-from zenml.client import Client
 from zenml.logger import get_logger
 
 logger = get_logger(__name__)
 
 
 @step
-def hp_tuning_select_best_model(
-    search_steps_prefix: str,
-) -> Annotated[ClassifierMixin, "best_model"]:
+def hp_tuning_select_best_model() -> Annotated[ClassifierMixin, "best_model"]:
     """Find best model across all HP tuning attempts.
 
-    This is an example of a model hyperparameter tuning step that takes
-    in prefix of steps called previously to search for best hyperparameters.
-    It will loop other them and find best model of all according to metric.
-
-    Args:
-        search_steps_prefix: Prefix of steps used for grid search before.
+    This is an example of a model hyperparameter tuning step that loops
+    over the artifacts linked to the model version in the Model Control Plane
+    to find the best hyperparameter tuning output model according to the metric.
 
     Returns:
         The best possible model class and its' parameters.
     """
     ### ADD YOUR OWN CODE HERE - THIS IS JUST AN EXAMPLE ###
-    run_name = get_step_context().pipeline_run.name
-    run = Client().get_pipeline_run(run_name)
+    model_version = get_step_context().model_config._get_model_version()
 
     best_model = None
     best_metric = -1
-    for run_step_name, run_step in run.steps.items():
-        if run_step_name.startswith(search_steps_prefix):
-            if "best_model" in run_step.outputs:
-                model: ClassifierMixin = run_step.outputs["best_model"].load()
-                metric: float = run_step.outputs["metric"].load()
-                if best_model is None or best_metric < metric:
-                    best_model = model
+    # consume artifacts attached to current model version in Model Control Plane
+    for full_artifact_name in model_version.artifact_object_ids:
+        # if artifacts comes from one of HP tuning steps
+        if full_artifact_name.endswith("hp_result"):
+            hp_output = model_version.artifacts[full_artifact_name]["1"]
+            model: ClassifierMixin = hp_output.load()
+            # fetch metadata we attached earlier
+            metric = float(hp_output.metadata["metric"].value)
+            if best_model is None or best_metric < metric:
+                best_model = model
     ### YOUR CODE ENDS HERE ###
     return best_model
diff --git a/examples/e2e/steps/hp_tuning/hp_tuning_single_search.py b/examples/e2e/steps/hp_tuning/hp_tuning_single_search.py
index 220e34fa76f..067ace93d03 100644
--- a/examples/e2e/steps/hp_tuning/hp_tuning_single_search.py
+++ b/examples/e2e/steps/hp_tuning/hp_tuning_single_search.py
@@ -16,16 +16,16 @@
 #
 
-from typing import Any, Dict, Tuple
+from typing import Any, Dict
 
 import pandas as pd
 from sklearn.base import ClassifierMixin
 from sklearn.metrics import accuracy_score
 from sklearn.model_selection import RandomizedSearchCV
 from typing_extensions import Annotated
-from utils.get_model_from_config import get_model_from_config
+from utils import get_model_from_config
 
-from zenml import step
+from zenml import log_artifact_metadata, step
 from zenml.logger import get_logger
 
 logger = get_logger(__name__)
@@ -39,9 +39,7 @@
     dataset_trn: pd.DataFrame,
     dataset_tst: pd.DataFrame,
     target: str,
-) -> Tuple[
-    Annotated[ClassifierMixin, "best_model"], Annotated[float, "metric"]
-]:
+) -> Annotated[ClassifierMixin, "hp_result"]:
     """Evaluate a trained model.
 
     This is an example of a model hyperparameter tuning step that takes
@@ -64,7 +62,7 @@
         target: Name of target columns in dataset.
 
     Returns:
-        The best possible model parameters for given config.
+        The best possible model for given config.
     """
     model_class = get_model_from_config(model_package, model_class)
@@ -96,5 +94,10 @@
     cv.fit(X=X_trn, y=y_trn)
     y_pred = cv.predict(X_tst)
     score = accuracy_score(y_tst, y_pred)
+    # log score along with output artifact as metadata
+    log_artifact_metadata(
+        output_name="hp_result",
+        metric=float(score),
+    )
     ### YOUR CODE ENDS HERE ###
-    return cv.best_estimator_, score
+    return cv.best_estimator_
diff --git a/examples/e2e/steps/inference/inference_predict.py b/examples/e2e/steps/inference/inference_predict.py
index a1b7e19cdde..14fe739a4e5 100644
--- a/examples/e2e/steps/inference/inference_predict.py
+++ b/examples/e2e/steps/inference/inference_predict.py
@@ -16,19 +16,23 @@
 #
 
+from typing import Optional
+
 import pandas as pd
 from typing_extensions import Annotated
 
-from zenml import step
-from zenml.integrations.mlflow.model_deployers.mlflow_model_deployer import (
+from zenml import get_step_context, step
+from zenml.integrations.mlflow.services.mlflow_deployment import (
     MLFlowDeploymentService,
 )
+from zenml.logger import get_logger
 from zenml.model import ArtifactConfig
 
+logger = get_logger(__name__)
+
 
 @step
 def inference_predict(
-    deployment_service: MLFlowDeploymentService,
     dataset_inf: pd.DataFrame,
 ) -> Annotated[pd.Series, "predictions", ArtifactConfig(overwrite=False)]:
     """Predictions step.
@@ -38,22 +42,37 @@
     This step is parameterized, which allows you to configure the step
     independently of the step code, before running it in a pipeline.
 
-    In this example, the step can be configured to use different input data
-    and model version in registry. See the documentation for more information:
+    In this example, the step can be configured to use different input data.
+    See the documentation for more information:
 
     https://docs.zenml.io/user-guide/advanced-guide/configure-steps-pipelines
 
     Args:
-        deployment_service: Deployed model service.
         dataset_inf: The inference dataset.
 
     Returns:
-        The processed dataframe: dataset_inf.
+        The predictions as pandas series
     """
     ### ADD YOUR OWN CODE HERE - THIS IS JUST AN EXAMPLE ###
-    predictions = deployment_service.predict(request=dataset_inf)
+    model_version = get_step_context().model_config._get_model_version()
+
+    # get predictor
+    predictor_service: Optional[
+        MLFlowDeploymentService
+    ] = model_version.get_deployment("mlflow_deployment").load()
+    if predictor_service is not None:
+        # run prediction from service
+        predictions = predictor_service.predict(request=dataset_inf)
+    else:
+        logger.warning(
+            "Predicting from loaded model instead of deployment service "
+            "as the orchestrator is not local."
+        )
+        # run prediction from memory
+        predictor = model_version.get_model_object("model").load()
+        predictions = predictor.predict(dataset_inf)
+
     predictions = pd.Series(predictions, name="predicted")
-    deployment_service.deprovision(force=True)
     ### YOUR CODE ENDS HERE ###
     return predictions
diff --git a/examples/e2e/steps/promotion/__init__.py b/examples/e2e/steps/promotion/__init__.py
index 3ede8f58ebd..4397871460a 100644
--- a/examples/e2e/steps/promotion/__init__.py
+++ b/examples/e2e/steps/promotion/__init__.py
@@ -14,11 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-from .promote_get_metric import promote_get_metric
-from .promote_metric_compare_promoter_in_model_registry import (
-    promote_metric_compare_promoter_in_model_registry,
-)
-from .promote_get_versions import promote_get_versions
-from .promote_model_version_in_model_control_plane import (
-    promote_model_version_in_model_control_plane,
+from .compute_performance_metrics_on_current_data import (
+    compute_performance_metrics_on_current_data,
 )
+from .promote_with_metric_compare import promote_with_metric_compare
diff --git a/examples/e2e/steps/promotion/compute_performance_metrics_on_current_data.py b/examples/e2e/steps/promotion/compute_performance_metrics_on_current_data.py
new file mode 100644
index 00000000000..edbcbaa6a7f
--- /dev/null
+++ b/examples/e2e/steps/promotion/compute_performance_metrics_on_current_data.py
@@ -0,0 +1,83 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2023. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Tuple
+
+import pandas as pd
+from sklearn.metrics import accuracy_score
+from typing_extensions import Annotated
+from utils import get_model_versions
+
+from zenml import step
+from zenml.logger import get_logger
+
+logger = get_logger(__name__)
+
+
+@step
+def compute_performance_metrics_on_current_data(
+    dataset_tst: pd.DataFrame,
+    target_env: str,
+) -> Tuple[
+    Annotated[float, "latest_metric"], Annotated[float, "current_metric"]
+]:
+    """Get metrics for comparison during promotion on fresh dataset.
+
+    This is an example of a metrics calculation step. It computes metrics
+    on a recent test dataset.
+
+    This step is parameterized, which allows you to configure the step
+    independently of the step code, before running it in a pipeline.
+    In this example, the step can be configured to use different input data
+    and target environment stage for promotion.
+    See the documentation for more information:
+
+    https://docs.zenml.io/user-guide/advanced-guide/configure-steps-pipelines
+
+    Args:
+        dataset_tst: The test dataset.
+
+    Returns:
+        Latest version and current version metric values on a test set.
+    """
+
+    ### ADD YOUR OWN CODE HERE - THIS IS JUST AN EXAMPLE ###
+    X = dataset_tst.drop(columns=["target"])
+    y = dataset_tst["target"].to_numpy()
+    logger.info("Evaluating model metrics...")
+
+    # Get model version numbers from Model Control Plane
+    latest_version, current_version = get_model_versions(target_env)
+
+    # Get predictors
+    predictors = {
+        latest_version.number: latest_version.get_model_object("model").load(),
+        current_version.number: current_version.get_model_object(
+            "model"
+        ).load(),
+    }
+
+    if latest_version != current_version:
+        metrics = {}
+        for version in [latest_version.number, current_version.number]:
+            # predict and evaluate
+            predictions = predictors[version].predict(X)
+            metrics[version] = accuracy_score(y, predictions)
+    else:
+        metrics = {latest_version.number: 1.0, current_version.number: 0.0}
+    ### YOUR CODE ENDS HERE ###
+    return metrics[latest_version.number], metrics[current_version.number]
diff --git a/examples/e2e/steps/promotion/promote_with_metric_compare.py b/examples/e2e/steps/promotion/promote_with_metric_compare.py
new file mode 100644
index 00000000000..2610a4b3953
--- /dev/null
+++ b/examples/e2e/steps/promotion/promote_with_metric_compare.py
@@ -0,0 +1,103 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2023. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from utils import (
+    get_model_registry_version,
+    get_model_versions,
+    promote_in_model_registry,
+)
+
+from zenml import get_step_context, step
+from zenml.logger import get_logger
+
+logger = get_logger(__name__)
+
+
+@step
+def promote_with_metric_compare(
+    latest_metric: float,
+    current_metric: float,
+    mlflow_model_name: str,
+    target_env: str,
+) -> None:
+    """Try to promote trained model.
+
+    This is an example of a model promotion step. It gets precomputed
+    metrics for two model versions, the latest and the one currently promoted
+    to the target environment (Production, Staging, etc.), and compares them
+    to decide whether the newly trained model performs better. If the new
+    model version scores better on the metric, it is promoted; otherwise the
+    previously promoted model version remains in place.
+
+    If the latest version is the only one, it is promoted automatically.
+
+    This step is parameterized, which allows you to configure the step
+    independently of the step code, before running it in a pipeline.
+    In this example, the step can be configured to use precomputed model metrics
+    and target environment stage for promotion.
+    See the documentation for more information:
+
+    https://docs.zenml.io/user-guide/advanced-guide/configure-steps-pipelines
+
+    Args:
+        latest_metric: Recently trained model metric results.
+        current_metric: Previously promoted model metric results.
+    """
+
+    ### ADD YOUR OWN CODE HERE - THIS IS JUST AN EXAMPLE ###
+    should_promote = True
+
+    # Get model version numbers from Model Control Plane
+    latest_version, current_version = get_model_versions(target_env)
+
+    if latest_version.number == current_version.number:
+        logger.info("No current model version found - promoting latest")
+    else:
+        logger.info(
+            f"Latest model metric={latest_metric:.6f}\n"
+            f"Current model metric={current_metric:.6f}"
+        )
+        if latest_metric >= current_metric:
+            logger.info(
+                "Latest model version outperformed current version - promoting latest"
+            )
+        else:
+            logger.info(
+                "Current model version outperformed latest version - keeping current"
+            )
+            should_promote = False
+
+    promoted_version = get_model_registry_version(current_version)
+    if should_promote:
+        # Promote in Model Control Plane
+        model_version = get_step_context().model_config._get_model_version()
+        model_version.set_stage(stage=target_env, force=True)
+        logger.info(f"Current model version was promoted to '{target_env}'.")
+
+        # Promote in Model Registry
+        promote_in_model_registry(
+            latest_version=get_model_registry_version(latest_version),
+            current_version=get_model_registry_version(current_version),
+            model_name=mlflow_model_name,
+            target_env=target_env.capitalize(),
+        )
+        promoted_version = get_model_registry_version(latest_version)
+
+    logger.info(
+        f"Current model version in `{target_env}` is `{promoted_version}` registered in Model Registry"
+    )
+    ### YOUR CODE ENDS HERE ###
diff --git a/examples/e2e/steps/training/model_evaluator.py b/examples/e2e/steps/training/model_evaluator.py
index afc5091642c..f3deabdc29d 100644
--- a/examples/e2e/steps/training/model_evaluator.py
+++ b/examples/e2e/steps/training/model_evaluator.py
@@ -38,7 +38,7 @@ def model_evaluator(
     min_train_accuracy: float = 0.0,
     min_test_accuracy: float = 0.0,
     fail_on_accuracy_quality_gates: bool = False,
-):
+) -> None:
     """Evaluate a trained model.

     This is an example of a model evaluation step that takes in a model artifact
diff --git a/examples/e2e/steps/training/model_trainer.py b/examples/e2e/steps/training/model_trainer.py
index 82ae833bc54..75aa8fbcf9d 100644
--- a/examples/e2e/steps/training/model_trainer.py
+++ b/examples/e2e/steps/training/model_trainer.py
@@ -21,11 +21,14 @@
 from sklearn.base import ClassifierMixin
 from typing_extensions import Annotated

-from zenml import step
+from zenml import log_artifact_metadata, step
 from zenml.client import Client
 from zenml.integrations.mlflow.experiment_trackers import (
     MLFlowExperimentTracker,
 )
+from zenml.integrations.mlflow.steps.mlflow_registry import (
+    mlflow_register_model_step,
+)
 from zenml.logger import get_logger
 from zenml.model import ModelArtifactConfig

@@ -47,6 +50,7 @@ def model_trainer(
     dataset_trn: pd.DataFrame,
     model: ClassifierMixin,
     target: str,
+    name: str,
 ) -> Annotated[ClassifierMixin, "model", ModelArtifactConfig()]:
     """Configure and train a model on the training dataset.

@@ -73,6 +77,7 @@ def model_trainer(
         dataset_trn: The preprocessed train dataset.
         model: The model instance to train.
         target: Name of target columns in dataset.
+        name: The name of the model.

     Returns:
         The trained model artifact.
@@ -87,6 +92,19 @@ def model_trainer(
         dataset_trn.drop(columns=[target]),
         dataset_trn[target],
     )
+
+    # register mlflow model
+    mlflow_register_model_step.entrypoint(
+        model,
+        name=name,
+    )
+    # keep track of mlflow version for future use
+    log_artifact_metadata(
+        output_name="model",
+        model_registry_version=Client()
+        .active_stack.model_registry.list_model_versions(name=name)[-1]
+        .version,
+    )
     ### YOUR CODE ENDS HERE ###
     return model
diff --git a/examples/e2e/utils/__init__.py b/examples/e2e/utils/__init__.py
new file mode 100644
index 00000000000..7c9578cb734
--- /dev/null
+++ b/examples/e2e/utils/__init__.py
@@ -0,0 +1,21 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2023. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+
+from .get_model_from_config import get_model_from_config
+from .model_versions import get_model_versions, get_model_registry_version
+from .promote_in_model_registry import promote_in_model_registry
diff --git a/examples/e2e/utils/model_versions.py b/examples/e2e/utils/model_versions.py
new file mode 100644
index 00000000000..65e720e28d0
--- /dev/null
+++ b/examples/e2e/utils/model_versions.py
@@ -0,0 +1,62 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2023. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Tuple
+
+from typing_extensions import Annotated
+
+from zenml import get_step_context
+from zenml.model import ModelConfig
+from zenml.models.model_models import ModelVersionResponseModel
+
+
+def get_model_versions(
+    target_env: str,
+) -> Tuple[
+    Annotated[ModelVersionResponseModel, "latest_version"],
+    Annotated[ModelVersionResponseModel, "current_version"],
+]:
+    """Get latest and currently promoted model versions from Model Control Plane.
+
+    Args:
+        target_env: Target stage to search for currently promoted version
+
+    Returns:
+        Latest and currently promoted model versions from the Model Control Plane
+    """
+    latest_version = get_step_context().model_config._get_model_version()
+    try:
+        current_version = ModelConfig(
+            name=latest_version.model.name, version=target_env
+        )._get_model_version()
+    except KeyError:
+        current_version = latest_version
+
+    return latest_version, current_version
+
+
+def get_model_registry_version(model_version: ModelVersionResponseModel):
+    """Get model version in model registry from metadata of a model in the Model Control Plane.
+
+    Args:
+        model_version: the Model Control Plane version response
+    """
+    return (
+        model_version.get_model_object("model")
+        .metadata["model_registry_version"]
+        .value
+    )
diff --git a/examples/e2e/utils/promote_in_model_registry.py b/examples/e2e/utils/promote_in_model_registry.py
new file mode 100644
index 00000000000..d2aeaddf5da
--- /dev/null
+++ b/examples/e2e/utils/promote_in_model_registry.py
@@ -0,0 +1,50 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2023. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+
+from zenml.client import Client
+from zenml.logger import get_logger
+from zenml.model_registries.base_model_registry import ModelVersionStage
+
+logger = get_logger(__name__)
+
+
+def promote_in_model_registry(
+    latest_version: str, current_version: str, model_name: str, target_env: str
+):
+    """Promote model version in model registry to a given stage.
+
+    Args:
+        latest_version: version to be promoted
+        current_version: currently promoted version
+        model_name: name of the model in registry
+        target_env: stage for promotion
+    """
+    model_registry = Client().active_stack.model_registry
+    if latest_version != current_version:
+        model_registry.update_model_version(
+            name=model_name,
+            version=current_version,
+            stage=ModelVersionStage(ModelVersionStage.ARCHIVED),
+            metadata={},
+        )
+    model_registry.update_model_version(
+        name=model_name,
+        version=latest_version,
+        stage=ModelVersionStage(target_env),
+        metadata={},
+    )
diff --git a/pyproject.toml b/pyproject.toml
index ab08ccf169e..21b6fe7045d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "zenml"
-version = "0.46.0"
+version = "0.46.1"
 packages = [{ include = "zenml", from = "src" }]
 description = "ZenML: Write production-ready ML code."
 authors = ["ZenML GmbH