[Model WatchTower] Delete running versions on failed pipelines with `delete_new_version_on_failure` option #1825
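As a rough illustration of the behavior the title describes, here is a minimal, self-contained sketch. It is not ZenML's implementation: `VersionRegistry`, `run_pipeline`, and the `"running"` placeholder name are hypothetical, and only the flag name `delete_new_version_on_failure` comes from this PR.

```python
from typing import Callable, Dict, List


class VersionRegistry:
    """Hypothetical in-memory stand-in for a model version store."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[str]] = {}

    def create_running(self, model: str) -> str:
        # Register a placeholder version for the run that is starting.
        self.versions.setdefault(model, []).append("running")
        return "running"

    def delete(self, model: str, version: str) -> None:
        self.versions[model].remove(version)


def run_pipeline(
    registry: VersionRegistry,
    model: str,
    steps: Callable[[], None],
    delete_new_version_on_failure: bool,
) -> bool:
    """Run `steps`; on failure, optionally drop the version created for this run."""
    version = registry.create_running(model)
    try:
        steps()
    except Exception:
        if delete_new_version_on_failure:
            # Clean up so a failed run leaves no dangling version behind.
            registry.delete(model, version)
        return False
    return True
```

With the flag set, a failed run leaves the registry as it was before the run. With the flag unset, the placeholder version stays behind and could be inspected or recovered later.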
Co-authored-by: Felix Altenberger <felix@zenml.io>
…SS-2427-full-version-management-in-context
…SS-2427-full-version-management-in-context
…ttps://github.com/zenml-io/zenml into feature/OSS-2427-full-version-management-in-context
See unresolved comments from last review
Nice, thanks for adjusting, looks good to me now 👍
The test failure was caused by unstable MySQL behavior due to the way it stores datetimes. Fixed that, but I will not wait for the full CI run again.
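The comment does not spell out the exact cause. One plausible mechanism, sketched here purely as an assumption: a MySQL `DATETIME` column declared without fractional-second precision keeps only whole seconds, so rows created within the same second end up with equal timestamps and ordering by creation time alone is not stable. The helper name and sample data below are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def to_second_precision(ts: datetime) -> datetime:
    # Simplified stand-in for a column that keeps no fractional seconds.
    return ts.replace(microsecond=0)


base = datetime(2023, 9, 20, 12, 0, 0)  # arbitrary sample timestamp

# Three rows created a few milliseconds apart, in this order.
rows: List[Tuple[str, datetime]] = [
    (name, to_second_precision(base + timedelta(milliseconds=5 * i)))
    for i, name in enumerate(["run_a", "run_b", "run_c"])
]

# All stored timestamps collapse to the same value ...
distinct_timestamps = {ts for _, ts in rows}

# ... so a deterministic order needs a tie-breaker, for example the name.
stable_order = [name for name, _ in sorted(rows, key=lambda r: (r[1], r[0]))]
```

If this is the mechanism, tests that assert on ordering either need a secondary sort key like the one above or need to compare results without depending on order.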
Merged commit 23a9c74 into feature/OSS-2300-model-watch-tower-v0.1
* Implementation of Model table (#1802) * [Model Watch Tower] Adding ModelVersion and ModelVersionLink entities (#1811) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * lint * mypy * darglint * wip * add endpoints * add ModelStages * wip * work with client * handle tags * fix integrations * move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * fix tests in docker * fix tests for mysql * rename artifact ids variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments --------- Co-authored-by: Felix Altenberger <felix@zenml.io> * [Model Watch Tower] Implement `zenml.log_artifact_metadata()` (#1813) * Delete dead code * Implement zenml.log_artifact_metadata() * Improve error handling and add tests * Remove 'description' arg of log_artifact_metadata() and fix docstring * Fix test name * Remove TODOs * Rewrite create_run_metadata() to make batch API requests * Adjust unit tests * Fix flaky test_is_secret_reference * Fix integration tests * [Model Watch Tower] Add ModelConfig (#1817) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * lint * mypy * darglint * wip * wip * wip * wip * add endpoints * add ModelStages * wip * work with client * handle tags * fix integrations * move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * ModelConfig implementation * start testing * fix tests in docker * more tests * fix tests for mysql * rename artifact ids 
variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments * lint * remove `ModelStages._members()` * improve wording * Union[str, ModelStages] * simplify `get_model_version` to one endpoint * remove `_get_request_params` * split `_get_or_create_model_version` * ensure `get_or_create_model` is stable * add docstrings * lint * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> --------- Co-authored-by: Felix Altenberger <felix@zenml.io> * lint * [Model WatchTower] add ModelConfig to step and pipeline decorators (#1819) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * lint * mypy * darglint * wip * wip * wip * wip * add endpoints * add ModelStages * wip * work with client * handle tags * fix integrations * move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * ModelConfig implementation * start testing * fix tests in docker * more tests * fix tests for mysql * rename artifact ids variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments * add ModelConfig to step deco * Revert "add ModelConfig to step deco" This reverts commit ba9fd6a. 
* add ModelConfig to step deco * lint * wip * Add model_config to @pipeline and @step * add docstring * cleaning up after merge * move `ModelBaseModel` back to `models` * direct imports * breaking another circular dep * and one more circular dep --------- Co-authored-by: Felix Altenberger <felix@zenml.io> * [Model WatchTower] Add ArtifactConfig (+OSS-2427) & model version management in context (partially) (#1822) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * lint * mypy * darglint * wip * wip * wip * wip * add endpoints * add ModelStages * wip * work with client * handle tags * fix integrations * move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * ModelConfig implementation * start testing * fix tests in docker * more tests * fix tests for mysql * rename artifact ids variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments * add ModelConfig to step deco * Revert "add ModelConfig to step deco" This reverts commit ba9fd6a. 
* add ModelConfig to step deco * lint * wip * Add model_config to @pipeline and @step * add artifact config and necessary registrations around it * fix bug with `Output` annotation * add docstring * Update src/zenml/new/pipelines/pipeline.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * cleaning up after merge * move `ModelBaseModel` back to `models` * direct imports * breaking another circular dep * clean-up after conflicts resolution * and one more circular dep * and one more circular dep * update tests to include `OutputSignature` * add `link_output_to_model` * stabilize tests * add implicit linkage * more tests * lint * docstrings * test/bug fixes * extend artifact link key to pipe/step/name * lint * fix for 3.8 * fix failing test call signatures * update_forward_refs centrally * Update src/zenml/model/artifact_config.py Co-authored-by: Felix Altenberger <felix@zenml.io> * Update src/zenml/zen_stores/sql_zen_store.py Co-authored-by: Felix Altenberger <felix@zenml.io> * add a warning about not supported feature * add arguments descriptions * improve docstrings a bit * assign_version_to_running internal + docs * rename to link_version * Update src/zenml/new/pipelines/pipeline.py Co-authored-by: Felix Altenberger <felix@zenml.io> * rename arg in test * lint * refactor a bit * address PR comment * remove deleted classes * Update src/zenml/models/model_base_model.py Co-authored-by: Felix Altenberger <felix@zenml.io> * move `ModelStages` to enums * merge `model_stage` and add tests for it * simplify `_link_artifacts_to_model` args * docstrings * docstrings * lint * fix args in tests * fix __mro__ test failure * fix for mlflow test * stabilize test_list_runs_is_ordered * fix pagination in some tests * address PR comments --------- Co-authored-by: Felix Altenberger <felix@zenml.io> Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * [Model WatchTower] Delete running versions on failed pipelines with 
`delete_new_version_on_failure ` option (#1825) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * lint * mypy * darglint * wip * wip * wip * wip * add endpoints * add ModelStages * wip * work with client * handle tags * fix integrations * move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * ModelConfig implementation * start testing * fix tests in docker * more tests * fix tests for mysql * rename artifact ids variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments * add ModelConfig to step deco * Revert "add ModelConfig to step deco" This reverts commit ba9fd6a. * add ModelConfig to step deco * lint * wip * Add model_config to @pipeline and @step * add artifact config and necessary registrations around it * fix bug with `Output` annotation * add docstring * delete running versions on fail * add deletion test * don't return running without recovery * remove base_model.py * PR comments * remove merged migration * PR comments * improve model_config warnings * clean up merge mess * bandit is too strict * typos * clean up merge mess * remove not relevant asserts * improve docs * improve docs * rely on deployment in `get_new_version_requests` * stabilize tests in random order * improve docs * stabilize tests --------- Co-authored-by: Felix Altenberger <felix@zenml.io> * Fix OOM issue by disabling SkyPilot test * [Model WatchTower] Extend `ExternalArtifact` (#1839) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * lint * mypy * darglint * wip * wip * wip * wip * add endpoints * add ModelStages * wip * work with client * handle tags * fix integrations * 
move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * ModelConfig implementation * start testing * fix tests in docker * more tests * fix tests for mysql * rename artifact ids variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments * add ModelConfig to step deco * Revert "add ModelConfig to step deco" This reverts commit ba9fd6a. * add ModelConfig to step deco * lint * wip * Add model_config to @pipeline and @step * add artifact config and necessary registrations around it * fix bug with `Output` annotation * add docstring * delete running versions on fail * add deletion test * Update src/zenml/new/pipelines/pipeline.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * cleaning up after merge * move `ModelBaseModel` back to `models` * direct imports * breaking another circular dep * clean-up after conflicts resolution * and one more circular dep * and one more circular dep * update tests to include `OutputSignature` * add `link_output_to_model` * stabilize tests * add implicit linkage * more tests * lint * docstrings * test/bug fixes * extend artifact link key to pipe/step/name * lint * fix for 3.8 * fix failing test call signatures * update_forward_refs centrally * Update src/zenml/model/artifact_config.py Co-authored-by: Felix Altenberger <felix@zenml.io> * Update src/zenml/zen_stores/sql_zen_store.py Co-authored-by: Felix Altenberger <felix@zenml.io> * add a warning about not supported feature * add arguments descriptions * improve docstrings a bit * assign_version_to_running internal + docs * rename to link_version * Update src/zenml/new/pipelines/pipeline.py Co-authored-by: Felix Altenberger <felix@zenml.io> * rename arg in test * lint * refactor a bit * address PR 
comment * remove deleted classes * Update src/zenml/models/model_base_model.py Co-authored-by: Felix Altenberger <felix@zenml.io> * move `ModelStages` to enums * merge `model_stage` and add tests for it * simplify `_link_artifacts_to_model` args * docstrings * docstrings * lint * fix args in tests * fix __mro__ test failure * fix for mlflow test * stabilize test_list_runs_is_ordered * update .gitignore * fix pagination in some tests * don't return running without recovery * remove base_model.py * PR comments * remove merged migration * PR comments * improve model_config warnings * clean up merge mess * external artifacts for model watchtower * bandit is too strict * typos * clean up merge mess * remove not relevant asserts * improve docs * improve docs * rely on deployment in `get_new_version_requests` * stabilize tests in random order * improve docs * use utils model_killer * clean up merge mess * use Annotated from typing_extensions * one more circular dep case * lint * resolve alembic branching * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * get rid of `_testable_upload_if_necessary` * refactor `upload_if_necessary` * `zenml.artifacts.external_artifact.ExternalArtifact` * add tests for getters of MV response * improve docstring * fix before #1835 * lint * remove `_import_client` * fixing issues highlighted by tests * update docstrings * fix year in license * add docstrings * Update src/zenml/models/model_models.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Update src/zenml/models/model_models.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Update src/zenml/models/model_models.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Update src/zenml/models/model_models.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Update src/zenml/steps/external_artifact.py Co-authored-by: Alex Strick 
van Linschoten <strickvl@users.noreply.github.com> * Update src/zenml/models/model_models.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * fix docstrings * Move UnmaterializedArtifact to zenml.artifacts * add fix of path in docs * split ExternalArtifact into user class and config for step * update template working branch * restrict pydantic * Revert "restrict pydantic" This reverts commit b94e01f. --------- Co-authored-by: Felix Altenberger <felix@zenml.io> Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * [Model WatchTower] Extend client (#1849) * extend client for Model WatchTower * resolve alembic branching * aligh template ref * Auto-update of E2E template * [Model WatchTower] link runs to model versions (#1847) * linkage of pipeline runs * add docstring * tricky bug * full linkage on consumption * lint * use client * Auto-update of E2E template * update outdated test * create new version if any step touching it is executed and it was requested elsewhere * refactor `_link_pipeline_run_to_model` * add few more tests --------- Co-authored-by: GitHub Actions <actions@github.com> * remove deepchecks tests from CI * remove pytorch * remove pytorch_lightning * try tensorflow instead of pytorch * formatting * remove TF tests * formatting * try free ubuntu action * refactor * add skypilot back in * install missing pkg and update tests * add missing conditional to yaml * with permissions * add template release * [Model Control Plane] parallel running versions support (#1859) * linkage of pipeline runs * add docstring * tricky bug * full linkage on consumption * lint * use client * Auto-update of E2E template * update outdated test * create new version if any step touching it is executed and it was requested elsewhere * refactor `_link_pipeline_run_to_model` * add few more tests * parallel execution of model versions * add version number * improve readability * protect from misuse * extend 
`ArtifactConfig.model_version` * align model config docstrings * stabilize parallelized test * rework test as subprocess calls * skip subprocess test on windows * after merge mess * update tests flow based on develop * proper handle __latest__ mv in REST * fix get model version endpoint * simplify user-facing interface * fix test annotation --------- Co-authored-by: GitHub Actions <actions@github.com> * [Model Control Plane] Add CLI (#1861) * linkage of pipeline runs * add docstring * tricky bug * full linkage on consumption * lint * use client * Auto-update of E2E template * update outdated test * create new version if any step touching it is executed and it was requested elsewhere * refactor `_link_pipeline_run_to_model` * add few more tests * parallel execution of model versions * add version number * improve readability * protect from misuse * extend `ArtifactConfig.model_version` * align model config docstrings * stabilize parallelized test * rename `model.py` * rename * add cli for model watchtower * rework test as subprocess calls * Apply suggestions from code review Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * reshape tests * rebranding * skip subprocess test on windows * sort init * remove redundant param from `print_pydantic_models` * reduce redundant code * reduce too long line * add model update * after merge mess * lint * update tests flow based on develop * proper handle __latest__ mv in REST * fix get model version endpoint * calm down linter * restore proper name * typing import error * after merge mess --------- Co-authored-by: GitHub Actions <actions@github.com> Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * fallback to previous template * fallback to previous template --------- Co-authored-by: Felix Altenberger <felix@zenml.io> Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> Co-authored-by: GitHub Actions <actions@github.com> Co-authored-by: Alex 
Strick van Linschoten <stricksubscriptions@fastmail.fm>
* Discord alerter integration (#1818) * discord component integration * corrected library name and added necessary import * reformattes * fixed some issues * Format * added test cases and fixed a annotation issue in discord_alerter.py * modified logic to explicitly check for none embed object * doc changes * Apply suggestions from code review * Minor improvements to integration test * Docs: add more details on where to find the required parameters * Handle closed asyncio loops --------- Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> Co-authored-by: Felix Altenberger <felix@zenml.io> * Update Neptune dependency: `neptune-client` > `neptune` (#1837) * Update Neptune dependency: neptune-client > neptune * 'Fix' typing * Fix flaky label studio unit tests * disable codeql on develop pushes (#1842) * Template not updating due to git diff misuse (#1844) * fix CI for templates * debug run * revert changes * Auto-update of E2E template * trigger ci * revert ci --------- Co-authored-by: GitHub Actions <actions@github.com> * Bump feast version to fix api docs generation (#1845) * CI Fixes / Improvements (#1848) * Move e2e template test into update action and remove starter template test * Fix flaky ZenStore test * Pin evidently since newest version breaks Docker runs * Fix OOM issue by disabling SkyPilot test * fixing template failures * Revert "fixing template failures" This reverts commit 280e117. 
--------- Co-authored-by: Andrei Vishniakov <31008759+avishniakov@users.noreply.github.com> * Fix MLflow registry methods with empty metadata (#1843) * Fix MLflow registry methods with empty metadata * Auto-update of E2E template * add new template release * Auto-update of E2E template --------- Co-authored-by: Andrei Vishniakov <31008759+avishniakov@users.noreply.github.com> Co-authored-by: GitHub Actions <actions@github.com> * Use configured template REF in CI (#1851) * use ref from config for template * suppress log output in getting env * hardcode template ref * add helper text * revert debug changes * aligh template ref * Fix template REF in CI (#1852) * debug * update branch naming * rename release * Auto-update of E2E template * remove debug comments --------- Co-authored-by: GitHub Actions <actions@github.com> * Fix AWS service connector installation requirements (#1850) Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * [Docs] Improvements to custom flavor and custom orchestrator pages (#1747) * Added link to end-to-end custom orchestrator guide * Finished doc for custom flavor * Added reference on top of each doc page * Update docs/book/stacks-and-components/component-guide/writing-custom-components.md Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Update docs/book/stacks-and-components/component-guide/writing-custom-components.md Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Update docs/book/stacks-and-components/component-guide/writing-custom-components.md Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Applied code changes * Added title * Added some minor modificatiosn for cloud docs * Apply suggestions from code review Co-authored-by: Barış Can Durak <36421093+bcdurak@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Barış Can Durak <36421093+bcdurak@users.noreply.github.com> * Apply review 
suggestions and add line breaks * Merge and redesign the two custom flavor docs pages * Adjust links * Adjust to review suggestion * Apply suggestions from code review Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Move custom component docs to stacks-and-components/custom-solutions * Apply suggestions from code review Co-authored-by: Barış Can Durak <36421093+bcdurak@users.noreply.github.com> --------- Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> Co-authored-by: Barış Can Durak <36421093+bcdurak@users.noreply.github.com> Co-authored-by: Felix Altenberger <felix@zenml.io> * Optimizing the performance through database changes (#1835) * removed unused imports * mixed typos * removed the pipeline and step run relationship * formatted and removed pipeline run relationship * removed run relationship * adding references * add versions to the deployment model * removed num_steps * cleaning * orchestrators adjusted * added run and step run back * changed the model conversion * major updates to deployment, pipeline run and step run schemas and their model conversion * added new version fields * more changes to the models * removing unneccessary queries and hydration * typo * removing user hydration from various schemas * adding back the filtering * formatting * fixing filter models * getting rid of warnings * alembic migration * formatting and linting * fixing tests * fixing tests * fixed the num steps * moving imports * changing the request model * fixing the migration issues * optimizing fetching the producer id * fixed the cascase delete in schema relationships * formatting * fixing the artifact schema * fixed the num step problem * removed old migration script * new migration script * Update src/zenml/config/compiler.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * formatting * fixing request model in integration tests * adding pipeline id back in * missing cascade 
delete * test switched to clean workspace * excluded runs from crud tests as deployments do not exist yet * using clean workspaces * fixed attribute * removed duplicated test * proper cleaning for runs and artifacts after context --------- Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Add `README` for `examples` folder (#1860) * draft README file * update with templates info * updated following PR comments * reorder table * Free up disk space in CI (#1863) * remove deepchecks tests from CI * remove pytorch * remove pytorch_lightning * try tensorflow instead of pytorch * formatting * remove TF tests * formatting * try free ubuntu action * refactor * add skypilot back in * install missing pkg and update tests * add missing conditional to yaml * with permissions * remove tool cache as well * Make Terraform Optional Again (#1855) * Make terraform an optional dependency * Mypy: ignore IPython imports * Delete unused zenml.recipes * Delete all unused constants --------- Co-authored-by: Michael Schuster <michael.schuster.ffb@googlemail.com> Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * rename watchtower (#1868) * Corrected 'zenml pipelines list' command (#1872) * Fix CI by freeing up space on runner (#1866) * try more free space removal * fix interpolation * fix again * remove docker image deletion completely * codeql runs less frequently * remove TF installation * add terraform & disable integration tests for ubuntu 3.11 * revert indentation * Allow for `user` param to be specified (successfully) in `DockerSettings` (#1857) * fix user bug * update test * Update src/zenml/utils/pipeline_docker_image_builder.py * Update tests/unit/utils/test_pipeline_docker_image_builder.py Co-authored-by: Felix Altenberger <felix@zenml.io> --------- Co-authored-by: Felix Altenberger <felix@zenml.io> * Add `get_pipeline_context` (#1870) * add `get_pipeline_context` * add docs * format example better * more meaningful 
example * update example * fix example --------- Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * require the json content of the file instead of the path (#1874) * External authenticator support, authorized devices and web login (#1814) * Auth changes for external authenticator support * consolidated ZenML Server configuration in its own class * moved JWT token authentication related logic to its own class * add support for more JWT token claims (issuer, audience) * add support for JWKS (public key criptography) secret keys * add support in client (REST Store) to automatically re-authenticate when a temporary JWT token expires * add support to store JWT access tokens in HTTP-only cookies with configurable domain and cookie name for increased security (TBD UI needs to be updated to not store tokens) Breaking changes: * renamed env variable ZENML_AUTH_TYPE to ZENML_SERVER_AUTH_SCHEME * renamed env variable ZENML_JWT_SECRET_KEY to ZENML_SERVER_JWT_SECRET_KEY * Add external authenticator support * Add DB migration * Add logout endpoint and other minor fixes * Minor improvements and DB migration script * Add CORS configuration to ZenML server * Minor fixes * Fixed linter errors * Fix darglint and security errors * Code review feedback and other minor improvements * Update src/zenml/zen_server/routers/auth_endpoints.py Co-authored-by: Alexej Penner <thealexejpenner@gmail.com> * Fix linter * Add support for web login and authorized devices * Add client and CLI support for authorized devices. 
* Add external server ID and remove default user and stack if external authenticator is used * Add all properties to authorized devices * Add IP address location * Fix ipinfo version * Fix linter * Fixed device authorization flow * Updated docs to suggest web login instead of username/pass login * Fix linter * Add API token endpoint and use it to issue workload tokens * Fix linter * Fix docstring errors * Fix alembic migration fork * Use configured max_failed_device_auth_attempts instead of default value * Switch to logger * Fix typo * Allow configuration of pipeline api token expiration * Analytics changes * Use external user email as unique username * Add missing identify event * Remove unused import * Add missing constant --------- Co-authored-by: Alexej Penner <thealexejpenner@gmail.com> Co-authored-by: Michael Schuster <michael.schuster.ffb@googlemail.com> * Add missing prefix to constant * Connect to Service-connector at component registration (#1858) * add connect attribute to stack component register command to connect service connector * apply suggested reviews * fixed upgrade script (#1877) * [Model Control Plane] early release (#1816) * Implementation of Model table (#1802) * [Model Watch Tower] Adding ModelVersion and ModelVersionLink entities (#1811) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * lint * mypy * darglint * wip * add endpoints * add ModelStages * wip * work with client * handle tags * fix integrations * move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * fix tests in docker * fix tests for mysql * rename artifact ids variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments --------- 
Co-authored-by: Felix Altenberger <felix@zenml.io> * [Model Watch Tower] Implement `zenml.log_artifact_metadata()` (#1813) * Delete dead code * Implement zenml.log_artifact_metadata() * Improve error handling and add tests * Remove 'description' arg of log_artifact_metadata() and fix docstring * Fix test name * Remove TODOs * Rewrite create_run_metadata() to make batch API requests * Adjust unit tests * Fix flaky test_is_secret_reference * Fix integration tests * [Model Watch Tower] Add ModelConfig (#1817) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * lint * mypy * darglint * wip * wip * wip * wip * add endpoints * add ModelStages * wip * work with client * handle tags * fix integrations * move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * ModelConfig implementation * start testing * fix tests in docker * more tests * fix tests for mysql * rename artifact ids variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments * lint * remove `ModelStages._members()` * improve wording * Union[str, ModelStages] * simplify `get_model_version` to one endpoint * remove `_get_request_params` * split `_get_or_create_model_version` * ensure `get_or_create_model` is stable * add docstrings * lint * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> --------- Co-authored-by: Felix Altenberger <felix@zenml.io> * lint * [Model WatchTower] add ModelConfig to step and pipeline decorators (#1819) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * 
lint * mypy * darglint * wip * wip * wip * wip * add endpoints * add ModelStages * wip * work with client * handle tags * fix integrations * move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * ModelConfig implementation * start testing * fix tests in docker * more tests * fix tests for mysql * rename artifact ids variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments * add ModelConfig to step deco * Revert "add ModelConfig to step deco" This reverts commit ba9fd6a. * add ModelConfig to step deco * lint * wip * Add model_config to @pipeline and @step * add docstring * cleaning up after merge * move `ModelBaseModel` back to `models` * direct imports * breaking another circular dep * and one more circular dep --------- Co-authored-by: Felix Altenberger <felix@zenml.io> * [Model WatchTower] Add ArtifactConfig (+OSS-2427) & model version management in context (partially) (#1822) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * lint * mypy * darglint * wip * wip * wip * wip * add endpoints * add ModelStages * wip * work with client * handle tags * fix integrations * move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * ModelConfig implementation * start testing * fix tests in docker * more tests * fix tests for mysql * rename artifact ids variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments * add ModelConfig to step deco * Revert "add ModelConfig to step deco" This 
reverts commit ba9fd6a. * add ModelConfig to step deco * lint * wip * Add model_config to @pipeline and @step * add artifact config and necessary registrations around it * fix bug with `Output` annotation * add docstring * Update src/zenml/new/pipelines/pipeline.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * cleaning up after merge * move `ModelBaseModel` back to `models` * direct imports * breaking another circular dep * clean-up after conflicts resolution * and one more circular dep * and one more circular dep * update tests to include `OutputSignature` * add `link_output_to_model` * stabilize tests * add implicit linkage * more tests * lint * docstrings * test/bug fixes * extend artifact link key to pipe/step/name * lint * fix for 3.8 * fix failing test call signatures * update_forward_refs centrally * Update src/zenml/model/artifact_config.py Co-authored-by: Felix Altenberger <felix@zenml.io> * Update src/zenml/zen_stores/sql_zen_store.py Co-authored-by: Felix Altenberger <felix@zenml.io> * add a warning about not supported feature * add arguments descriptions * improve docstrings a bit * assign_version_to_running internal + docs * rename to link_version * Update src/zenml/new/pipelines/pipeline.py Co-authored-by: Felix Altenberger <felix@zenml.io> * rename arg in test * lint * refactor a bit * address PR comment * remove deleted classes * Update src/zenml/models/model_base_model.py Co-authored-by: Felix Altenberger <felix@zenml.io> * move `ModelStages` to enums * merge `model_stage` and add tests for it * simplify `_link_artifacts_to_model` args * docstrings * docstrings * lint * fix args in tests * fix __mro__ test failure * fix for mlflow test * stabilize test_list_runs_is_ordered * fix pagination in some tests * address PR comments --------- Co-authored-by: Felix Altenberger <felix@zenml.io> Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * [Model WatchTower] Delete running versions on 
failed pipelines with `delete_new_version_on_failure ` option (#1825) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * lint * mypy * darglint * wip * wip * wip * wip * add endpoints * add ModelStages * wip * work with client * handle tags * fix integrations * move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * ModelConfig implementation * start testing * fix tests in docker * more tests * fix tests for mysql * rename artifact ids variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments * add ModelConfig to step deco * Revert "add ModelConfig to step deco" This reverts commit ba9fd6a. * add ModelConfig to step deco * lint * wip * Add model_config to @pipeline and @step * add artifact config and necessary registrations around it * fix bug with `Output` annotation * add docstring * delete running versions on fail * add deletion test * don't return running without recovery * remove base_model.py * PR comments * remove merged migration * PR comments * improve model_config warnings * clean up merge mess * bandit is too strict * typos * clean up merge mess * remove not relevant asserts * improve docs * improve docs * rely on deployment in `get_new_version_requests` * stabilize tests in random order * improve docs * stabilize tests --------- Co-authored-by: Felix Altenberger <felix@zenml.io> * Fix OOM issue by disabling SkyPilot test * [Model WatchTower] Extend `ExternalArtifact` (#1839) * big bang commit * typo * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * add Alembic * lint * mypy * darglint * wip * wip * wip * wip * add endpoints * add ModelStages * wip * work with client * handle tags 
* fix integrations * move list around * update db schema * wip * lint * sync with model branch * wip * refactor * add stage transition * add update interface * add model version links * lint * fix crud tests * fix alembic branching * patch azure * lint * use zenml StrEnum * fix param name * ModelConfig implementation * start testing * fix tests in docker * more tests * fix tests for mysql * rename artifact ids variables * reorder methods * add direct getters * lint * split links into 2 tables * lint * pr comments * add ModelConfig to step deco * Revert "add ModelConfig to step deco" This reverts commit ba9fd6a. * add ModelConfig to step deco * lint * wip * Add model_config to @pipeline and @step * add artifact config and necessary registrations around it * fix bug with `Output` annotation * add docstring * delete running versions on fail * add deletion test * Update src/zenml/new/pipelines/pipeline.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * cleaning up after merge * move `ModelBaseModel` back to `models` * direct imports * breaking another circular dep * clean-up after conflicts resolution * and one more circular dep * and one more circular dep * update tests to include `OutputSignature` * add `link_output_to_model` * stabilize tests * add implicit linkage * more tests * lint * docstrings * test/bug fixes * extend artifact link key to pipe/step/name * lint * fix for 3.8 * fix failing test call signatures * update_forward_refs centrally * Update src/zenml/model/artifact_config.py Co-authored-by: Felix Altenberger <felix@zenml.io> * Update src/zenml/zen_stores/sql_zen_store.py Co-authored-by: Felix Altenberger <felix@zenml.io> * add a warning about not supported feature * add arguments descriptions * improve docstrings a bit * assign_version_to_running internal + docs * rename to link_version * Update src/zenml/new/pipelines/pipeline.py Co-authored-by: Felix Altenberger <felix@zenml.io> * rename arg in test * lint * refactor a 
bit * address PR comment * remove deleted classes * Update src/zenml/models/model_base_model.py Co-authored-by: Felix Altenberger <felix@zenml.io> * move `ModelStages` to enums * merge `model_stage` and add tests for it * simplify `_link_artifacts_to_model` args * docstrings * docstrings * lint * fix args in tests * fix __mro__ test failure * fix for mlflow test * stabilize test_list_runs_is_ordered * update .gitignore * fix pagination in some tests * don't return running without recovery * remove base_model.py * PR comments * remove merged migration * PR comments * improve model_config warnings * clean up merge mess * external artifacts for model watchtower * bandit is too strict * typos * clean up merge mess * remove not relevant asserts * improve docs * improve docs * rely on deployment in `get_new_version_requests` * stabilize tests in random order * improve docs * use utils model_killer * clean up merge mess * use Annotated from typing_extensions * one more circular dep case * lint * resolve alembic branching * Apply suggestions from code review Co-authored-by: Felix Altenberger <felix@zenml.io> * get rid of `_testable_upload_if_necessary` * refactor `upload_if_necessary` * `zenml.artifacts.external_artifact.ExternalArtifact` * add tests for getters of MV response * improve docstring * fix before #1835 * lint * remove `_import_client` * fixing issues highlighted by tests * update docstrings * fix year in license * add docstrings * Update src/zenml/models/model_models.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Update src/zenml/models/model_models.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Update src/zenml/models/model_models.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Update src/zenml/models/model_models.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Update src/zenml/steps/external_artifact.py 
Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Update src/zenml/models/model_models.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * fix docstrings * Move UnmaterializedArtifact to zenml.artifacts * add fix of path in docs * split ExternalArtifact into user class and config for step * update template working branch * restrict pydantic * Revert "restrict pydantic" This reverts commit b94e01f. --------- Co-authored-by: Felix Altenberger <felix@zenml.io> Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * [Model WatchTower] Extend client (#1849) * extend client for Model WatchTower * resolve alembic branching * aligh template ref * Auto-update of E2E template * [Model WatchTower] link runs to model versions (#1847) * linkage of pipeline runs * add docstring * tricky bug * full linkage on consumption * lint * use client * Auto-update of E2E template * update outdated test * create new version if any step touching it is executed and it was requested elsewhere * refactor `_link_pipeline_run_to_model` * add few more tests --------- Co-authored-by: GitHub Actions <actions@github.com> * remove deepchecks tests from CI * remove pytorch * remove pytorch_lightning * try tensorflow instead of pytorch * formatting * remove TF tests * formatting * try free ubuntu action * refactor * add skypilot back in * install missing pkg and update tests * add missing conditional to yaml * with permissions * add template release * [Model Control Plane] parallel running versions support (#1859) * linkage of pipeline runs * add docstring * tricky bug * full linkage on consumption * lint * use client * Auto-update of E2E template * update outdated test * create new version if any step touching it is executed and it was requested elsewhere * refactor `_link_pipeline_run_to_model` * add few more tests * parallel execution of model versions * add version number * improve readability * protect from 
misuse * extend `ArtifactConfig.model_version` * align model config docstrings * stabilize parallelized test * rework test as subprocess calls * skip subprocess test on windows * after merge mess * update tests flow based on develop * proper handle __latest__ mv in REST * fix get model version endpoint * simplify user-facing interface * fix test annotation --------- Co-authored-by: GitHub Actions <actions@github.com> * [Model Control Plane] Add CLI (#1861) * linkage of pipeline runs * add docstring * tricky bug * full linkage on consumption * lint * use client * Auto-update of E2E template * update outdated test * create new version if any step touching it is executed and it was requested elsewhere * refactor `_link_pipeline_run_to_model` * add few more tests * parallel execution of model versions * add version number * improve readability * protect from misuse * extend `ArtifactConfig.model_version` * align model config docstrings * stabilize parallelized test * rename `model.py` * rename * add cli for model watchtower * rework test as subprocess calls * Apply suggestions from code review Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * reshape tests * rebranding * skip subprocess test on windows * sort init * remove redundant param from `print_pydantic_models` * reduce redundant code * reduce too long line * add model update * after merge mess * lint * update tests flow based on develop * proper handle __latest__ mv in REST * fix get model version endpoint * calm down linter * restore proper name * typing import error * after merge mess --------- Co-authored-by: GitHub Actions <actions@github.com> Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * fallback to previous template * fallback to previous template --------- Co-authored-by: Felix Altenberger <felix@zenml.io> Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> Co-authored-by: GitHub Actions <actions@github.com> 
Co-authored-by: Alex Strick van Linschoten <stricksubscriptions@fastmail.fm> * Update to templates (#1878) * Extended template capabilities * Github action changed and new templat features added * Docs updated to include more about templates * Github workflow updated * Renaming files is bigger than i thought * Update src/zenml/cli/base.py Co-authored-by: Andrei Vishniakov <31008759+avishniakov@users.noreply.github.com> * Help * Final changes * More tags * More tags --------- Co-authored-by: Andrei Vishniakov <31008759+avishniakov@users.noreply.github.com> * Docs for orgs, rbac and sso (#1875) * Orgs, Rbac and SSO * Apply suggestions from code review * Update docs/book/deploying-zenml/zenml-cloud/user-management.md --------- Co-authored-by: Hamza Tahir <hamza@zenml.io> * Convert network_config dict to NetworkConfig object in SageMaker orchestrator (#1873) * Convert network_config dict to NetworkConfig object in orchestrator * Slightly better input checking * Remove f-strings (no placeholders) * Fix sort order of NetworkConfig import block * Remove redundant reraise * Reformatting with black * Add TypeError Raises * Remove line --------- Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> Co-authored-by: Hamza Tahir <hamza@zenml.io> * add missing docker build options for gcp image builder (#1856) * solve alembic branching --------- Co-authored-by: Priyadutt <68959880+bhatt-priyadutt@users.noreply.github.com> Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> Co-authored-by: Felix Altenberger <felix@zenml.io> Co-authored-by: GitHub Actions <actions@github.com> Co-authored-by: Stefan Nica <stefan@zenml.io> Co-authored-by: Hamza Tahir <hamza@zenml.io> Co-authored-by: Barış Can Durak <36421093+bcdurak@users.noreply.github.com> Co-authored-by: Michael Schuster <michael.schuster.ffb@googlemail.com> Co-authored-by: Vishal Kumar. 
S <118868521+VishalKumar-S@users.noreply.github.com> Co-authored-by: Jayesh Sharma <wjayesh@outlook.com> Co-authored-by: Alexej Penner <thealexejpenner@gmail.com> Co-authored-by: Safoine El Khabich <34200873+safoinme@users.noreply.github.com> Co-authored-by: Alex Strick van Linschoten <stricksubscriptions@fastmail.fm> Co-authored-by: Christian Versloot <c.versloot@infoplaza.nl>
Describe changes
I implemented a clean-up step, `delete_running_versions_without_recovery`, in `pipeline.run_`. It runs when the pipeline fails and cleans up all hanging ModelVersions in `running` that were created during the pipeline run but do not have the `delete_new_version_on_failure` option.
There is one conceptual renaming here: `recovery` of `ModelConfig` becomes `delete_new_version_on_failure`, to avoid confusion about the model version retrieval logic going forward.
Also inside
`Pipeline`, a new method `get_new_version_requests` is added to fetch all model configurations across the pipeline that request new versions to be created. This serves several goals: calling `register_running_versions` on success to turn `running` versions into normal numbered versions, or calling `delete_running_versions_without_recovery` on failure to clean up versions that did not request to be kept on failure.
I also touched a bit
`ModelVersionResponseModel` to better support versioned artifacts:
* `artifact_object_ids`, `deployment_ids` and `model_object_ids` become `Dict[str, Dict[str, UUID]]`: `{"artifact_name":{"1":[...],"2":[...]}}`, where `1` and `2` are artifact link versions for a versioned artifact
* `get_model_object` and others now also accept a version
The issue with this implementation, to be fixed in #1839: `artifact_object_ids` and the other fields of `ModelVersionResponseModel` cause collisions for artifacts that are not uniquely named.
Pre-requisites
Please ensure you have done the following:
`develop` and the open PR is targeting `develop`. If your branch wasn't based on develop, read the Contribution guide on rebasing a branch to develop.
Types of changes