Artifacts Tab #1943

Merged
merged 34 commits into from
Nov 23, 2023

Conversation

fa9r
Contributor

@fa9r fa9r commented Oct 13, 2023

Describe changes

Backend changes required to add an "Artifacts" tab to the ZenML dashboard.

High-Level Changes

New Docs Pages

Comprehensive Code Example

from typing import Optional, Tuple
from typing_extensions import Annotated

import numpy as np
from sklearn.base import ClassifierMixin
from sklearn.datasets import load_digits
from sklearn.svm import SVC
import zenml
from zenml import ArtifactConfig, ExternalArtifact, pipeline, step


@step
def versioned_data_loader_step() -> (
    Annotated[
        Tuple[np.ndarray, np.ndarray],
        ArtifactConfig(
            name="my_dataset",
            tags=["digits", "computer vision", "classification"],
        ),
    ]
):
    """Loads the digits dataset as a tuple of flattened numpy arrays."""
    digits = load_digits()
    return (digits.images.reshape((len(digits.images), -1)), digits.target)


@step
def model_finetuner_step(
    model: ClassifierMixin, dataset: Tuple[np.ndarray, np.ndarray]
) -> Annotated[
    ClassifierMixin, ArtifactConfig(name="my_model", tags=["SVC", "trained"])
]:
    """Finetunes a given model on a given dataset."""
    model.fit(dataset[0], dataset[1])
    return model


@pipeline
def model_finetuning_pipeline(
    dataset_version: Optional[str] = None,
    model_version: Optional[str] = None,
):
    # Either load a previous version of "my_dataset" or create a new one
    if dataset_version:
        dataset = ExternalArtifact(name="my_dataset", version=dataset_version)
    else:
        dataset = versioned_data_loader_step()

    # Load the model to finetune
    # If no version is specified, the latest version of "my_model" is used
    model = ExternalArtifact(name="my_model", version=model_version)

    # Finetune the model
    # This automatically creates a new version of "my_model"
    model_finetuner_step(model=model, dataset=dataset)


def main():
    # Save an untrained model as first version of "my_model"
    untrained_model = SVC(gamma=0.001)
    zenml.save_artifact(
        untrained_model, name="my_model", version="1", tags=["SVC", "untrained"]
    )

    # Create a first version of "my_dataset" and train the model on it
    model_finetuning_pipeline()

    # Finetune the latest model on an older version of the dataset
    model_finetuning_pipeline(dataset_version="1")

    # Run inference with the latest model on an older version of the dataset
    latest_trained_model = zenml.load_artifact("my_model")
    old_dataset = zenml.load_artifact("my_dataset", version="1")
    latest_trained_model.predict(old_dataset[0])

if __name__ == "__main__":
    main()

This would create the following pipeline run DAGs:

Run 1:

[Screenshot: DAG of pipeline run 1]

Run 2:

[Screenshot: DAG of pipeline run 2]
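
The comments in the example above say that a new version of "my_model" is created automatically and that the latest version is used when none is given. A minimal sketch of how such auto-increment versioning could behave is shown below. It is not ZenML's implementation: the function name next_artifact_version and the rule "take the highest purely numeric version and add one" are assumptions made for illustration. It also shows why versions need to be compared as numbers rather than strings (the 10 -> 11 case mentioned in the commits).

```python
from typing import Iterable, Optional


def next_artifact_version(
    existing_versions: Iterable[str], requested: Optional[str] = None
) -> str:
    """Hypothetical helper: pick the version for a newly saved artifact.

    Assumption: explicit versions must be unique, and automatic versions
    continue from the highest purely numeric version seen so far.
    """
    existing = list(existing_versions)
    if requested is not None:
        if requested in existing:
            raise ValueError(f"Version {requested!r} already exists.")
        return requested
    # Compare numerically; as strings, "9" would sort after "10".
    numeric = [int(v) for v in existing if v.isascii() and v.isdigit()]
    return str(max(numeric, default=0) + 1)


print(next_artifact_version([]))                # first version
print(next_artifact_version(["9", "10"]))       # numeric increment
print(next_artifact_version(["1", "staging"]))  # named versions are skipped
```

Named versions such as "staging" are ignored when computing the next number in this sketch; whether the real backend does the same is not stated in this PR description.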

Breaking Changes

  • ExternalArtifact no longer supports artifact_name, model_name, model_version, model_artifact_name, and model_artifact_version arguments
  • Moved zenml.model.artifact_config.py to zenml.artifacts.artifact_config.py
  • ArtifactConfig no longer supports artifact_name and overwrite args
  • Deleted ModelArtifactConfig and DeploymentArtifactConfig
  • Deleted zenml.model.link_output_to_model.py
  • log_artifact_metadata() now expects a completely different set of arguments
  • Removed package level imports in zenml.model, i.e., from zenml.model import ... no longer works
  • ModelVersionArtifact no longer supports name, link_version, pipeline_name, and step_name arguments
  • ModelVersionPipelineRun no longer supports name argument
  • Reworked API endpoints and client / zen store methods for listing model version artifact / pipeline run links
  • model_version.get_X_artifact() functions no longer support pipeline_name and step_name args
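
For code that still passes the old ExternalArtifact keyword arguments, the migration could be sketched roughly as follows. This is a hypothetical helper, not part of ZenML: the function name and the error behaviour are assumptions. The mapping itself follows the rename listed in the commits (artifact_name > name, artifact_version > version) and the removed model-related arguments from the list above.

```python
from typing import Any, Dict

# Renames taken from the commit list: artifact_name > name, artifact_version > version.
RENAMED_ARGS = {"artifact_name": "name", "artifact_version": "version"}

# Arguments listed under "Breaking Changes" that have no direct replacement here.
REMOVED_ARGS = {
    "model_name",
    "model_version",
    "model_artifact_name",
    "model_artifact_version",
}


def migrate_external_artifact_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: translate old ExternalArtifact kwargs to new names."""
    new_kwargs: Dict[str, Any] = {}
    for key, value in old_kwargs.items():
        if key in REMOVED_ARGS:
            raise TypeError(f"ExternalArtifact no longer supports {key!r}.")
        new_kwargs[RENAMED_ARGS.get(key, key)] = value
    return new_kwargs


print(
    migrate_external_artifact_kwargs(
        {"artifact_name": "my_dataset", "artifact_version": "1"}
    )
)
```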

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • If I have added an integration, I have updated the integrations table and the corresponding website section.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing a branch to develop.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

fa9r added 2 commits October 13, 2023 09:51
* [Artifacts Tab] Globally Unique Artifact Names

* Fix default names for unlisted pipelines

* Fix alembic divergence

* Fix manual metadata logging

* Add docs

* Fix default names for named pipelines

* Add integration tests
@github-actions bot added the internal (To filter out internal PRs and issues) and enhancement (New feature or request) labels Oct 13, 2023
fa9r added 6 commits October 13, 2023 17:10
* [Artifacts Tab] Artifact Versioning

* Fix alembic divergence

* Adjust to review suggestion

* Fix unit tests

* Fix docstrings

* Fix integration tests

* Rework zenml artifact delete

* Make version always string

* Add integration test

* Add unit test for _get_new_artifact_version

* Revert "Make version always string"

This reverts commit ff9a569.

* Add version number field to DB & fix auto-increment versioning

* Adjust integration test to include version change 10->11

* Fix unit tests

* Adjust CLI messages according to review comments

* Fix integration tests
* [Artifacts Tab] Rework External Artifact

* Add docs

* Rename artifact_name > name, artifact_version > version

* Add unit tests

* Rewrite external artifact integration tests

* Delete outdated test

* Adjust to review suggestions

* Undo e2e example changes

* Fix unit tests
* [Artifacts Tab] Artifact Tags & Renaming

* Adjust to review suggestion

* Merge migrations

* Refactor ArtifactConfig and add tests

* Add version to artifact update model

* Delete link_output_to_model

* Fix most tests

* Add docs on artifact versioning and configuration

* Add ArtifactConfig to public Python API

* Fix circular import

* Fix linking cached artifacts

* OSS-2515 Rework model artifacts

* Remove ArtifactConfig.overwrite_model_link

* Adjust to review suggestions

* Fix more integration tests

* Some more integration test fixing

* Fix artifact link retrieval by name
@fa9r
Contributor Author

fa9r commented Nov 9, 2023

Previous PR with open discussions: #1937

fa9r added 12 commits November 17, 2023 09:44
* Move zenml.new.steps.log_artifact_metadata > zenml.artifacts.utils

* WIP: redesign log_artifact_metadata

* Fix unit tests

* Move log_artifact_metadata integration test to tests/integration/functional/artifacts/test_utils.py

* Basic save_artifact() and load_artifact() implementations

* Merge zenml.utils.artifact_utils into zenml.artifacts.utils

* Add integration tests and fix docstrings

* Add docs on new util functions

* Add ExternalArtifact to public API

* Add artifact saving docs to toc

* Link manually saved and loaded artifacts to step

* Support zenml.load_artifact(id)
@fa9r fa9r marked this pull request as ready for review November 21, 2023 15:36
@fa9r fa9r requested review from avishniakov and bcdurak November 21, 2023 15:36
Contributor

@bcdurak bcdurak left a comment

I still have some files to review, but I am making a checkpoint here. Will continue in the morning.

@fa9r fa9r requested a review from bcdurak November 22, 2023 05:13
Comment on lines +225 to +226
MODEL_VERSION_ARTIFACTS = "/model_version_artifacts"
MODEL_VERSION_PIPELINE_RUNS = "/model_version_pipeline_runs"
Contributor

@Cahllagerfeld @AlexejPenner are you aware of this and ok with it?

Contributor

@avishniakov avishniakov left a comment

I'm good with most of it. My major concern: why not allow people to use ExternalArtifact together with a model version instead of a pipeline run? I might be misunderstanding something though.

Please also document the API changes somewhere and make @Cahllagerfeld aware of them before merging.

Lastly, am I getting it right that it is still possible to use Annotated[int, "my_name"] without a full-blown config object?

@@ -116,6 +117,29 @@ def get_artifact(
return zen_store().get_artifact(artifact_id, hydrate=hydrate)


@router.put(
Contributor

@Cahllagerfeld @AlexejPenner please review changes below

@@ -155,15 +157,20 @@ def delete_model_version(
# Model Version Artifacts
##########################

model_version_artifacts_router = APIRouter(
Contributor

@Cahllagerfeld @AlexejPenner please review changes below

Contributor

@avishniakov avishniakov left a comment

Please, take all my change requests as requests for a follow-up PR, since it is not that critical to get into another loop of aligning this PR.

@fa9r
Contributor Author

fa9r commented Nov 22, 2023

@avishniakov I adjusted ExternalArtifact now to no longer have pipeline_run and pipeline args either, which feels indeed much cleaner.

@Cahllagerfeld and I also just aligned on the API changes and how we will make the latest backend changes available to him in general.

As for your last question, yes, Annotated[int, "my_name"] is still possible and is functionally equivalent to Annotated[int, ArtifactConfig("my_name")]. I think it still makes sense to keep the former as a short-cut syntax since the name is by far the most common thing users will need to configure.
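
To illustrate the equivalence described above, here is a minimal sketch of how a bare string in Annotated could be normalized into a config object. It does not use ZenML: ArtifactConfigSketch and resolve_output_config are hypothetical stand-ins, and the actual resolution logic in the codebase may differ.

```python
from dataclasses import dataclass
from typing import Annotated, Callable, Optional, Tuple, get_type_hints


@dataclass(frozen=True)
class ArtifactConfigSketch:
    """Hypothetical stand-in for ArtifactConfig; not the ZenML class."""

    name: Optional[str] = None
    tags: Tuple[str, ...] = ()


def resolve_output_config(func: Callable) -> ArtifactConfigSketch:
    """Read the return annotation and normalize it to a config object."""
    hints = get_type_hints(func, include_extras=True)
    # Annotated[...] hints carry their extra arguments in __metadata__.
    metadata = getattr(hints.get("return"), "__metadata__", ())
    for item in metadata:
        if isinstance(item, ArtifactConfigSketch):
            return item
        if isinstance(item, str):
            # Short-cut syntax: a bare string is treated as the artifact name.
            return ArtifactConfigSketch(name=item)
    return ArtifactConfigSketch()


def short_form() -> Annotated[int, "my_name"]:
    return 1


def full_form() -> Annotated[int, ArtifactConfigSketch("my_name")]:
    return 1


print(resolve_output_config(short_form) == resolve_output_config(full_form))
```

Both annotations resolve to the same config in this sketch, which is the behaviour described for the short-cut syntax.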

@avishniakov
Contributor

We discussed the API for artifact links with @Cahllagerfeld, and the conclusion is that we need the following fields in the response model (even in the non-hydrated one used for listing): name (from artifact), type (from artifact), user (from artifact), updated (from link table)
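
As a rough sketch of the shape being asked for here, the four fields could be derived from a link that carries its artifact. All class and function names below are hypothetical and only illustrate where each field comes from; they are not the ZenML response models.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ArtifactSketch:
    """Hypothetical artifact record holding the fields named in the comment."""

    name: str
    type: str
    user: Optional[str]


@dataclass(frozen=True)
class ArtifactLinkSketch:
    """Hypothetical link row: points at an artifact and has its own timestamp."""

    artifact: ArtifactSketch
    updated: datetime


def to_list_row(link: ArtifactLinkSketch) -> Dict[str, Any]:
    """Flatten a link into the fields requested for list views."""
    return {
        "name": link.artifact.name,  # from artifact
        "type": link.artifact.type,  # from artifact
        "user": link.artifact.user,  # from artifact
        "updated": link.updated.isoformat(),  # from link table
    }


link = ArtifactLinkSketch(
    artifact=ArtifactSketch(name="my_model", type="ModelArtifact", user="fa9r"),
    updated=datetime(2023, 11, 22, tzinfo=timezone.utc),
)
print(to_list_row(link))
```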

@fa9r
Contributor Author

fa9r commented Nov 22, 2023

@avishniakov Ok, I hydrated link.artifact now, this should give @Cahllagerfeld the info we need 👍

@fa9r fa9r requested a review from avishniakov November 22, 2023 13:22
@fa9r fa9r merged commit 845adf8 into develop Nov 23, 2023
32 of 33 checks passed
@fa9r fa9r deleted the feature/OSS-2190-data-as-first-class-citizen branch November 23, 2023 08:14