V2 Output Artifact Classes and Vertex Pipelines #6818

Closed
ml6-liam opened this issue Oct 27, 2021 · 6 comments

@ml6-liam

I am trying to create a Vertex pipeline using the kfp SDK v2. I'm not sure whether this is a Vertex issue or a kfp issue, so forgive me if this is the wrong place for this query.

I have a reusable component in my pipeline from which I want to return a Dataset Artifact.

In the component.yaml I have the output specified:

outputs:
    - name: model_configuration
      description: output dataset describing model configuration
      type: Dataset

and also in the command section of the YAML:

--model_configuration, {outputPath: model_configuration}

Then, in the function implementing the component's logic, I declare a parameter for the output like so:
output_model_configuration_output: Output[Dataset]

In the artifact types module (declared here: https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/v2/components/types/artifact_types.py) I can see what looks like a method for setting the path of the Artifact with output_artifact.path('Path/to/fil'), but when I call it in my code (output_model_configuration_output.path(f"{output_path}model_configuration.parquet")), I am met with an error:

TypeError: 'NoneType' object is not callable

I tried writing the URI to the artifact object's uri attribute directly, like so:

output_model_configuration_output.uri = f"{output_path}model_configuration.parquet"

This didn't throw an error, but the URI value of the artifact object displayed in the Vertex Pipelines UI was not updated when the pipeline completed.

In addition, I tried adding some metadata to the artifact in this manner:
output_model_configuration_output.metadata['num_rows'] = float(len(model_configuration))

But, as with the updated URI, I don't see this metadata reflected in the Vertex Pipelines UI when the pipeline run finishes.

Let me know if there is any more information I can provide, or if there is a more appropriate channel for this query.
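For context, a program wired up with an {outputPath: ...} placeholder typically receives only a file path string on its command line. The following is a minimal sketch of such an entrypoint, not code from this issue: the helper names parse_args and write_output, the temporary directory, and the JSON payload (used instead of parquet to stay within the standard library) are all assumptions made for illustration.

```python
import argparse
import json
import os
import tempfile


def parse_args(argv):
    """Parse the single flag that the YAML command passes to the program."""
    parser = argparse.ArgumentParser()
    # With {outputPath: model_configuration}, the backend substitutes a plain
    # file path string here; the program is not handed an Artifact object.
    parser.add_argument("--model_configuration", required=True)
    return parser.parse_args(argv)


def write_output(path, rows):
    """Write the output payload to the path handed in on the command line."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(rows, f)


# Simulate the substituted path with a temporary directory (an assumption made
# for this sketch; the real path is chosen by the pipeline backend).
demo_path = os.path.join(tempfile.mkdtemp(), "outputs", "model_configuration")
args = parse_args(["--model_configuration", demo_path])
write_output(args.model_configuration, [{"layer": "dense", "units": 8}])
```

In this shape the program can only write file contents to the given path; there is no object on which a URI or a metadata dictionary could be set and reported back.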

@zijianjoy
Collaborator

Can you try using this approach to set artifact path?

output_artifact.path = 'Path/to/fil'

cc @chensun for the rest of questions.
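The difference between calling path(...) and assigning to path can be sketched with a simplified stand-in class. This is not the kfp source: SketchArtifact, the "/gcs/" mount prefix, and the bucket name are assumptions chosen to illustrate how a Python property with a getter and a setter behaves.

```python
from typing import Optional

_GCS_LOCAL_MOUNT_PREFIX = "/gcs/"  # assumed local mount point for gs:// URIs


class SketchArtifact:
    """Simplified stand-in for an artifact class; not the kfp source."""

    def __init__(self, uri: str = ""):
        self.uri = uri
        self.metadata = {}

    @property
    def path(self) -> Optional[str]:
        # Reading .path derives a local path from the URI, or returns None
        # when the URI is empty or uses a scheme this sketch does not handle.
        if self.uri.startswith("gs://"):
            return _GCS_LOCAL_MOUNT_PREFIX + self.uri[len("gs://"):]
        return None

    @path.setter
    def path(self, value: str) -> None:
        # Assigning to .path converts a local path back into a URI.
        if value.startswith(_GCS_LOCAL_MOUNT_PREFIX):
            value = "gs://" + value[len(_GCS_LOCAL_MOUNT_PREFIX):]
        self.uri = value


artifact = SketchArtifact()

# Calling .path(...) first reads the property (None here) and then tries to
# call that value, which raises a TypeError like the one reported above.
try:
    artifact.path("/gcs/my-bucket/model_configuration.parquet")
    call_error = None
except TypeError as exc:
    call_error = str(exc)

# Assignment goes through the setter instead and updates the URI.
artifact.path = "/gcs/my-bucket/model_configuration.parquet"
```

Under these assumptions, the call form fails because the getter's return value is what gets called, while the assignment form updates uri on the in-memory object. Whether that update reaches the UI is a separate question.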

@ml6-liam
Author

ml6-liam commented Nov 1, 2021

Hi, I tried it this way. It does not throw an error and the pipeline succeeds; however, the URI is not updated when I view the Artifacts/Metadata of the run in the Vertex AI Console after completion. Maybe it is more of a Vertex problem.

@ml6-liam
Author

ml6-liam commented Nov 1, 2021

Hi,

I have found a way that works. In the end we used the kfp SDK to generate a YAML file from a @component-decorated Python function, then adapted that format for our reusable components. Our component.yaml now looks like this:

name: predict
description: Prepare and create predictions request
implementation:
    container:
      args:
      - --executor_input
      - executorInput: null
      - --function_to_execute
      - predict
      command:
      - python3
      - -m
      - kfp.v2.components.executor_main
      - --component_module_path
      - predict.py
      image: gcr.io/PROJECT_ID/kfp/components/predict:latest
inputs: 
    - name: input_1
      type: String
    - name: intput_2
      type: String
outputs:
    - name: output_1
      type: Dataset
    - name: output_2
      type: Dataset

With this change to the YAML, we can now successfully update the artifact's metadata dictionary, and its URI through artifact.path = '/path/to/file'. These updates are displayed in the Vertex UI.

I am still unsure why the component.yaml format specified in the Kubeflow documentation does not work - I think this may be a bug with Vertex Pipelines.

@mrjgamble

Thanks for this, @ml6-liam. I've had similar issues with not being able to save metadata on Artifacts while using containerized components. I tried what you posted above and am now able to save metadata successfully. I agree this seems like a bug.

@chensun
Member

chensun commented Nov 12, 2021

@ml6-liam, glad you were able to figure it out.

> I am still unsure why the component.yaml format specified in the Kubeflow documentation does not work - I think this may be a bug with Vertex Pipelines.

This isn't a bug but a new feature in v2. To support this feature we need to control the container entrypoint with kfp.v2.components.executor_main as you've already discovered. More specifically, the code that creates the artifact instances and saves the metadata is here: https://github.com/kubeflow/pipelines/blob/927d2a9f2dfdb90ae156979b9e0d72afa14adcd6/sdk/python/kfp/v2/components/executor.py

So if you use a legacy YAML component, where this code is not injected into the container entrypoint, the functionality is missing, as expected.
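The role of such an entrypoint can be illustrated with a heavily simplified sketch. This is not the kfp executor: run_with_executor, SketchArtifact, and the flattened executor_input layout below are assumptions made for illustration, and the real input format differs.

```python
import json
import os
import tempfile


class SketchArtifact:
    """Minimal stand-in for an output artifact (not the kfp class)."""

    def __init__(self, name, uri):
        self.name = name
        self.uri = uri
        self.metadata = {}


def run_with_executor(func, executor_input):
    """Build artifact objects, call the user function, persist their state."""
    # 1. Create one artifact instance per declared output.
    artifacts = {
        key: SketchArtifact(spec["name"], spec["uri"])
        for key, spec in executor_input["outputs"]["artifacts"].items()
    }
    # 2. Call the user function with those instances as keyword arguments.
    func(**artifacts)
    # 3. Write the (possibly modified) URI and metadata to a file that the
    #    backend can read. Without this step the changes stay in the process.
    result = {
        key: {"name": a.name, "uri": a.uri, "metadata": a.metadata}
        for key, a in artifacts.items()
    }
    output_file = executor_input["outputs"]["outputFile"]
    os.makedirs(os.path.dirname(output_file), exist_ok=True)
    with open(output_file, "w") as f:
        json.dump({"artifacts": result}, f)
    return output_file


def predict(output_1):
    # User code mutates the artifact object it was handed.
    output_1.uri = "gs://my-bucket/predictions.parquet"
    output_1.metadata["num_rows"] = 3.0


# Hypothetical, flattened input used only for this sketch.
executor_input = {
    "outputs": {
        "artifacts": {
            "output_1": {"name": "output_1", "uri": "gs://my-bucket/tmp"}
        },
        "outputFile": os.path.join(
            tempfile.mkdtemp(), "out", "executor_output.json"
        ),
    }
}
written = run_with_executor(predict, executor_input)
with open(written) as f:
    saved = json.load(f)
```

The point of the sketch is step 3: changes made to an artifact object only become visible outside the container if something serializes them afterwards, which a plain command without such a wrapper does not do.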

Admittedly, we're currently short on documentation; our team is prioritizing documentation improvements in Q1 2022.

@chensun
Member

chensun commented Nov 12, 2021

Also, you might want to take a look at #6417 (comment), which would help you build your reusable components with full support for v2 features.

@chensun chensun closed this as completed Nov 29, 2021